“We cannot stop natural disasters but we can arm ourselves with knowledge.” – Petra Nemcova
Any IT disaster, such as a network disruption, a technical failure, or a human misconfiguration, can have a devastating impact on a business. Several Disaster Recovery (DR) strategies are available, and the right one depends on the type of disaster. A multi-AZ strategy can mitigate a disaster such as flooding at a data center. However, a disaster such as an attack on production data requires a data backup strategy that fails over to a backup in another AWS Region.
My previous blog, Data backup and Pilot Light, discussed DR strategies for scenarios such as data loss or corruption within a single Availability Zone, and replication to a passive Region where resources are provisioned as needed. In those scenarios, only a portion of the application is active in the failover Region, and it is quickly provisioned to a full-scale production environment when required.
AWS offers four Disaster Recovery options: Backup and Restore, Pilot Light, Warm Standby, and Multi-site active/active. Your Recovery Time Objective (RTO) and Recovery Point Objective (RPO) requirements help you choose the right strategy.
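To make the RTO/RPO trade-off concrete, here is a minimal sketch of how a team might shortlist a strategy. The minute values and cost ranks below are illustrative placeholders chosen for this example, not figures published by AWS, and the names `Strategy` and `choose_strategy` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    typical_rto_minutes: float  # ASSUMPTION: illustrative value, not an AWS figure
    typical_rpo_minutes: float  # ASSUMPTION: illustrative value, not an AWS figure
    relative_cost: int          # 1 = cheapest, 4 = most expensive (illustrative)


# Listed from cheapest to most expensive.
STRATEGIES = [
    Strategy("Backup and Restore", 480, 480, 1),
    Strategy("Pilot Light", 60, 15, 2),
    Strategy("Warm Standby", 10, 5, 3),
    Strategy("Multi-site active/active", 1, 1, 4),
]


def choose_strategy(required_rto_minutes: float,
                    required_rpo_minutes: float) -> Optional[Strategy]:
    """Return the cheapest strategy whose typical RTO and RPO meet the targets."""
    for strategy in sorted(STRATEGIES, key=lambda s: s.relative_cost):
        if (strategy.typical_rto_minutes <= required_rto_minutes
                and strategy.typical_rpo_minutes <= required_rpo_minutes):
            return strategy
    return None  # No listed strategy is fast enough for these targets.


if __name__ == "__main__":
    picked = choose_strategy(required_rto_minutes=15, required_rpo_minutes=5)
    print(picked.name if picked else "No strategy meets these targets")
```

The point of the sketch is the selection rule, not the numbers: tighter RTO and RPO targets push you toward the more expensive strategies, so you pick the cheapest one that still meets both.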
Warm Standby DR Strategy
Warm Standby, an extension of Pilot Light, keeps a fully functional, scaled-down copy of the production environment running on standby in another Region. It can be hard to distinguish between Pilot Light and Warm Standby, as both include an environment in the DR Region with copies of the primary Region's assets. Your RTO and RPO requirements will help you choose between them.
Because the standby environment is always running, we can test it continuously and with little effort, which increases our confidence in our ability to recover from an unplanned disaster.
Warm Standby Failover Mechanism
Warm Standby is an older brother to Pilot Light: it keeps all the functionality the system needs running in another Region. In Pilot Light, only core services are available and ready for recovery, whereas Warm Standby has everything running at a minimal level. This means that load balancers, databases, gateways, and subnets are already in place and can take traffic at a moment’s notice. Warm Standby’s RTO and RPO are within minutes, so recovery is close to immediate. If the production system fails, the standby infrastructure is scaled up to match the production environment, and DNS records are updated to route traffic to the newly scaled-up environment in the DR Region.
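The DNS step of the failover can be sketched as follows, assuming the records live in Amazon Route 53 and the application is reached through a CNAME record. The record name, endpoint, and hosted zone ID are hypothetical, and the helper names are made up for illustration. The request is built as a plain dictionary so it can be reviewed before anything is sent; the actual call uses boto3's `change_resource_record_sets`.

```python
from typing import Any, Dict


def build_failover_change_batch(record_name: str,
                                dr_endpoint: str,
                                ttl: int = 60) -> Dict[str, Any]:
    """Build a Route 53 ChangeBatch that points record_name at the DR endpoint.

    ASSUMPTION: the application is exposed through a CNAME record. A zone
    apex cannot be a CNAME, so an alias record would be needed there.
    """
    return {
        "Comment": "Fail over to the warm standby Region",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record or overwrite the existing one
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,  # a short TTL lets clients pick up the change quickly
                    "ResourceRecords": [{"Value": dr_endpoint}],
                },
            }
        ],
    }


def apply_failover(hosted_zone_id: str, change_batch: Dict[str, Any]) -> Dict[str, Any]:
    """Send the change to Route 53. Requires boto3 and AWS credentials."""
    import boto3  # third-party; imported here so the builder works without it

    route53 = boto3.client("route53")
    return route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=change_batch,
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    batch = build_failover_change_batch("app.example.com.", "dr-alb.example.net")
    print(batch)
```

In practice, many teams prefer Route 53 health checks with failover routing so that the switch happens automatically; the manual update above is simply the most direct illustration of the step described in the text.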
Auto Scaling lets you scale the DR Region up to full production capacity. The settings can be adjusted manually in the AWS Management Console, programmatically through the AWS SDK, or by redeploying CloudFormation templates with the new capacity value.
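As a rough sketch of the SDK route, the snippet below raises an Auto Scaling group in the DR Region to production capacity using boto3's `update_auto_scaling_group`. The group name, Region, and capacity numbers are hypothetical, and the helper functions are illustrative rather than part of any AWS library.

```python
from typing import Any, Dict, Optional


def build_scale_up_request(group_name: str,
                           production_capacity: int,
                           max_size: Optional[int] = None) -> Dict[str, Any]:
    """Build the arguments for scaling a standby group to production capacity."""
    if production_capacity < 1:
        raise ValueError("production_capacity must be at least 1")
    if max_size is None:
        max_size = production_capacity
    if max_size < production_capacity:
        raise ValueError("max_size cannot be lower than production_capacity")
    return {
        "AutoScalingGroupName": group_name,
        "MaxSize": max_size,                     # raise the ceiling first
        "DesiredCapacity": production_capacity,  # then ask for full capacity
    }


def scale_dr_region(region_name: str, request: Dict[str, Any]) -> None:
    """Apply the request in the DR Region. Requires boto3 and AWS credentials."""
    import boto3  # third-party; imported here so the builder works without it

    autoscaling = boto3.client("autoscaling", region_name=region_name)
    autoscaling.update_auto_scaling_group(**request)


if __name__ == "__main__":
    # Hypothetical group name and capacity, for illustration only.
    print(build_scale_up_request("dr-web-asg", production_capacity=12))
```

This sketch leaves the group's minimum size untouched, so existing scaling policies could still scale the group back in; whether to raise the minimum as well is a judgment call for your own environment.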
Scaling AWS EC2
AWS Management Console
Amazon Machine Images (AMIs)
Amazon EBS snapshots
Amazon DynamoDB backups
Amazon Redshift, Neptune, and Aurora DB snapshots
It can be difficult to tell the Pilot Light and Warm Standby strategies apart. You need to take your RTO and RPO targets into account, as they can vary between scenarios. Because a Warm Standby environment is always running, it is often also used for monitoring and testing. A well-designed plan, based on your disaster recovery needs, helps keep business disruption and data loss to a minimum.
CloudThat offers end-to-end implementation of Disaster Recovery strategies to protect your infrastructure from data loss and to create a cost-effective, flexible DR program that suits your business. Keep watching for more information about Warm Standby and Multi-site active/active DR strategies. Learn more about CloudThat’s Consulting and Expert Advisory here (https://www.cloudthat