Many organisations look to the cloud to provide some level of contingency against their own systems failing, be it through the use of off-site data backup, failover servers for business applications or the use of high-availability servers and software. Doing so provides a level of disaster recovery (DR) and business continuity (BC), the level of which can vary according to a given organisation’s risk appetite and budget. In this context, disaster recovery looks at how rapidly an organisation can get back a level of capability after the failure of a system, whereas business continuity covers how well an organisation can keep working through such a failure.
The degree to which cloud services are suitable for providing a safety blanket will vary from one case to another. So which level of cover is right for your organisation?
The following use case scenarios provide some guidance, starting with the most basic level of data backup and moving to full business continuity:
- Simple data backup – the cloud can act as an external system where files can be stored as duplicates so that if there is a problem with on-premise storage, individual files can be recovered, or images of specific machines can be restored as required. This can be very cost effective, but as with similar on-premise solutions, there will be a level of downtime while the data is identified and restored to the live environment. Large amounts of data will also take a long time to recover over the internet, which is why Quocirca recommends that data be recovered from the cloud to a local physical device, which is then couriered to the customer’s site and restored to the target storage system at local area network (LAN) speeds. Furthermore, a cloud service provider may be able to offer additional archiving services that could work well for compliance needs (as Quocirca points out in a previous blog post – http://blog.lunacloud.com/compliance-in-the-cloud/).
- Secondary data storage. The cloud can be used to mirror existing data. When there is a failure of an on-premise data storage device, systems can fail over to the cloud-based mirror. Although this may seem to provide a good level of business continuity, organisations must bear in mind that providing data to on-premise applications from outside the data centre may lead to latency issues, and that the synchronisation of live data may not be as easy as first thought.
- Primary data storage – no data is stored on-premise, instead being held directly in the cloud. However, the application remains on-premise. Although this should provide better data availability due to how the cloud provider architects its storage platform, the latency from the on-premise application to the data will generally make this a non-viable option. On the positive side, because the data and its backups now sit within the provider’s environment, data backup and restore can be carried out at LAN speed.
- Both applications and data are held in the cloud, with data backup and restore being integrated. This moves the application and data closer to each other so that latency between the two is no longer an issue. As long as the application supports web-based access effectively, the user experience should be good. Should the primary data storage be impacted, restores can be carried out at LAN speed, so the recovery time objective (RTO) is shortened. However, this only provides data continuity: if the application goes down, the organisation will still be unable to carry out its business.
- Applications run as virtual machines with data being mirrored. This is getting closer to real business continuity. By using applications that have been packaged as virtual machines, the failure of a single instance of the application can be rapidly fixed simply by spinning up a new instance. Data needs to be covered as well, and should be mirrored to a different storage environment to provide high data availability. Such an approach can lead to recovery times measured in a few minutes, which will be enough for many organisations. This is also known as a “cold standby”, as standby virtual machines are only fired up when there is a problem.
- Stand-by business continuity. Here, the stand-by application virtual machine is permanently “spinning” (i.e. provisioned), but is not part of the live environment. On the failure of the live image, pointers can be moved over to the stand-by image in a matter of seconds, using existing or mirrored data storage. This is also known as a “hot standby”, as the virtual machines are ready to take over as soon as a failure occurs.
- Full business continuity. Here, everything is provisioned to at least an “N+1” level. Multiple data storage silos are mirrored on a live basis and multiple live application virtual machines are maintained. Workloads are balanced between the virtual machines, and two-phase commit is used on data to ensure that any problem with the data itself is not mirrored across all the data stores at the same time. This is the approach used by large organisations that must be able to continue working through a systems failure, but it is unaffordable for the majority of other organisations. However, cloud services can bring such capabilities to many more organisations through economies of scale.
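To put rough numbers on the restore-time point made under simple data backup, the following Python sketch estimates how long a restore would take over an internet link compared with a LAN. It is a back-of-the-envelope illustration only: the data volume (10 TB), the link speeds (100 Mbit/s internet, 10 Gbit/s LAN) and the 80% link efficiency are all assumptions chosen for the example, and the function name `transfer_hours` is hypothetical.

```python
def transfer_hours(data_terabytes: float, link_mbps: float,
                   efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move a volume of data over a link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually achieved (protocol overhead, contention and so on).
    """
    if data_terabytes < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid data volume, link speed or efficiency")
    bits_to_move = data_terabytes * 1e12 * 8          # decimal terabytes to bits
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return bits_to_move / usable_bits_per_second / 3600


if __name__ == "__main__":
    data_tb = 10  # assumed size of the data set to restore
    over_internet = transfer_hours(data_tb, link_mbps=100)
    over_lan = transfer_hours(data_tb, link_mbps=10_000)
    print(f"Internet restore: {over_internet:.1f} hours "
          f"({over_internet / 24:.1f} days)")
    print(f"LAN restore:      {over_lan:.1f} hours")
```

Under these assumed figures the internet restore runs to well over a week, while the same restore at LAN speed completes within a working morning, which is the gap that makes a couriered device worth considering.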
Obviously, there are cost issues as the amount of cover increases through the above list. This is why any organisation must first understand its corporate risk profile, building up a picture of exactly which business risks it cannot afford to carry and which it is capable of carrying. Once a risk profile has been created, the right level of technical “insurance” can be found from a cloud or hosting provider. The cloud makes the costs less of an issue, as the cost of each level is spread across the number of organisations that are sharing the infrastructure. Therefore, an organisation that has previously regarded business continuity as out of its reach and has settled for disaster recovery can now look to the cloud to create a more supportive platform for its business.
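The step from risk profile to level of cover can be sketched as a simple lookup: pick the cheapest level whose typical recovery time fits within the downtime the business can tolerate. The Python below is a minimal sketch of that idea, not a sizing tool. The recovery times are illustrative assumptions loosely based on the descriptions above (minutes for cold standby, seconds for hot standby), the cost ranks simply follow the order of the list, and the names `Tier`, `TIERS` and `cheapest_tier` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_rank: int              # 1 = cheapest; follows the order of the list
    typical_rto_seconds: float  # illustrative assumption, not a measured figure


# A subset of the scenarios above, with assumed recovery times.
TIERS = [
    Tier("simple data backup", 1, 48 * 3600),  # assumed: up to two days
    Tier("cold standby", 2, 10 * 60),          # assumed: a few minutes
    Tier("hot standby", 3, 30),                # assumed: a matter of seconds
    Tier("full business continuity", 4, 0),    # keeps working through failure
]


def cheapest_tier(max_tolerable_downtime_seconds: float) -> Tier:
    """Return the lowest-cost tier whose typical recovery time is acceptable."""
    candidates = [
        tier for tier in TIERS
        if tier.typical_rto_seconds <= max_tolerable_downtime_seconds
    ]
    if not candidates:
        raise ValueError("no tier meets the stated downtime tolerance")
    return min(candidates, key=lambda tier: tier.cost_rank)


if __name__ == "__main__":
    for label, tolerance in [("one week", 7 * 24 * 3600),
                             ("one hour", 3600),
                             ("one minute", 60)]:
        print(f"Tolerable downtime of {label}: {cheapest_tier(tolerance).name}")
```

In practice a risk profile covers more than downtime (data loss tolerance, compliance, budget), so a real assessment would weigh several such factors per application rather than a single threshold.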