Yesterday in Atlanta I had the honor of moderating a Technology Association of Georgia (TAG) panel discussion on Operational Resilience, and the topic of cloud computing came up. While most people working for cloud providers (I used to work at one) will tell you that Disaster Recovery is a great use case for cloud, our panelists weren't so sure. The feeling in the room was that using a cloud environment in addition to traditional on-premise environments creates a lot of operational complexity, and that it is safer to keep both production and DR in-house.
So which is it? Cloud providers are clearly making money selling DR services, but managing a hybrid of on-premise production and cloud DR is difficult, and you may not get the results you expect.
Are IT managers signing up for cloud-based DR services just to check a box, not knowing if it'll work when it's needed? Perhaps.
The reality is that when done correctly, cloud-based DR services can help companies protect their operations and mitigate risk - but it's not easy.
For anyone tasked with developing an IT disaster recovery plan as part of their company’s business continuity plan, the alphabet soup of DR options talked about today by service providers, software vendors, analysts and pundits is truly bewildering. Against this backdrop, analysts like Gartner are predicting dramatic growth in both the consumption and hype of “cloudwashed” DR services.
With the lack of standardization, it’s increasingly complex to map DR business requirements to business processes, service requirements and technology. Given this, how do you make sense of it all? For one thing, it’s critically important to separate the information you need from the noise, and the best way to do that is to start with the basics.
The basics are what you’re trying to do: protect your business by protecting critical IT operations, and use new technologies only where they make sense. Here are some things to think about as you consider DR in the context of modern, “cloudy” IT.
There is no such thing as DR to the cloud (even though cloud providers claim "DR to the Cloud" solutions).
A lot has been made lately of using cloud technology to improve the cost effectiveness of Disaster Recovery solutions. Vendors, analysts, and others use terms like DRaaS, RaaS, DR-to-the-Cloud, etc. to describe various solutions. Here I’m talking about using cloud as a DR target for traditional environments, not Cloud to Cloud DR (that’s a whole other discussion).
There’s one simple question underlying all this though: If, when there is a disaster, these various protected workloads can run in the cloud, WHY AREN’T they there already?
Getting an application up and running on a cloud is probably more difficult in a DR situation than when there isn’t a disaster. If security, governance, and compliance don’t restrict those applications from running in a cloud during a DR event, they should be considered for running in the cloud today. There are lots of other reasons for not running things in the cloud, but it’s something to consider.
You own your DR plan. Period.
Various software and services out there provide service level agreements for recovery time and recovery point objectives, but that doesn’t mean that if you buy one, you have DR. For example, what exactly does the word recovery mean? Does it mean that a virtual machine is powered up, or that your customers can successfully access your customer support portal? The point is, except in the case of 100% outsourced IT, only your IT department can ensure that the end-to-end customer (or employee, in the case of internal systems) processes will be protected in the case of disaster. There are lots of folks who can help with BIAs, BCDR planning, hosting, etc. and who provide key parts of a DR solution, but at the end of the day, ultimate responsibility for DR lies with the IT department.
Everyone wants DR, but no one wants to pay for it.
I’ve had lots of conversations with and inquiries from customers asking for really aggressive DR service levels, and then when they hear how much it’s going to cost, they back away from their initial requirements pretty quickly. The reality is that as objectives get more aggressive, the cost of DR infrastructure, software, and labor begins to approach the cost of production, and few businesses are able to support that kind of cost. Careful use of techniques like using test/dev environments for DR, global load balancing of active/active workloads, and less aggressive recovery time objectives can drive the cost of DR down to where it should be (about 25% of your production environment’s cost), but be skeptical of any solution that promises both low cost and minimum downtime.
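As a rough way to sanity-check a proposal against that 25% rule of thumb, here is a minimal Python sketch. The option names and dollar figures are made up purely for illustration, and the function name is my own; plug in your own numbers.

```python
def dr_cost_ratio(annual_dr_cost: float, annual_production_cost: float) -> float:
    """Return DR spend as a fraction of production spend."""
    if annual_production_cost <= 0:
        raise ValueError("production cost must be positive")
    return annual_dr_cost / annual_production_cost


# Hypothetical figures, for illustration only.
PRODUCTION_COST = 1_000_000.0
TARGET_RATIO = 0.25  # the rough "about 25% of production" guideline

# Hypothetical DR proposals and their annual cost.
options = {
    "option A (very aggressive RTO/RPO)": 900_000.0,
    "option B (moderate RTO/RPO)": 300_000.0,
    "option C (relaxed RTO/RPO)": 200_000.0,
}

for name, cost in options.items():
    ratio = dr_cost_ratio(cost, PRODUCTION_COST)
    verdict = "within" if ratio <= TARGET_RATIO else "above"
    print(f"{name}: {ratio:.0%} of production cost ({verdict} the guideline)")
```

A proposal that lands well above the guideline isn't automatically wrong, but it is a prompt to revisit how aggressive the objectives really need to be.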
Service provider SLA penalties never match the true cost of downtime.
OK, let’s be honest: unless you’re running an e-commerce site where you can measure the cost of downtime directly, you probably don’t know the true cost of downtime. Maybe you hired an expensive consultant and he or she told you the cost, but that’s based on an analysis with outputs highly sensitive to the inputs (and those inputs are highly subjective).
But that doesn’t mean that service provider SLA penalties don’t matter. Actually, strike that – service provider penalties don’t matter.
A month of services or some other limited penalty in the event of missing a DR SLA won’t compensate for the additional downtime. If it did, then why are you paying for that stringent SLA in the first place? The point here is that only a well thought out and tested DR strategy will protect your business. This leads me to my last point.
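Before moving on, here is a back-of-the-envelope Python sketch of why the penalty rarely covers the loss. Every number in it is hypothetical, as is the assumption that the penalty is a simple credit of some number of months of fees; your contract and your cost of downtime will differ.

```python
def sla_credit(monthly_fee: float, credit_months: float = 1.0) -> float:
    """Penalty paid by the provider, assumed here to be a credit of N months of fees."""
    return monthly_fee * credit_months


def downtime_cost(extra_hours_down: float, cost_per_hour: float) -> float:
    """Business cost of the downtime beyond what the SLA promised."""
    return extra_hours_down * cost_per_hour


def uncovered_loss(monthly_fee: float, extra_hours_down: float,
                   cost_per_hour: float, credit_months: float = 1.0) -> float:
    """Loss left over after the SLA credit is applied (never negative)."""
    loss = downtime_cost(extra_hours_down, cost_per_hour)
    return max(0.0, loss - sla_credit(monthly_fee, credit_months))


# Hypothetical inputs: a 5,000/month DR service, 8 hours of extra downtime,
# and an estimated 10,000 per hour of business impact.
gap = uncovered_loss(monthly_fee=5_000.0, extra_hours_down=8.0, cost_per_hour=10_000.0)
print(f"Loss not covered by the SLA credit: {gap:,.0f}")
```

With these made-up inputs the credit covers 5,000 of an 80,000 loss. The exact figures don't matter much; the gap is what to look at.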
You don’t have DR if you don’t regularly test.
A DR solution is not “fire and forget”. To ensure that your DR solution works, I recommend that you test at the user level at least quarterly. DR testing is also a significant part of the overall cost of DR and should be considered when building your business case. I’m sad to say, many of my customers do not test their DR solutions regularly (or at all). The reasons for this are many, but in my opinion, it’s usually because the initiative was driven exclusively by IT technologists, so the business processes and metrics were never put in place. My advice if you implement a DR solution and don’t test it: Keep your resume up to date, you’ll need it in the event of a “disaster”.
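If you want a simple way to keep yourself honest about that quarterly cadence, something like the Python sketch below can flag services whose last user-level DR test is overdue. The service names and dates are hypothetical, and treating "quarterly" as 91 days is my own simplification.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

# Assumption: "quarterly" is approximated as 91 days.
MAX_TEST_AGE = timedelta(days=91)


def overdue_dr_tests(last_tested: Dict[str, Optional[date]],
                     today: date,
                     max_age: timedelta = MAX_TEST_AGE) -> List[str]:
    """Return services never tested, or not tested within max_age, sorted by name."""
    overdue = []
    for service, tested_on in last_tested.items():
        if tested_on is None or today - tested_on > max_age:
            overdue.append(service)
    return sorted(overdue)


# Hypothetical record of the last user-level DR test per service.
records = {
    "customer-support-portal": date(2024, 5, 15),
    "billing": date(2024, 1, 10),
    "hr-portal": None,  # never tested
}

print(overdue_dr_tests(records, today=date(2024, 7, 1)))
```

Anything on that list is a service you only think you have DR for.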