Resiliency is not a byproduct of cloud computing
About a week ago, while thinking about cloud services, I realized that I had been having a recurring thought for several weeks: “If a cloud falls in the forest, would anybody hear it?” I resisted several urges to tweet those words, thinking they might not make sense to others or resonate. I’ll never know now. What a difference a week can make.
The buzz following a service disruption at an undisputed leading cloud provider might give one the impression that there was widespread belief that resilience is always a byproduct of cloud services. If you believe that to be an overstatement perhaps you might at least agree that many consumers of cloud services assumed, and still assume, that resilience is delivered as part of any cloud service. Worse, the assumption may be that resilience is part of every cloud service.
In fact, much diligence is required in this regard when selecting cloud services. For example, do you factor into your cloud service selection that:
Please Stay Tuned
Now, if you have been considering “cloudy options” to address your business requirements and opportunities please do not let these items, or anything else you may have recently read about service disruptions, give you the impression that no cloud solutions can offer the resilience you require. In fact, some cloud solutions may be far more resilient than solutions you might be able to deliver on your own, in your own building. Solutions are not any more or less resilient than on-premise solutions simply because they are cloud solutions.
Where to Begin?
As with on-premise solutions, some cloud solutions and services are better than others. We simply need to be diligent with key items. These five items might be a good place to begin:
Read and negotiate contracts carefully, and do not assume that because your resilience requirements are met in one contract they will be satisfied in another contract from the same vendor. Vendor contract terms and conditions can change frequently, and some contracts even allow vendors to change them at their discretion and put the onus on customers to remain up to date (e.g., by regularly visiting the vendor's website). Also, your business needs and objectives may have changed, and your business may be subject to changing compliance regulations. In addition, these services often come with an additional charge.
Not every system requires the same level of resilience so be prepared to make informed choices regarding which level is acquired for each service. What’s important is that these decisions are made consciously and with considered intent; and that key stakeholders are involved and informed.
2) SLAs are paramount:
Ensure you understand the service provider’s detailed obligations in the event of a disruption. How quickly must they return you to service? What are their obligations to your business if they cannot meet their SLAs? What are the remedies (e.g., will they reimburse you for lost service or business)? Are they obligated to keep your service up and running? What if they cannot recover, or cannot recover quickly enough for your business? Are they obligated to return your data to you? If so, how much time do they have to get it to you, and in what format must the data be delivered?
3) Drive defensively – Think beyond the contract:
Should something bad happen, what if your cloud provider is not capable of providing the level of service and resilience specified in the contract? What if the service provider is sufficiently damaged by an outage that they can no longer carry on their business? The fact that you have the legal high ground may not keep your business from failing. My high school Driver’s Education instructor, Mr. Baron, used to say that the fact you had the right of way will be little consolation to you as you look up at the muffler of the car you walked in front of. So, consider your prospective provider’s track record and reputation. Do they, or their key personnel, have a track record of putting their customers’ interests first and of performing Herculean tasks to ensure the success of their customers? Better still, do they have a reputation for being resilient themselves?
4) Business continuity is everyone’s business:
Of course we want, and expect, our cloud providers to deliver resilient services. They need to be concerned with details that might span from failure of a device or software component to failure of a rack, or a row, or an entire facility. It doesn’t end there. It’s your company’s logo that’s on the website or service agreement your customers see; and who was responsible for a disruption likely won’t matter to them when they are deciding whether or not they should continue to do business with you. Business continuity planning remains a key requirement even with cloud solutions and it remains your responsibility to ensure that solid plans are in place. Consideration must also be given to other items such as how the failure of an on-premise system might impact cloud-based services and vice versa.
5) Disruption comes in many flavors:
While recent discussions have largely focused on what might be called a catastrophic or complete service disruption, “smaller” failures such as performance degradation can hurt just as badly. For example, what if an online shopping site’s response time degraded from sub-second to five minutes per click for a forty-eight-hour period? The effect, two days of lost sales, would likely be quite similar to that of a complete outage. In fact, it is possible that sustained unacceptable response time might be even more frustrating to customers. Such degradations are just as important to the health of a business, and they become even more critical in the context of a composite application. How does your application or service respond when one component of that composite application fails or is too slow?
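One way to make that last question concrete is to give each call to a component a time budget and a degraded fallback, so that a slow or failed component does not stall the whole composite application. The following is only a minimal sketch in Python under stated assumptions: the function names (call_with_fallback, slow_recommendations, and so on), the "recommendations" component, and the timeout values are all hypothetical and purely illustrative, and a real service would more likely rely on the timeout and retry features of its own client libraries.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool (hypothetical sizing) used to run component calls
# so the caller can stop waiting after a fixed time budget.
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_fallback(component, fallback, timeout_s):
    """Return component()'s result, or `fallback` if the call raises
    an exception or does not finish within `timeout_s` seconds."""
    future = _pool.submit(component)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # Best effort only: a call that is already running is not
        # interrupted, the caller simply stops waiting for it.
        future.cancel()
        return fallback
    except Exception:
        # The component failed outright; degrade instead of propagating.
        return fallback


# Hypothetical components standing in for one part of a composite page.
def fast_recommendations():
    return ["item-1", "item-2"]


def slow_recommendations():
    time.sleep(0.5)  # simulates degraded response time
    return ["item-1", "item-2"]


def failing_recommendations():
    raise RuntimeError("component unavailable")


if __name__ == "__main__":
    # The page still renders in every case; only the optional section differs.
    print(call_with_fallback(fast_recommendations, [], timeout_s=1.0))
    print(call_with_fallback(slow_recommendations, [], timeout_s=0.05))
    print(call_with_fallback(failing_recommendations, [], timeout_s=1.0))
```

The design choice here is that the fallback is an acceptable degraded result (an empty list of recommendations) rather than an error, so a customer sees a slightly poorer page instead of a five-minute wait. Whether that trade-off is acceptable is a business decision for each component, not a technical one.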
These are just a few of the things that come to mind; and they’re not unique to cloud solutions. The question we should be asking is not “Will things fail?” but rather “When they fail will anyone notice?” and “Can they recover that seamlessly?” In fact, cloud solutions may offer opportunities to provide resilience far beyond what can be provided on-premise (at times at lower cost) through the use of pooled capacity, globally available resources, and the ability to leverage multi-purpose capacity. There may even be opportunities to leverage cloud capacity in the event of a failure that impacts on-premise systems and services. We worked with our Business Continuity team to develop strategies for the use of engineering cloud capacity as “production” capacity in the event of a substantial failure.
So, in consideration of recent events should we all drop, or at least postpone, our plans to consume cloud services? Well, the things that must be considered in order to provide a business service that includes cloud-based components are not terribly different from those we have always needed to consider. And we continue to read about and hear of all of the great benefits and opportunities cloud computing can bring to a business. (Dan Kusnetzky’s article provides a nice summary of some of those.) And with that in mind my answer is “of course not.” Cloud-based solutions offer compelling business advantage. We simply need to be as diligent with certain aspects of our cloud-based solutions as we have been with our on-premise solutions. In fact it may often be the case that the effort required will be much lighter on the part of the consumer in the sense that they may need to be aware of what their providers are doing but will be required to do a lot less of the heavy lifting themselves. (“Trust, but verify.”)
So, when our business will benefit, we should confidently go forth and experience the benefits of cloud computing. What would you add to this list?
You can follow me on Twitter: @GeorgeDWatt