Life in the Cloud – George Watt
“#CloudViews” Cloud Outage Chat Participants Put Their Customers First
Last Thursday I participated as a panelist in Cloud Commons‘ “#CloudViews” Twitter chat (partial session archive here or page through the full archive here). The following is a brief summary of that event.
Put the Customers’ Interests First
Though the topic of this chat session was “Cloud Outages,” there was, I believe, another clear theme: It’s all about the consumer. It’s all about the customer. The participants care about the well-being of the businesses they serve. Whilst this was demonstrated in a somewhat subtle way in numerous posts, some of them were quite straightforward.
Transparency is Paramount
Closely connected to the underlying theme of respect for our customers was a very active discussion about provider transparency when service is disrupted. Participants weighed in from both customer and provider perspectives. For example, consider this excerpt from an exchange started by Jonathan Davis of DNS Europe, who offered his opinion from the service provider perspective:
Jay Fry’s comment resulted in much agreement and was widely reposted. Christoph Streit of ScaleUp agreed:
And this exchange from the customer’s perspective generated much agreement, including from the service providers in attendance, as can be seen in the responses from Jonathan and from Mimecast’s Justin Pirie.
(This topic produced much conversation, and there were too many posts to include them all. I apologize to those whom I omitted.)
So, it was incredibly encouraging to see so much agreement on the importance of best practices, customer focus, and ethical conduct.
Built-in and Built-On Resilience
Yes, there was also discussion of service outages and resilience – and a lot of it. There were many good perspectives on how providers, application architects, and consumers can deliver resilience. I believe there was also nearly unanimous agreement that components can and will fail, and that services must be architected to address that. (Please visit the chat archive for other examples.)
I have attempted to extract a representative sampling of key points made throughout the discussion and share it via the list below. Before I do, I would like to answer a question that my colleague Andi Mann asked during the session, which I missed as posts flew past. (Apologies for not catching that, Andi.) In response to one of my posts stating that resilience can be “built-in” to the cloud platform or “built-on” via the application or service, Andi asked:
When I referred to “built-in” resilience, I meant the measures that service providers have added to their services to ensure that their customers experience no loss of service when a component fails. The providers who joined the session discussed many of these, such as N+1 environments, clustering, and geographically dispersed data centers.
As we have witnessed recently, even when such precautions are taken a service can suffer an outage. This can happen for many reasons, ranging from a new type of issue surfacing for which the provider was not prepared, to cases where, through no fault of the provider (whose service remains active), the customer (composite application…) is unable to connect to the service. To address this, and to ensure that services are not disrupted even in these cases (so that nobody notices), application architects are building cloud-savvy resilience into their solutions, that is, into the application itself. This is what I referred to as “built-on,” since it sits “on top of” any resilience “built-in” by the service providers and adds another layer of protection. Netflix’s “Rambo Architecture” and its use of the “Chaos Monkey” are good examples of this.
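To make the “built-on” idea a little more concrete, here is a minimal sketch of application-level failover. It is not Netflix’s implementation or any provider’s API. It simply assumes the application can reach the same service through several redundant endpoints (modeled here as plain Python callables) and treats a ConnectionError as the failure signal. The names call_with_failover, chaos, and AllEndpointsFailed are hypothetical, and chaos is only a simplified, call-level analogue of the Chaos Monkey idea of deliberately injecting failures to prove the application survives them.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllEndpointsFailed(Exception):
    """Hypothetical error raised when every redundant endpoint has failed."""


def call_with_failover(endpoints: Sequence[Callable[[], T]],
                       attempts_per_endpoint: int = 2) -> T:
    """Try each endpoint in order, retrying a few times before failing over.

    Assumption: a ConnectionError means "this endpoint is unreachable";
    any other exception is treated as a real bug and propagates unchanged.
    """
    errors: List[Exception] = []
    for endpoint in endpoints:
        for _ in range(attempts_per_endpoint):
            try:
                return endpoint()
            except ConnectionError as exc:
                errors.append(exc)  # remember the failure, then retry or move on
    raise AllEndpointsFailed(errors)


def chaos(endpoint: Callable[[], T], failure_rate: float,
          rng: random.Random) -> Callable[[], T]:
    """Hypothetical fault injector: wraps an endpoint so that it fails
    with probability failure_rate, to test the failover path on purpose."""
    def wrapped() -> T:
        if rng.random() < failure_rate:
            raise ConnectionError("injected failure")
        return endpoint()
    return wrapped


if __name__ == "__main__":
    rng = random.Random(42)
    primary = chaos(lambda: "response from primary", failure_rate=1.0, rng=rng)
    secondary = lambda: "response from secondary"
    # The primary always fails here, so the caller transparently gets the secondary.
    print(call_with_failover([primary, secondary]))
```

The design choice worth noting is that only the expected failure type triggers failover; everything else surfaces immediately, so genuine defects are not masked by retries. Real systems would typically add timeouts, backoff, and health checks on top of this.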
The tweet chat panelists shared and discussed many great tips and lessons learned. While approaches to specific issues were different at times, generally there was broad agreement in many areas. Participants tended to agree on the following:
In addition to these items, several tips were shared such as this one:
I quite enjoyed the session and was very pleased with the level of active participation, with the great information that was shared, and with the respect the participants showed one another, even when their views differed. So I would like to offer a sincere thank you to the chat participants. If I missed something important, please do let me know.
To all who were kind enough to read this: What other words of wisdom would you offer regarding cloud outages? We would also greatly appreciate suggestions for topics for future chat sessions.