Major Azure outage causing clients to kill me and making me want to cry and drink heavily!
All back online now, but I might be having a few of those soon.
The real shame is how expensive it can be to keep other places (other data centers or cloud companies) ready with a "hot swap" type setup.
Yes. I have a (virtual) server that is really just sitting around in another data center, doing nothing, costing money, waiting for an outage. Unfortunately today's problem affected every data center, so the main and secondary servers were both dead!
Azure didn't have an outage, you did: http://jamie.ideasasylum.com/2014/11/their-problems-are-your-problems/
You've got to own those problems, whether they're "yours" or a third party's.
Right, but this is why we hate the SaaS
I understand your pain, but outages happen everywhere.
If you were not using AWS/Azure/OpenStack you would still have your server in a datacenter, connected to a network with network gear that you don't manage. Also, traffic to and from your server traverses ISPs' routers that you don't manage, and so on...
If I can take the suffering of managing some infrastructure away, I'll be happy, although it is really painful when it is down and you can't do anything about it but wait.
Hope it makes sense
You are right, but sometimes problems are so hard and expensive to avoid that it is frustrating and impractical. Having failover in two data centers of the one platform (Azure) is not enough; you also need failover across totally separate platforms, one in Azure and one in AWS. This is actually not too hard for the web front end, but once you start trying to fail over databases from one to the other it gets a whole heap harder.
I think we need to put this all in perspective. Don't get me wrong, it sucks. But you don't ever hear anyone proclaiming that they hate electricity because the power went out.
You are right, I don't really hate the SaaS and think going forward it is the only way most applications will be delivered. I actually love the SaaS. The only reason I brought it up is because it is always talked about on the podcast.
For 99.99% of web apps, that's overkill. I didn't say that you can't have an outage, I just said that you need to "own" it.
I hate the electricity. Hahaha.
That's a good analogy, but it doesn't exactly hold up. Electricity is very reliable for us and not that many bad things happen in the rare cases when it goes out.
Maybe a better analogy would be if you're in a country that has pretty frequent and brutal power outages: would it be better to build a business that relies on that electricity and has a higher profit margin or better recurring revenue, or a business that somehow doesn't rely on that electricity (relies on generators maybe and operates at lower energy levels) but has a lower profit margin or recurring revenue?
I agree, but the analogy still serves a point. SaaS isn't the problem, just like electricity isn't the problem. The problems are current expectations, supporting infrastructure etc. In other words, SaaS is a great idea/concept but it needs to be refined. I believe it is still very early in the product lifecycle.
I don't see many SaaS products mentioning an SLA or including one in their terms and conditions. They're really common in the hosting industry, and since running a SaaS is a mix of being a product company and a host, imo you should operate as such: give your customers a clear SLA as part of their agreement, and train them to expect and accept that these things happen for the price they're willing to pay. That way downtime's a much more manageable problem: keep customers informed and happy, refund in accordance with your agreement if you breach your SLA, and everyone knows where they stand.
100% uptime is complicated and expensive, and realistically most customers in most markets can cope under a 99.9% SLA. Of course, we're only talking occasional worst-case; if it's really 45 minutes every month, then you probably need to find a new provider. If you have a customer or product where that could be a genuine problem, add an enterprise tier with a higher availability SLA, and put the extra money towards upgrading your infrastructure.
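For anyone wondering where the "45 minutes" figure comes from, here is a rough sketch of the arithmetic. The function name `downtime_budget_minutes` is made up for illustration, and it assumes a flat 30-day month with every minute counted (no carve-out for planned maintenance), which a real SLA may define differently.

```python
def downtime_budget_minutes(sla_percent: float, days: float = 30.0) -> float:
    """Minutes of downtime permitted over `days` days at the given SLA percentage.

    Assumption: the whole period counts towards the SLA, including any
    planned maintenance windows.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.99):
        budget = downtime_budget_minutes(sla)
        print(f"{sla}% SLA -> {budget:.1f} minutes per 30 days")
```

At 99.9% that works out to roughly 43 minutes per 30-day month, so the 45-minute figure is the whole monthly allowance, not a typical month.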
I understand the frustration. You want to do the best possible job for your customer and you've been made to look bad by a supplier. But the software I use crashes frequently, websites go down, my Internet connection drops. I'm used to it and I think most other people are too. Good ol' Microsoft has trained consumers and businesses to expect crashes. ;0)
As long as your software isn't safety critical and you haven't corrupted or lost lots of data, I wouldn't sweat the occasional bit of downtime.
SLAs really serve a marketing purpose rather than anything else. An effective SLA needs three things:
- clear definitions (e.g. what "uptime" means: does it cover only unexpected outages or also planned maintenance, and over what time period? monthly? a rolling 4-week period? etc);
- an external system that can measure the SLA in the terms you've defined and that both parties can monitor;
- penalties which ensure compliance and fairly compensate customers for breaches of the agreement.
The first is relatively easy but time consuming, and it's the sort of thing that enterprise contract lawyers live for.
The second is usually prohibitively expensive for most companies (I've built these types of SLA monitoring systems for large telcos but almost no one else can justify them).
Lastly, the real problem: if your e-commerce store generates $1000/day and the site is down for an hour, should I assume that you've lost 1/24th of $1000 (about $42), or should I give you 1/24*1/30 of your monthly $50 fee (about 7c)? Either way, the incentives are generally not aligned and most smart providers will never accept liability based on your actual business value ("For 1/2 day, my 100 employees couldn't do X. That's ~$7k in salary costs that I want back").
SLAs are great for marketing, and they're an important comforting fact for enterprise sales, but they're basically worthless to the customer.
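To put numbers on the e-commerce example, here is a small sketch comparing the two ways of valuing an hour of downtime. Both function names are hypothetical, and it assumes revenue is spread evenly across the day and the fee evenly across a 30-day month, which real stores and contracts won't match exactly.

```python
def lost_revenue(daily_revenue: float, hours_down: float) -> float:
    """Customer's view: revenue lost, assuming sales are spread evenly over 24 hours."""
    return daily_revenue * hours_down / 24


def prorated_fee_credit(monthly_fee: float, hours_down: float,
                        days_in_month: int = 30) -> float:
    """Provider's view: the slice of the monthly fee covering the downtime."""
    return monthly_fee * hours_down / (24 * days_in_month)


if __name__ == "__main__":
    loss = lost_revenue(daily_revenue=1000, hours_down=1)
    credit = prorated_fee_credit(monthly_fee=50, hours_down=1)
    print(f"Lost revenue:        ${loss:.2f}")
    print(f"Pro-rated fee credit: ${credit:.2f}")
    print(f"Ratio: roughly {loss / credit:.0f}x")
```

The customer's loss comes out several hundred times larger than the pro-rated credit, which is the misalignment in a nutshell.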