Major Azure outage has clients wanting to kill me, and has me wanting to cry and drink heavily!
All back online now, but I might be having a few of those soon.
The real shame is how expensive it can be to keep other places (other data centers or cloud companies) ready with a “hot swap” type setup.
Yes. I have a (virtual) server that is really just sitting around in another data center, doing nothing, costing money, waiting for an outage. Unfortunately today’s problem affected every data center, so the main and secondary servers were both dead!
Azure didn’t have an outage, you did: http://jamie.ideasasylum.com/2014/11/their-problems-are-your-problems/
You’ve got to own those problems, whether they’re “yours” or a third party’s.
Right, but this is why we hate the SaaS
I understand your pain, but outages happen everywhere.
If you were not using AWS/Azure/OpenStack you would still have your server in a datacenter, connected to a network with network gear that you don’t manage. Also, traffic to and from your server traverses ISPs’ routers that you don’t manage, and so on…
If I can take the suffering from managing some infrastructure away I’ll be happy, although it is really painful when it is down and you can’t do anything about it but wait.
Hope it makes sense
You are right, but sometimes problems are so hard and expensive to avoid that it is frustrating and impractical. Having failover in two data centers of the one platform (Azure) is not enough; you also need failover across totally separate platforms, one in Azure and one in AWS. This is actually not too hard for the web front end, but once you start trying to fail over databases from one to the other it gets a whole heap harder.
I think we need to put this all in perspective. Don’t get me wrong, it sucks. But you don’t ever hear anyone proclaiming that they hate electricity because the power went out.
You are right, I don’t really hate the SaaS and think going forward it is the only way most applications will be delivered. I actually love the SaaS. The only reason I brought it up is because it is always talked about on the podcast.
For 99.99% of web apps, that’s overkill. I didn’t say that you can’t have an outage, I just said that you need to “own” it.
I hate the electricity. Hahaha.
That’s a good analogy, but it doesn’t exactly hold up. Electricity is very reliable for us and not that many bad things happen in the rare cases when it goes out.
Maybe a better analogy would be if you’re in a country that has pretty frequent and brutal power outages: would it be better to build a business that relies on that electricity and has a higher profit margin or better recurring revenue, or a business that somehow doesn’t rely on that electricity (relies on generators maybe and operates at lower energy levels) but has a lower profit margin or recurring revenue?
I agree, but the analogy still serves a point. SaaS isn’t the problem, just like electricity isn’t the problem. The problems are current expectations, supporting infrastructure etc. In other words, SaaS is a great idea/concept but it needs to be refined. I believe it is still very early in the product lifecycle.
I don’t see many SaaS products mentioning an SLA or including one in their terms and conditions. They’re really common in the hosting industry, and since running a SaaS is a mix of being a product company and a host, imo you should operate as such: give your customers a clear SLA as part of their agreement, and train them to expect and accept that these things happen for the price they’re willing to pay. That way downtime’s a much more manageable problem: keep customers informed and happy, refund in accordance with your agreement if you breach your SLA, and everyone knows where they stand.
100% uptime is complicated and expensive, and realistically most customers in most markets can cope under a 99.9% SLA. Of course, we’re only talking occasional worst-case; if it’s really 45 minutes every month, then you probably need to find a new provider. If you have a customer or product where that could be a genuine problem, add an enterprise tier with a higher availability SLA, and put the extra money towards upgrading your infrastructure.
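For a rough feel of what those percentages allow, here is a quick back-of-the-envelope sketch. It assumes a 30-day month, and the function name is just something I made up for illustration:

```python
def downtime_budget_minutes(sla_percent: float, period_days: float = 30) -> float:
    """Minutes of downtime allowed per period under an availability SLA."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - sla_percent / 100)


# Compare a few common availability targets over an assumed 30-day month.
for sla in (99.0, 99.9, 99.99):
    print(f"{sla}% over 30 days: {downtime_budget_minutes(sla):.1f} minutes")
```

Under that assumption a 99.9% SLA works out to about 43 minutes a month, which is roughly where the 45-minute figure comes from, while 99.99% leaves only a few minutes.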
I understand the frustration. You want to do the best possible job for your customer and you’re made to look bad by a supplier. But the software I use crashes frequently, websites go down, my Internet connection drops. I’m used to it and I think most other people are too. Good ol’ Microsoft has trained consumers and businesses to expect crashes. ;0)
As long as your software isn’t safety critical and you haven’t corrupted or lost lots of data, I wouldn’t sweat the occasional bit of downtime.
SLAs really serve a marketing purpose rather than anything else. An effective SLA needs three things:
- clear definitions (e.g. what ‘uptime’ means: does it cover only unexpected outages or also planned maintenance? Over what time period: monthly, a rolling 4-week period, etc.);
- an external system that can monitor this SLA in the terms you’ve defined and that both parties can check;
- penalties which ensure compliance and fairly compensate customers for breaches of the agreement.
The first is relatively easy but time consuming, and it’s the sort of thing that enterprise contract lawyers live for.
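To show how much the definition matters, here is a small sketch with made-up numbers. The `Outage` class, the `availability` function and the example month are all hypothetical, and a 30-day period is assumed:

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: float
    planned: bool  # True for an announced maintenance window


def availability(outages, period_minutes, count_planned):
    """Availability in percent, optionally counting planned maintenance as downtime."""
    down = sum(o.minutes for o in outages if count_planned or not o.planned)
    return 100 * (1 - down / period_minutes)


# Hypothetical month: one 30-minute unplanned outage, one 2-hour maintenance window.
month = [Outage(30, planned=False), Outage(120, planned=True)]
period = 30 * 24 * 60  # assumed 30-day month, in minutes

print(f"unplanned only:    {availability(month, period, count_planned=False):.3f}%")
print(f"including planned: {availability(month, period, count_planned=True):.3f}%")
```

With these invented figures the same month clears a 99.9% target if only unplanned outages count, and misses it if planned maintenance counts too.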
The second is usually prohibitive for most companies (I’ve built these types of SLA monitoring systems for large telcos, but almost no one else can justify them).
Lastly, the real problem: if your e-commerce store generates $1000/day and the site is down for an hour, should I assume that you’ve lost 1/24th of $1000 (about $42), or should I give you 1/24*1/30 of your monthly $50 fee (about 7c)? Either way, the incentives are generally not aligned, and most smart providers will never accept liability based on your actual business value (“For half a day, my 100 employees couldn’t do X. That’s ~$7k in salary costs that I want back”).
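Running the numbers from that example makes the gap obvious. This is only a sketch: the function names are mine, and it assumes revenue is spread evenly over the day and that a month is 30 days:

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a flat 30-day billing month


def lost_revenue(daily_revenue: float, hours_down: float) -> float:
    """Customer's lost sales, assuming revenue is spread evenly over the day."""
    return daily_revenue * hours_down / HOURS_PER_DAY


def prorated_refund(monthly_fee: float, hours_down: float) -> float:
    """Refund of the subscription fee for exactly the hours the service was down."""
    return monthly_fee * hours_down / (HOURS_PER_DAY * DAYS_PER_MONTH)


loss = lost_revenue(1000, 1)
refund = prorated_refund(50, 1)
print(f"lost revenue:    ${loss:.2f}")
print(f"prorated refund: ${refund:.2f}")
print(f"ratio:           {loss / refund:.0f}x")
```

For the $1000/day store on a $50/month plan, one hour down is roughly $41.67 in lost sales against about 7 cents of prorated fee, a factor of about 600.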
SLAs are great for marketing, and they’re an important comforting fact for enterprise sales, but they’re basically worthless to the customer.