When the Skype network suffered a massive outage on August 16th, it got me thinking about chokepoints and lock-in, which you should also consider when you're rolling out VoIP for your shop.
VoIP is peer-to-peer in principle, just like any IP traffic, so there shouldn't be chokepoints or lock-in. Of course in the real world it's always less than ideal, because vendors want to lock us in by any means except, it seems, better service and prices, and someone somewhere always wants to raise fences and dig moats. I believe that this is a counter-productive strategy, and that openness benefits everyone, including those who must keep an eye on their bottom lines. Open standards and open networks make the pie bigger, which is better than fighting over a tiny pie. (Sorry for the horrible Dilbert-quality metaphors, but I couldn't think of anything else.)
What happened with Skype, and why did it take so long to fix? A lot of people outside the United States might think this is a silly question, because many countries have never had the benefit of a first-rate, reliable telephone network, so a downtime of 12-48 hours after four years of nearly stellar reliability doesn't seem all that terrible. But this particular incident highlights a number of bad things that, in my nearly-humble opinion, you should not allow on your network.
Skype says the cause of the outage was this:
"The disruption was triggered by a massive restart of our users' computers across the globe within a very short timeframe as they re-booted after receiving a routine set of patches through Windows Update.
"The high number of restarts affected Skype's network resources. This caused a flood of log-in requests, which, combined with the lack of peer-to-peer network resources, prompted a chain reaction that had a critical impact."
So in effect, a chain of circumstances created a Distributed Denial of Service attack (DDoS). There are a number of problems that got us to this point.
First of all is the way Windows patches are applied. Patch Tuesday is silly. Updates, especially security patches, should be released as soon as they are ready. Then requiring a reboot to apply the patches is just plain 19th-century. Real operating systems don't require reboots for every little thing, but only for major events like kernel and hardware upgrades. But as fun as it is to blame Windows, it's really not fair to give it the sole blame in this case, because the next question is: why were all those Windows machines set to automatically log in to Skype at bootup without any user intervention? Security 101 says don't allow unattended logins, and anyway what's the point of logging in to remote services like Skype when the user isn't even present?
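A flood of simultaneous logins like this is often called a thundering herd, and one common way for client software to soften it is to retry with randomized exponential backoff, so that a million machines rebooting at once don't all knock on the login servers in the same second. Here is a minimal sketch of that idea in Python. It is not how Skype's client works (I have no idea what it does internally); the function name and the base and cap values are made up for illustration.

```python
import random

def reconnect_delays(attempts, base=1.0, cap=300.0, rng=None):
    """Return wait times in seconds for successive login retries.

    Uses "full jitter" exponential backoff: the ceiling doubles on each
    attempt (up to `cap`), and the actual wait is a random value between
    zero and that ceiling. The defaults are illustrative assumptions.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays

# Example: the waits one client might use for its first five retries.
for i, delay in enumerate(reconnect_delays(5), start=1):
    print(f"retry {i}: wait {delay:.1f}s")
```

Because every client draws its own random waits, the retries spread out over time instead of arriving in synchronized waves.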
So now we come to chokepoints, which for Skype are their login servers. Skype has always handled this particular issue very capably, so don't think that a single outage means their whole network is poo, because it isn't.
I don't know of any way to get around this: If a service provider requires you to log in then you're going to have a limited number of gateways into their network. So then you need to ask yourself, what's my backup plan when this service goes down? For Skype users it was simply "do without." Pick up their plain old-fashioned telephones and carry on as best they could. Which I suppose isn't so terrible, but if your business relies on your Skype account that's going to hurt.
Fonality handles this elegantly: both PBXtra and trixbox Pro use the hybrid hosting model that offloads the networking and systems management guff to Fonality's data center. If something happens and Fonality goes offline, your phone service automatically falls back to the PSTN. So you'll lose VoIP, but you won't lose all of your telephony and you won't have to make the switchover yourself.
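The general idea behind automatic fallback can be sketched in a few lines of Python: keep an ordered list of outbound routes and use the first one that passes a health check. This is only an illustration of the concept, not Fonality's implementation; the route names and the health-check function are hypothetical.

```python
def pick_route(routes, is_up):
    """Return the first route whose health check passes, or None.

    `routes` is ordered by preference; `is_up` is any callable that
    takes a route name and returns True if that route is usable.
    """
    for name in routes:
        if is_up(name):
            return name
    return None

# Hypothetical setup: prefer the VoIP trunk, fall back to a PSTN line.
routes = ["voip_trunk", "pstn_line"]
status = {"voip_trunk": False, "pstn_line": True}  # pretend VoIP is down

chosen = pick_route(routes, lambda name: status.get(name, False))
print(chosen)  # prints "pstn_line"
```

In a real PBX the health check would be something like a trunk registration status or a periodic probe, and the preference order would live in the dial plan rather than in a Python list.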
Redundancy and failover
A perennial problem for the computer network administrator is planning for failures. How many belts and suspenders do you need? How many routers, how many servers, how many different Internet service providers? With telephony it's getting absurd: the legacy PSTN, cell phones, VoIP, and text messaging. Then there's e-mail and instant messaging, and I challenge you to count all the separate, incompatible IM networks a person can collect. We need giant-size business cards just to hold all of our contact information, and personal hand trucks to hold all of our gear.
Which is all a bit of exaggeration, but not that much. These are all the things the wise network administrator takes into consideration and tries to plan for.
My own approach is to invest in smaller quantities of quality hardware, rather than throwing masses of leftover and inferior hardware into the mix. Even the most die-hard do-it-yourselfer (like me) has to depend on outside service providers, and since things always break, it's smarter to plan for it. Which makes shopping for good service providers very important, because a low-cost provider can cost you more over the long run.
While we're talking about costs, don't forget to do your PSTN-VoIP comparisons. People get all excited about "free or dirt cheap long distance!" and don't really compare the numbers. The popularity of VoIP has driven down the costs of most traditional phone network services, so don't be shy about shopping around and getting some competitive bidding going.
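Comparing the numbers doesn't have to be complicated. Here is a rough sketch that treats each plan as a fixed monthly fee plus a per-minute rate and finds the usage level where two plans cost the same. The fees and rates below are invented purely for illustration (in cents, to keep the arithmetic exact), so plug in your own quotes.

```python
def monthly_cost(fixed_fee, per_minute, minutes):
    """Total monthly cost for a plan with a flat fee plus a per-minute rate."""
    return fixed_fee + per_minute * minutes

def break_even_minutes(fee_a, rate_a, fee_b, rate_b):
    """Minutes per month at which plans A and B cost the same.

    Returns None when the per-minute rates are equal (the plans never
    cross). A negative result means one plan is cheaper at every
    usage level.
    """
    if rate_a == rate_b:
        return None
    return (fee_b - fee_a) / (rate_a - rate_b)

# Hypothetical quotes, in cents: a PSTN plan and a VoIP plan.
pstn_fee, pstn_rate = 3000, 5   # $30/month plus 5 cents/minute
voip_fee, voip_rate = 4500, 2   # $45/month plus 2 cents/minute

minutes = break_even_minutes(pstn_fee, pstn_rate, voip_fee, voip_rate)
print(minutes)                                       # 500.0
print(monthly_cost(pstn_fee, pstn_rate, 500))        # 5500
print(monthly_cost(voip_fee, voip_rate, 500))        # 5500
```

With these made-up numbers the VoIP plan only wins if you use more than 500 long-distance minutes a month. Real quotes will also have setup fees, equipment, and bandwidth costs that this simple model leaves out.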
The VoIP servers I've been covering here (Asterisk, Trixbox, and SipX) allow you a tremendous amount of flexibility in mixing and matching services and networks, so you can knit together your best deals.
Next week we'll return to our torture-testing of trixbox Pro, and see how it measures up in the real world.