The Skype mystery: Why blame the August Windows updates?
Mon, 2007-08-20 22:54

Skype has finally come out with an explanation for last week's 36-hour network outage: The August Windows update caused a whole bunch of Skype clients to reboot, exposing a bug in the company's peer-to-peer network.

What I don't get, though, is why this didn't happen in July. Microsoft puts these updates out every month, so why'd the crash happen now?

Like me, Internet Storm Center handler John Bambenek doesn't think Skype is doing a very good job of explaining what happened, so I asked John what questions to put to Skype. His questions and Skype's answers are below.

Warning: if you're hoping for a straight answer on any of this, you're going to be disappointed. These answers come from Jennifer Caukin, a Skype spokeswoman. To her credit, she warned me first that there's nobody in the US who can answer questions in any detail today. Maybe by tomorrow we'll get some real answers.

Q -- Why did it take a full 24 hours after patching and rebooting for the
outage to occur?
A:   The disruption was triggered by a massive restart of our users'
computers across the globe within a very short timeframe as they
re-booted after receiving a routine set of patches via Windows Update.
The high number of restarts affected Skype's network resources. This
caused a flood of log-in requests, which, combined with the lack of
peer-to-peer network resources, prompted a chain reaction that had a
critical impact.  The 36 hours required to get the network back up was
due to the time needed to get the proper number of available
peer-to-peer network resources up and running.

OK, I don't think she quite got this question. Maybe Skype can explain why the outage didn't start on Tuesday or Wednesday, when Microsoft's patches were released.


Q -- With the reboots distributed across many timezones, how did they end up
buckling your capacity?
Why didn't it happen last month too (and months prior)?
A:   Normally Skype's peer-to-peer network has an inbuilt ability to self-heal, however, the day's traffic patterns combined with the large number of reboots revealed a previously unseen fault in the network resource allocation algorithm Skype uses.  Consequently, the peer-to-peer network's self-healing function didn't work quickly enough. Regrettably, and as a result of this disruption, Skype was unavailable to the majority of its users for approximately two days.

Q -- How do you know it wasn't a DoS?
A: The issue has now been identified explicitly within Skype. We can confirm categorically that no malicious activities were attributed or that our users' security was not, at any point, at risk.

Q -- Has Microsoft been contacted and what is their take on the situation?
A:  Yes they have been contacted. 

Microsoft told me that they didn't do anything different with their updates in August (they've blogged about the issue here). So why did this release kick off the problem? Nobody is saying.

Q -- What are the details of the bug that they fixed?  Was it a result of
something added recently?
A: The "abnormality" occurred in Skype software.  To clarify: Skype's peer-to-peer core was not properly tuned to cope with the load and core size changes that occurred on 16th August. The reboots resulting from software patching merely served as a catalyst. This combination of factors created a situation where the self-healing needed outside intervention by our engineers.

Q -- What are your plans to avoid similar capacity problems?
A:  This disruption was unprecedented in terms of its impact and scope. We would like to point out that very few technologies or communications networks today are guaranteed to operate without interruptions. We are very proud that over the four years of its operation, Skype has provided a technically resilient communications tool to millions of people worldwide. Skype has now identified and already introduced a number of improvements to its software to ensure that our users will not be similarly affected in the unlikely possibility of this combination of events recurring.

More comment on the thinness of Skype's explanation can be found here and here.

--Robert McMillan
