RIM explains BlackBerry crash; questions remain

The BlackBerry network failure in the US last week was caused by a small bit of new code and a still unexplained problem in the network's failover process.

Research in Motion, which runs the BlackBerry service through its Canadian network operations center, says it has ruled out security and capacity issues, and hardware or software infrastructure failures, as the cause of the outage, which blocked e-mail service to subscribers in the Americas.

RIM sent an e-mail describing the cause of the outage late Thursday night, Eastern Time. It was the company's first detailed communication since the BlackBerry service was disrupted Tuesday evening. The e-mail is a model of managed communication: it acknowledges a "problem" only once, and it uses the word "failure" only once, and then only to rule out a potential cause.

Just hours after the e-mail was sent, RIM and T-Mobile announced the debut of the "performance-driven" BlackBerry 8800 on the carrier's cell net. The BlackBerry 8800 allows users to "stay connected and productive while on the go," according to the joint press release.

But when RIM's own net failed, users of the 8800, and of most other BlackBerry handsets, were left with little more than a rather expensive alarm clock or a thumb-powered console for playing BrickBreaker.

BlackBerry subscribers noticed the disruption when the usual stream of mobile e-mails dried up Tuesday evening. IT managers scrambled to figure out if the problem was related to RIM's enterprise server software, their wireless carrier, or the RIM operations center.

According to RIM, its IT staff Tuesday deployed a new system routine that was intended to better optimize the system's caching. The new code was not a critical routine and "was expected to be non-impacting with respect to the real-time operation of the BlackBerry infrastructure."

RIM has concluded that the pre-testing of the new code was "insufficient."

The new routine didn't behave according to plan, apparently: It "triggered a compounding series of interaction errors between the system's operational database and the cache."

Troubleshooters at the NOC identified the problem and tried to correct it. When those measures failed (RIM has not said how long any of this took), the NOC staff began a well-rehearsed failover process to a backup system.

And that process unexpectedly ran into problems. The RIM e-mail says the backup procedure has been "repeatedly and successfully tested previously." But, for reasons RIM has not yet explained, this time the process "did not fully perform to RIM's expectations." This second problem caused further delays in restoring service and processing the backlog of messages.

"RIM apologizes to customers for inconvenience resulting from the service interruption," according to the e-mail. The company is continuing its analysis of what happened and of what changes to make to minimize the chances of it happening again. The e-mail says that RIM has identified "certain aspects of its testing, monitoring, and recovery processes that will be enhanced" as a result of the failure.

The apology may not be enough for some enterprise users. A surprising number say they've never been contacted by RIM at all during the outage.

"RIM did NOT contact us... Before, during or after the outage," says Rich De Brino, CIO and vice president for Advances in Technology, the technology arm of Compass HealthCare, in Everett, Wash. "We learned [about] it from Slashdot first and then other internet sites, but never RIM."

As of this writing [9 a.m. ET], there is still no word of the outage posted at www.rim.com or www.blackberry.com.

The health provider has about 100 BlackBerry users, with Exchange as the corporate e-mail server. De Brino sent his comments via his BlackBerry.

"We've come to the conclusion that we should seriously re-evaluate using a service (like BlackBerry) that we don't control and once again consider using something like Microsoft Exchange Server and Windows Mobile instead," De Brino says. He acknowledges that such an alternative would only be as reliable as their own Exchange server and wireless connection. "But ours is pretty reliable: no down time last 24 months. So I like it," he says.

John Cox

Network World