Sun researchers: Computers do bad math

On Feb. 25, 1991, during the first Gulf War, a Scud missile hit a U.S. Army barracks in Dhahran, Saudi Arabia, killing 28 U.S. soldiers. The barracks was defended by a Patriot missile defense system, which failed to track and intercept the incoming Scud. A year later, a U.S. General Accounting Office (GAO) investigation into the Patriot's failure concluded that the battery's weapon control system suffered from a fatal flaw: It was bad at math.

On Tuesday, researchers at Sun Microsystems discussed work they are doing, as part of a three-year, US$50 million Defense Advanced Research Projects Agency (DARPA) grant, that aims to avoid the kind of errors that caused the Patriot failure.

Mathematical errors are far more common in the computer industry than most people realize, said Greg Papadopoulos, Sun's executive vice president and chief technology officer. While his company is normally the first to accuse Microsoft of shoddy operating system design, bad math, and not Windows, is sometimes behind those unexplained PC crashes, he admitted.

"There are a lot of errors that happen in machines that go undetected," Papadopoulos said. "Sometimes a machine just goes away and freezes. You always blame it on Microsoft. We do, too. It's convenient. It's convenient for Intel, too."

"It's a dirty secret. Floating-point arithmetic is wrong," said John Gustafson, a principal investigator with Sun. "It only takes two operations to see that computers make mistakes with fractions."

The problem that Gustafson and Papadopoulos referred to stems from the fact that the binary mathematics employed by computers has a hard time accurately representing certain numbers. Fractions, for example, are particularly tough, because they often involve non-terminating numbers that are impossible to accurately express in binary format.

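Dividing two by three on a calculator illustrates the problem. The fraction 2/3 has no finite decimal expansion, so a calculator must cut it off after a fixed number of digits and round the result, typically making the last digit a seven. Computers face the same problem in binary.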
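The rounding described above is easy to observe in most programming languages. The following is a minimal Python sketch, not anything from Sun's work; the helper name `representation_error` is made up for illustration. It compares the stored binary value of a fraction with the exact fraction, and then shows a two-operation case in which the rounding becomes visible.

```python
from fractions import Fraction


def representation_error(numerator, denominator):
    """Return the exact difference between the stored float and the true fraction.

    Hypothetical helper for illustration. Fraction(float) converts the
    stored binary value exactly, so the subtraction itself adds no error.
    """
    stored = Fraction(numerator / denominator)
    exact = Fraction(numerator, denominator)
    return stored - exact


# 2/3 cannot be stored exactly in binary, so the error is non-zero.
print(representation_error(2, 3) != 0)

# 1/2 is a power of two and is stored exactly.
print(representation_error(1, 2) == 0)

# Two operations (an addition and a comparison) are enough to see the effect:
# 0.1, 0.2 and 0.3 are each rounded on input, and the sum is rounded again.
print(0.1 + 0.2 == 0.3)
```

Only fractions whose denominator is a power of two can be stored exactly in binary floating point, which is why 1/2 survives and 2/3 does not.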

In the case of the Gulf War incident, the Patriot battery's computer rounded a similar, non-terminating number in order to calculate time. But by shaving off a few digits during every calculation, the battery also shaved off a bit of time. After one hour, the Patriot's clock was off by 0.0034 seconds. On Feb. 25, the computer had been in operation for 100 hours straight, and its clock was off by more than one-third of a second, enough to cause it to miss the incoming Scud.
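The drift figures above can be reproduced with a short calculation. The sketch below rests on assumptions that are not stated in this article but appear in widely cited analyses of the incident: that the system counted time in tenths of a second, and that the value 1/10 was chopped to 23 binary digits after the binary point. The constant and function names are illustrative only.

```python
from fractions import Fraction

# Assumptions (not from the article): time is counted in ticks of 1/10 second,
# and 1/10 is truncated to 23 bits after the binary point.
FRACTION_BITS = 23
TICKS_PER_HOUR = 10 * 3600


def chopped_tenth(bits=FRACTION_BITS):
    """Return 1/10 truncated (not rounded) to the given number of binary digits."""
    scale = 2 ** bits
    return Fraction(scale // 10, scale)


def clock_drift_seconds(hours, bits=FRACTION_BITS):
    """Return how far the clock falls behind after running for `hours` hours."""
    error_per_tick = Fraction(1, 10) - chopped_tenth(bits)
    return float(error_per_tick * TICKS_PER_HOUR * hours)


print(round(clock_drift_seconds(1), 4))    # about 0.0034 seconds after one hour
print(round(clock_drift_seconds(100), 4))  # about 0.3433 seconds after 100 hours
```

Under these assumptions each tick loses a little less than one ten-millionth of a second, which is harmless on its own but adds up over 3.6 million ticks.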

Programmers who write software that requires these types of calculations are "very much aware of these problems," and use a variety of techniques to work around the inaccuracy, said Nathan Brookwood, a principal analyst with the firm Insight64.

But with supercomputers that calculate billions of sums per second, some of these workarounds can slow down performance, and the risk that some unanticipated mathematical error may occur remains a niggling doubt.

Sun researchers are looking to solve both problems using a technique called interval arithmetic, which essentially traps a mathematically incorrect number between two other numbers that are known to be correct, and prevents mathematical inaccuracy from ballooning out of control over time. "If you can prove mathematically that the right answer is between this answer and that answer, you can restore mathematical rigor to computing," Gustafson said.
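The idea of trapping an answer between two bounds can be sketched in a few lines of Python. This is a toy illustration, not Sun's compiler technology: the `Interval` class and its methods are hypothetical, only addition is implemented, and it assumes Python 3.9 or later for `math.nextafter`. Each bound is nudged outward by one floating-point step after every operation, so rounding can widen the interval but never push the true answer outside it.

```python
import math
from fractions import Fraction


class Interval:
    """A closed interval [lo, hi] of floats that encloses an exact value."""

    def __init__(self, lo, hi):
        self.lo = lo
        self.hi = hi

    @classmethod
    def from_fraction(cls, exact):
        # float(exact) rounds to the nearest float, so the true value lies
        # within one floating-point step of it on either side.
        nearest = float(exact)
        return cls(math.nextafter(nearest, -math.inf),
                   math.nextafter(nearest, math.inf))

    def __add__(self, other):
        # Round the lower bound down and the upper bound up ("outward").
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def contains(self, exact):
        # Fraction(float) converts exactly, so this check is rigorous.
        return Fraction(self.lo) <= exact <= Fraction(self.hi)

    def width(self):
        return self.hi - self.lo


# Add 1/10 ten times. The exact answer is 1.
tenth = Interval.from_fraction(Fraction(1, 10))
total = Interval(0.0, 0.0)
for _ in range(10):
    total = total + tenth

print(total.lo, total.hi)
print(total.contains(Fraction(1)))  # True: the exact answer is enclosed
```

The price is that every operation computes two bounds instead of one value, which is the performance penalty the compiler work aims to reduce.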

A number of compilers already support interval arithmetic, but Sun's work aims to speed up the performance of interval calculations, according to Gustafson. "We've done a lot of clever things in the compiler so the penalty for using intervals is as low as it can be," he said.

Sun's interval arithmetic work will be used in a prototype of a new supercomputer the company plans to build over the next two-and-a-half years as part of its DARPA grant.

In August, Sun created a new High Productivity Computing Solutions group to build the prototype. The group, run by the former head of the SunLabs research and development group, Jim Mitchell, has a staff of about 100 people, including four of Sun's most distinguished researchers, called Sun Fellows.

Sun's supercomputer, code-named Hero because "it's a machine that scales to heroic proportions," according to Gustafson, would be approximately 50 times faster than the fastest supercomputer today, the Earth Simulator in Yokohama, Japan.

The Earth Simulator can perform over 40 trillion mathematical operations per second. If Sun is able to secure a further round of DARPA funding and take Hero beyond the prototype stage, it will be capable of 50 times as many operations per second, Sun predicts.

Sun, along with IBM and Cray, was awarded a DARPA grant in July of this year to build a prototype of the next generation of supercomputers. Once the prototypes are built, the government agency plans to award further grants so that as many as two working supercomputers can be built.

Mitchell's team is working on an assortment of new technologies, including the interval arithmetic work and a new software development language for supercomputers. But the success or failure of Sun's bid to convince DARPA to go beyond prototype may ultimately depend on a new technique for inter-chip communication called "proximity interconnect," which was first announced by researchers in Mitchell's group at the Institute of Electrical and Electronics Engineers Inc.'s Custom Integrated Circuits Conference last September.

Proximity interconnect uses an electrical phenomenon called capacitance to transfer data between chips without using the pins and wires normally found on a computer's circuit board. If Sun is able to develop a viable manufacturing process based on the technique, it could have "a profound impact on the way people design computers," said Insight64's Brookwood.

It could make computers cheaper to build. Between one third and one half of the transistors in today's microprocessors are used for memory, Brookwood said. Engineers would prefer to place that memory in an inexpensive SRAM (static RAM) chip next to the microprocessor, but the amount of time it takes to transfer data from one chip to the other is prohibitive. "What Sun is saying with the proximity technology is there's no time penalty for going from one chip to the next," he said. "That's mind boggling if they can make that work in a general way."

Robert McMillan

IDG News Service