An even greater bottleneck can crop up in programs that can't easily be broken into uniform, parallel streams of instructions. If a processor gets more than its fair share of work, all the others may wait for it, reducing the overall performance of the machine as seen by the user. Linpack operates on the cells of matrices, and by making the matrices just the right size, users can keep every processor uniformly busy and thereby chalk up impressive performance ratings for the system overall.
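The cost of an uneven workload can be illustrated with a small sketch. It assumes a simplified model in which every processor must finish its share before the next step can begin, so the elapsed time of a step equals the heaviest load. The function name and the work figures are hypothetical and chosen only for illustration; they are not measurements from any real system.

```python
def parallel_efficiency(work_per_processor):
    """Return (wall_time, efficiency) for one synchronized parallel step.

    Assumed model: all processors wait for the slowest one, so the
    wall time of the step is the largest individual workload.
    """
    wall_time = max(work_per_processor)
    total_work = sum(work_per_processor)
    # Efficiency: useful work done divided by processor-time consumed.
    efficiency = total_work / (wall_time * len(work_per_processor))
    return wall_time, efficiency


# Hypothetical workloads with the same total amount of work (400 units).
balanced = [100, 100, 100, 100]   # Linpack-style, evenly sized pieces
skewed = [70, 70, 70, 190]        # one processor gets more than its share

balanced_time, balanced_eff = parallel_efficiency(balanced)
skewed_time, skewed_eff = parallel_efficiency(skewed)

print(f"balanced: time={balanced_time}, efficiency={balanced_eff:.2f}")
print(f"skewed:   time={skewed_time}, efficiency={skewed_eff:.2f}")
```

Under these assumptions, the total work is identical in both cases, but the skewed run takes nearly twice as long because three processors sit idle while the fourth finishes.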
"As long as we continue to focus on peak floating-point performance, we are missing the actual hard problem that is holding up a lot of science," Loft says.
But the "hard problem" is getting the attention of computer and chip makers. IBM, which makes the Blue Gene family of supercomputers, has taken a systems approach.
Rather than cobbling together commodity processors with commodity interconnects like Ethernet or InfiniBand -- an approach that others have used -- IBM built five proprietary networks inside Blue Gene, each optimized for a specific kind of work and selectable by the programmer. Members of the Blue Gene family held the No. 1 and No. 2 positions on the Top500 list until June of this year.
By making memory access faster and smarter, system designers can reduce the absolute amount of memory a system needs, says Dave Turek, vice president of Deep Computing at IBM. As engineers work to build "exascale" computers (a thousand times faster than Roadrunner), that will be essential, he says.
"Going back a few years, you'd build a computer with the fastest processors possible and the most memory possible, and life was good," Turek says. "The question is, how much memory do you need to put on an exascale system? If you want to preserve the kinds of programming models you've had to this point, you'd better have a few hundred million dollars in your pocket to pay for that memory."
And it isn't just the purchase cost of memory that's a problem, Turek notes. Memory draws a lot of expensive power and generates a lot of heat that must be removed by expensive cooling systems.
Faster memory subsystems and faster interconnects will help, Turek says, but supercomputer users will also have to overhaul the programming methods that have evolved over the past 20 years if they hope to utilize the power of exascale computers.
He says users initially criticized Blue Gene for having too little memory, but eventually they were able to scale their applications to run well on 60,000 processors by rewriting the algorithms in their application code to use memory more sparingly.