Dongarra says a computer at the top of the Top500 list will typically spend six years on the list before falling off the bottom, and he doesn't expect Roadrunner's hybrid Opteron/Power/Cell architecture to stay on top for long.
"The trend is to large numbers of [processor] cores on a single die," he says. "And it looks like we'll have this one chip with different kinds of cores on it. We might have cores that specialize in floating point, ones that specialize in graphics and those that are more commodity-based." Exploiting that flexibility so the chip is, in essence, tuned for a specific application domain, such as climate modeling, will require software tools that do not yet exist, he says.
Intel is doing as Dongarra suggests -- developing specialized microprocessor cores and the software tools to exploit them. It's also responding to Loft's plea for faster memory access.
Bandwidth aside, memory will have to be more power-efficient if exascale computers are to draw reasonable amounts of power, says Steve Pawlowski, an Intel senior fellow. He says both objectives can be met in part by building bigger on-chip cache memories that act as very fast buffers between processor cores and dynamic RAM.
"If you can cache a significant number of DRAM pages, the machine thinks it's talking to flat DRAM at high speeds, and you can populate behind it much slower and more power-efficient DRAMs," he says. "You want the cache big enough to hide the [memory] latency, and you want to be clever in how you manage the pages by doing page prefetching and things like that."
He says Intel is also working on increasing the communication bandwidth of the individual pins that connect the processor chip to the memory controller. "I'd like to push the memory bandwidth to be 10 times greater than it is today by 2013 or 2014," Pawlowski says. "The engineers working for me say I'm crazy, but it's a goal."
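Pawlowski's idea of a large cache sitting in front of slower, more power-efficient DRAM, with page prefetching to hide latency, can be illustrated with a toy model. The sketch below is not Intel's design. The latency figures, the LRU eviction policy, the simple next-page prefetcher and all class and function names are assumptions chosen only for illustration.

```python
from collections import OrderedDict

# Illustrative latencies in nanoseconds (assumed, not taken from the article).
CACHE_HIT_NS = 5
SLOW_DRAM_NS = 100


class PageCache:
    """Hypothetical LRU cache of DRAM pages with optional next-page prefetch."""

    def __init__(self, capacity_pages, prefetch=True):
        self.capacity = capacity_pages
        self.prefetch = prefetch
        self.pages = OrderedDict()  # page number -> None, least recent first
        self.hits = 0
        self.misses = 0

    def _install(self, page):
        # Insert or refresh a page, evicting the least recently used if full.
        self.pages[page] = None
        self.pages.move_to_end(page)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)

    def access(self, page):
        """Return the latency (ns) the processor sees for this page access."""
        if page in self.pages:
            self.hits += 1
            latency = CACHE_HIT_NS
        else:
            self.misses += 1
            latency = SLOW_DRAM_NS
        self._install(page)
        if self.prefetch:
            # Assumption: the next page is fetched in the background,
            # so its cost is hidden from the processor.
            self._install(page + 1)
        return latency


def average_latency(cache, trace):
    """Mean latency in ns over a sequence of page numbers."""
    return sum(cache.access(p) for p in trace) / len(trace)


if __name__ == "__main__":
    scan = list(range(100))  # a simple sequential scan over 100 pages
    print(average_latency(PageCache(8, prefetch=True), scan))
    print(average_latency(PageCache(8, prefetch=False), scan))
```

Under these assumed numbers, a sequential scan of 100 pages averages 5.95 ns per access with prefetching, because only the first page misses, and 100 ns without it. Real access patterns are far less regular, which is why Pawlowski stresses being "clever" about page management.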
In the meantime, Intel and others are working on two other possibilities -- very high-speed optical communication via silicon photonics, and "3-D die-stacking," which creates a dense sandwich of CPU and DRAM. Both technologies have been proved in labs but have not yet been shown to be economically viable for manufacturers, Pawlowski says.
Petaflops, peak performance, benchmark results, positions on a list -- "it's a little shell game that everybody plays," says NCAR's Loft. "But all we care about is the number of years of climate we can simulate in one day of wall-clock computer time. That tells you what kinds of experiments you can do." State-of-the-art systems today can simulate about five years per day of computer time, he says, but some climatologists yearn to simulate 100 years in a day.
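The gap between those two figures is easy to work out. The short sketch below uses only the numbers Loft cites. The function names are hypothetical, and it assumes throughput stays constant over the length of a run.

```python
def wall_clock_days(simulated_years, years_per_day):
    """Days of computer time needed to simulate a given span of climate."""
    return simulated_years / years_per_day


def required_speedup(current_years_per_day, target_years_per_day):
    """Factor by which simulation throughput would have to improve."""
    return target_years_per_day / current_years_per_day


if __name__ == "__main__":
    print(wall_clock_days(100, 5))   # a century-long run at today's rate
    print(required_speedup(5, 100))  # the gap to 100 years per day
```

At five simulated years per day, a 100-year experiment takes 20 days of wall-clock time, and reaching 100 years per day would take a 20-fold increase in throughput.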
"The idea," Loft says, "is to get an answer to a question before you forget what the question is."