Following the march from single-core processors to dual-core and quad-core designs in 2006, Intel researchers have built an 80-core chip that delivers more than one teraflops (a trillion floating-point operations per second) while using less electricity than a modern desktop PC chip.
First described by Intel executives at a September trade show, the design packs 80 cores onto a fingernail-size, 275-square-millimetre die and draws only 62W of power, less than many modern desktop chips.
The company had no plans to bring this "teraflops research chip" to market, but was using it to test new technologies such as high-bandwidth interconnects, energy management techniques, and a tile design method to build multicore chips, director of Intel's tera-scale research program, Jerry Bautista, said.
He spoke in a conference call with reporters last Friday before presenting technical details of the research at the ISSCC (International Solid-State Circuits Conference) in San Francisco.
Intel engineers are also using the chip to explore new forms of tera-scale computing, in which future users could process terabytes of data on their desktops to perform real-time speech recognition, multimedia data mining, photo-realistic gaming and artificial intelligence.
Until now, that degree of computing performance has been available only to scientists and academics using machines like ASCI Red, the teraflops supercomputer built by Intel and its partners in 1996 for US government researchers at Sandia National Labs. That system delivered performance comparable to the new chip's, but demanded an enormous 500kW of power, plus a further 500kW of cooling, to run its nearly 10,000 Pentium Pro chips.
Shrunk onto a single chip, that computing power would allow average consumers to use their PCs in new ways. They could run improved searches across the vast amounts of digital media stored on home desktops, combing large photo archives for specific attributes, such as all the shots where a certain person was smiling or posing with a friend, Bautista said.
Running at 3.16GHz, the new chip achieves 1.01 teraflops of computation, an efficiency of about 16 gigaflops per watt. It can run even faster, but loses efficiency at higher speeds, delivering 1.63 teraflops at 5.1GHz and 1.81 teraflops at 5.7GHz.
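The quoted figures can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the 62W power figure applies to the 3.16GHz operating point (the article does not state this outright, although the numbers agree), and the function names are invented for the example.

```python
# Figures quoted in the article.
CORES = 80
POWER_W = 62.0  # assumption: the 62W figure applies to the 3.16GHz operating point

# (clock speed in GHz, throughput in teraflops) as reported
OPERATING_POINTS = [(3.16, 1.01), (5.1, 1.63), (5.7, 1.81)]


def gigaflops_per_watt(teraflops, watts):
    """Energy efficiency: gigaflops delivered per watt drawn."""
    return teraflops * 1000.0 / watts


def flops_per_core_per_cycle(teraflops, ghz, cores=CORES):
    """Floating-point operations each core completes per clock cycle."""
    return (teraflops * 1e12) / (ghz * 1e9) / cores


if __name__ == "__main__":
    ghz, tflops = OPERATING_POINTS[0]
    print(f"{gigaflops_per_watt(tflops, POWER_W):.1f} gigaflops per watt at {ghz}GHz")
    for ghz, tflops in OPERATING_POINTS:
        per_cycle = flops_per_core_per_cycle(tflops, ghz)
        print(f"{ghz}GHz: {per_cycle:.2f} flops per core per cycle")
```

On these numbers, each core completes roughly four floating-point operations per cycle at all three clock speeds, so the extra throughput at 5.1GHz and 5.7GHz comes from the faster clock alone. The article gives no power figures for those speeds, so their efficiency is not computed here.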
The processor saves power by shunting idle cores into sleep mode, then instantly turning them on as they're needed. Each modular tile has its own router built alongside the core, creating a "network on a chip".
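To give a feel for what a router in every tile implies, here is a minimal sketch of how a message might hop from tile to tile across such a grid. It is not Intel's design: the 8-by-10 layout, the tile numbering and the row-then-column ("dimension-order") routing rule are all assumptions chosen for illustration, since the article specifies none of them.

```python
# Hypothetical layout: 80 tiles in 8 rows of 10. The article does not give the grid shape.
ROWS, COLS = 8, 10


def tile_coords(tile_id, cols=COLS):
    """Map a tile number (0..79) to its (row, column) position in the grid."""
    return divmod(tile_id, cols)


def route(src, dst, cols=COLS):
    """List the tiles a message visits from src to dst.

    Assumed rule: travel along the row to the destination column first,
    then along the column to the destination row, one router hop at a time.
    """
    row, col = tile_coords(src, cols)
    dst_row, dst_col = tile_coords(dst, cols)
    path = [src]
    while col != dst_col:
        col += 1 if dst_col > col else -1
        path.append(row * cols + col)
    while row != dst_row:
        row += 1 if dst_row > row else -1
        path.append(row * cols + col)
    return path


if __name__ == "__main__":
    corner_to_corner = route(0, ROWS * COLS - 1)
    print(f"{len(corner_to_corner) - 1} hops: {corner_to_corner}")
```

Under these assumptions, a message crossing from one corner of the grid to the opposite corner passes through 16 router hops, while neighbouring tiles are a single hop apart.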
Despite using such an efficient grid, the researchers found they could actually hurt performance by adding too many cores. Performance scaled up in step with core count from two cores to four, eight and 16, but began to drop off at 32 and 64 cores.
"If we simply added more than 16 cores, we would get diminishing returns, because the threads and data traffic would not be used properly, so the cores get in the way of each other. It's like having too many cooks in the kitchen," Bautista said.
To solve the problem on the new chip, the researchers used a hardware-based thread scheduler and faster on-chip memory caches, optimising the way data flows from memory into each core. To improve the design further, they plan to add a layer of 3D stacked memory under the chip to minimise the time and power required to feed the cores with data. Next, they will create a mega-chip that uses general-purpose cores instead of the floating-point units used in the current design.