Computer hardware giant Intel is also one of the largest software developers in the world, employing more than 6,000 software professionals. In December, the company formed a new position -- that of Intel senior fellow -- at the top of its research hierarchy and appointed four people to the role, including Justin R. Rattner, director of microprocessor research, and Richard Wirt, general manager of software and solutions.
Recently, Rattner and Wirt spoke about what's coming in the software realm.
What's new in compilers?
Wirt: We see activity around traditional compilers to better adapt programming languages for multithreading and hyperthreading. OpenMP, an initiative to extend programming languages to handle threading, is a good example.
Rattner: Today's instruction sets were really designed for static compilers, so the trade-offs they make are in favor of static compilers. When we move to dynamic compilers [like Java and .Net], we can continue to optimize even while the program is executing. The optimizing compiler is querying the hardware on a periodic basis and saying, "How's the program running?"
But such performance monitors aren't new.
Rattner: Today, performance monitors are really designed for debugging, and they are inaccessible to the compiler. What we are definitely looking at in the future is program-visible instrumentation so the compiler has access to [runtime conditions]. This is on the fly; this is the compiler in the loop. This is where our heads are at.
Will we see more parallel processing of various types?
Wirt: We went through getting computers to parallelize the instructions on a single [processor]. Intel pushed that to get about as much as we can get, so now we are beginning to go threaded on single [processors]. Then you'll see us take multiple [threaded processors] and put them on a motherboard.
As we add more transistors, then, instead of multiple [processors] on the motherboard, we'll put them on the die, on the chip itself. We refer to that as dual-core. Then you want to string these things together in big clusters. Each node gets more powerful, driven by Moore's Law, and we will string more and more of them together to form a supercomputer.
How will you get more parallelism out of existing applications?
Rattner: We've discovered you can create "helper threads" when certain situations arise. A set of helper threads created by the compiler can run ahead of the main thread and load data that would otherwise be missing from the cache before the main thread needs it.
We now have an experimental version of our production compiler that will automatically generate helper threads.
How much help will the helper threads be?
Rattner: We are finding all kinds of clever ways to use them. We've seen two- to four-times improvements on some applications. On average, we'd expect speedups of 1.3 to 1.6 times, and some programs will do amazingly well.
Over the years, many of the promises of parallel processing have been dashed because the software is so difficult to develop.
Wirt: The problem was, people expected the compiler to do it for them, and there was a lot of research there. Then they came to the conclusion that you still needed programmers' help, and that's why they invented OpenMP with those "pragmas," or hints that the programmer puts in the application so the compiler could parallelize [the code].
So as you go across nodes, with MPI [the Message Passing Interface standard], you break the code into functional blocks, one function on one node. The programmer does that. But now it's at a higher level of abstraction that the programmer can understand.
That's geared for scientific computing. How about commercial applications?
Wirt: The same thing's going on -- breaking up the application into functions and having them talk to each other. Web services is a good example. Think of SOAP doing for the business world what MPI does for the technical world -- having objects talk to each other in order to give you scalability across a cluster or distributed network.
Where is this headed, and what's needed to get there?
Wirt: Typically, there are tens of objects working together in a business app, but in the technical world, people are building clusters with 10,000 nodes. What's needed are debugging and performance-tuning tools, and tools that allow you to look at all those nodes that are cooperating.
OpenMP (multiprocessing). A specification for a set of compiler directives and library routines that can be used to express shared-memory parallelism in Fortran and C/C++ programs.
Message Passing Interface (MPI). A standard that facilitates the development of parallel applications. It defines the syntax and semantics of library routines for writing portable message-passing programs in Fortran or C/C++.
Hyperthreading. Intel's way to make a single physical processor appear to the operating system or multithreaded user program as two logical processors.
Pragma. A way for the programmer to tell the compiler to do something "pragmatic" at the point in the program where the pragma appears. It might say to use a certain library or generate a certain kind of code for parallel processing.
Helper threads. Short sequences of instructions that help the main application thread perform better. Generated by the compiler, they run ahead of the main thread and can, for example, bring data into the cache in advance so it's available when needed.