Taking advantage of multicore PCs

What app developers need to know to make their software work on new-gen CPUs

Call it the great multicore discord: a parade of major hardware and software vendors promising desktop applications powered by multicore chips yet all marching out of step, leaving confused software developers in the dust -- but times are changing.

Far out front, chipmakers Intel and AMD delivered quad-core chips for desktop computers earlier this year, and computers with dual-core chips are now the norm. But only the savviest of developers can harness this massive processing power, and only by weaving a mind-bending web of code that foundational software vendors should have provided. As a result, much of the multicore chips' processing power goes unharnessed.

Software vendors are finally closing the gap: Microsoft, Apple, third-party platform vendors, and software developer consortiums are tweaking everything from the operating system schedulers to APIs to languages and libraries to make them multicore-friendly. The goal, of course, is to make it easier for developers to join the multicore movement.

There's no question that the pace is quickening, the gap closing. Apple, for instance, claimed earlier this month that its upcoming Mac OS X Snow Leopard will boast a new technology, code-named Grand Central, that supports multicore chips, along with developer tools that let applications leverage up to eight cores of processing power.

To take advantage of multicore-enabling technologies such as Grand Central and whatever Microsoft may be working on for Windows 7 (the company declined to comment), developers must move up a steep learning curve in areas such as multithreading, parallelism, and concurrency.

The first efforts to tap into multiple cores

Getting a desktop application today to truly take advantage of multicore chips is akin to Olympian gymnastics in that only a few elite developers are capable of performing at the level required. "The emphasis is on elite," says Dave Lounsbery, vice president of collaboration services at the Open Group, an industry group that promotes standards and industry-wide best practices. "This elite developer is able to go in and directly interact with the thread libraries and other things. He'd know that if he called the XYZ graphic function, it wasn't safe, and therefore, he'd be going from multicore to one core."

But most developers aren't Olympians, and so need development tools and operating systems to handle much of the multicore effort for them.

Software vendors are beginning to rise to the challenge, according to a Forrester Research report, by delivering low-level language extensions and libraries. For example, RapidMind offers a software development platform that allows developers to exploit quad-core AMD Opteron and Intel Xeon processors, which are typically used in servers. Graphics processor developer Nvidia provides its developers with a parallel programming language and libraries, called Cuda, to help developers access the graphical processor in a PC. But neither approach addresses the broad challenge: unlocking the multicore chips on desktop PCs and Macs for a broad range of applications.

However, some efforts are under way to make the multiple cores of today's PCs available to app developers. Apple's Grand Central is meant to help Mac developers do so. The European Commission said it will spend nearly 5 million euros to underwrite a project called the Java Environment for Parallel Real-time Development, or Jeopard, which will build a framework -- APIs, OS extensions, and language support -- for Java-based, real-time applications on multicore chips. (The Open Group is one of the organizations taking part in the project.)

Recently, the Khronos Group, an industry consortium, formed the Compute Working Group to come up with a royalty-free standard for building multicore applications. Although the group has focused mainly on industry standards for tapping into graphical processors for better graphics and rich-media display, its latest effort seeks to tap into both graphics processors (which tend to have multiple cores) and multicore CPUs as a virtual set of coordinated processing resources that can be used for any purpose. The standard, which may come to market in as little as six months, "will take care of a lot of the architectural decisions and automatic mapping across the resources," says Neil Trevett, president of the Khronos Group and vice president of mobile content at Nvidia.

Trevett says there's a good chance the Compute Working Group will embrace Apple's Open Computing Language (OpenCL) specification, which aims to let app developers tap directly into graphics processors. Mac OS X Snow Leopard will support OpenCL, which is based on the C programming language.

To help .Net developers learn the underlying parallel programming techniques required to tap multiple cores, Microsoft launched the Parallel Computing Initiative, which includes a concurrency runtime, a technology preview for parallel extensions to the .Net Framework 3.5, domain-specific libraries, packaging application services to allow rich virtualization experiences, and profiling tools.

Forrester expects more choices to hit the market over the next two years.

Not clear where developers should focus

Developers must be cautious when choosing a particular style of parallel programming, lest they find themselves in a dead end two or three years down the line and have to rewrite their applications, says Krste Asanovic, an associate professor at the University of California at Berkeley. "There are a bunch of possibilities -- programming environments to use -- out there, but you don't know which one will win out in the marketplace," he says.

So where should developers place their bets?

For simple data-parallel constructs, such as a filter for Adobe Photoshop or for an audio file, Asanovic recommends looking at languages such as Cuda, Ct from Intel, and OpenCL. "Most of this is tied to GPUs [graphical processors] but is going to work on multicore CPUs as well," he says. "If I was a programmer who wanted to dip my feet in the water, I'd look at data parallel programming and figure out if my app maps to this."

This means data-parallel applications that crunch images, graphics, and video, as well as applications that analyze business data, will likely be the first to benefit from the multicore movement. That's because such tasks can be broken down into chunks that can be processed simultaneously, with chunks being fed into whatever cores are available and the results then stitched together at the end. For example, interpolating pixels or scrubbing mailing addresses can be broken into discrete units this way and handled in parallel.
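
The split, process, and stitch pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any vendor's toolkit works: the function names (split_into_chunks, brighten_chunk, parallel_filter) and the "add 40 to each pixel" filter are hypothetical. The sketch defaults to a thread pool so it runs anywhere; in CPython, CPU-bound work would likely need concurrent.futures.ProcessPoolExecutor passed as executor_cls to actually occupy several cores.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, near-equal slices."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty slices when there are few items
            chunks.append(items[start:end])
        start = end
    return chunks


def brighten_chunk(pixels):
    """Hypothetical per-pixel filter: add 40 to each value, clamped to 255."""
    return [min(255, p + 40) for p in pixels]


def parallel_filter(pixels, n_workers=4, executor_cls=ThreadPoolExecutor):
    """Run the filter on independent chunks, then stitch the results together."""
    chunks = split_into_chunks(pixels, n_workers)
    with executor_cls(max_workers=n_workers) as pool:
        # map() yields results in chunk order, so stitching is a simple flatten.
        results = pool.map(brighten_chunk, chunks)
        return [p for chunk in results for p in chunk]
```

The design point is that no chunk needs anything from another chunk, which is what makes the work safe to hand to whatever cores are free.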

But most tasks don't map that naturally to parallel processing. Instead, they have dependencies on the results of other calculations that cause one core to idle while waiting for the other core to finish. In these cases, developers should look for natural divisions across tasks, running independent tasks on different processors in what Asanovic calls a "task farm." For example, in a word processor, one core could handle print spooling while another runs a spell-check, as the two tasks are unrelated. The trick is knowing that they have no interdependencies before sending each task to its own core.
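
A minimal sketch of the task-farm idea, using the word processor example: two unrelated jobs are submitted to a small worker pool and their results collected afterward. Everything here is hypothetical and simplified for illustration, including the tiny KNOWN_WORDS dictionary, the two-lines-per-page "spooler," and the function names. As with any Python thread pool, whether the two tasks truly run on separate cores depends on the runtime; the structure is what matters.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dictionary; a real spell-checker would load a word list.
KNOWN_WORDS = {"the", "quick", "brown", "fox"}


def spell_check(text):
    """Return the words that are not in the dictionary."""
    return [w for w in text.lower().split() if w not in KNOWN_WORDS]


def spool_for_printing(text, lines_per_page=2):
    """Group the document's lines into pages, standing in for print spooling."""
    lines = text.splitlines()
    return [lines[i:i + lines_per_page]
            for i in range(0, len(lines), lines_per_page)]


def run_task_farm(document):
    """Run both tasks concurrently; neither reads the other's output."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        spelling = pool.submit(spell_check, document)
        pages = pool.submit(spool_for_printing, document)
        # result() blocks until each independent task has finished.
        return spelling.result(), pages.result()
```

Both tasks only read the document and never write to shared state, which is the "no interdependencies" condition that makes farming them out safe.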

Clearly, applications designed from the get-go to be multithreaded have a better chance of taking advantage of multicore processors, because each thread can theoretically be handled by whatever core is available. In reality, though, many threads have interdependencies, which bog down the cores. That's why Forrester advises developers to "write code that can automatically fork multiple simultaneous threads of execution as well as manage thread assignments, synchronize parallel work, and manage shared data to prevent concurrency issues associated with multithreaded code."
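
The chores Forrester lists (forking threads, synchronizing their work, and guarding shared data) look roughly like this when done by hand. The sketch is hypothetical: the word-counting job and the name count_words_in_parallel are invented for illustration, and real applications would more likely lean on a library or runtime to manage the threads.

```python
import threading


def count_words_in_parallel(documents):
    """Fork one thread per document and sum word counts into a shared total."""
    totals = {"words": 0}
    lock = threading.Lock()

    def worker(doc):
        n = len(doc.split())  # independent work, no lock needed
        with lock:            # guard only the update to shared data
            totals["words"] += n

    # Fork: one thread per unit of work.
    threads = [threading.Thread(target=worker, args=(d,)) for d in documents]
    for t in threads:
        t.start()
    # Synchronize: wait for every thread before reading the shared total.
    for t in threads:
        t.join()
    return totals["words"]
```

Keeping the locked region down to the single shared update is the point: the longer threads hold the lock, the more the cores sit idle waiting on one another.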

The Open Group's Lounsbery also advises developers to simplify how they write software. "With a lot of hand-coded algorithms, you're not going to get any advantage from improvements in the operating systems or languages," he says. "If you've got a lot of legacy code or handwritten code that takes advantage of a specific operating system's features, you really have to start thinking about encapsulating those and maybe start moving them into areas that are supported natively."

Many developers tend to write complex applications because software architectures constantly change, thus muddying an application's evolution. Also, performance issues force developers to "push things together in perhaps unclean ways to reach performance goals," Asanovic says.

In the world of multicore processors, these entanglements become strong barriers to performance. So Asanovic advises developers to disentangle computing-intensive parts of an application from those that aren't so demanding and to separate the user interface from the application code. If developers can draw their software's processing flow on a white board, they can see where threads and tasks can be parallelized, he says.

"This is all good standard engineering practice, and it's going to be even more valuable as you start to use parallel processors," Asanovic says.