Nvidia's Pascal GPU tech specs revealed: Full CUDA count, clock speeds, and more

When Nvidia took the wraps off the monstrous new Pascal graphics processor as part of its Tesla P100 board announcement at the company’s GTC keynote on Tuesday, it provided just enough technical information to tantalize: 15 billion transistors, more than twice as many as big Maxwell GPUs! 16GB of blazing-fast, second-gen high-bandwidth memory! Nvidia’s first-ever chip built with 16nm transistors! And so on. But what the company didn’t reveal (besides news about consumer Pascal graphics cards) was nitty-gritty Pascal architecture information. You know, the stuff that graphics card geeks geek out over.

But fear not: Nvidia’s supplied all that juicy information in a supplementary Pascal tech deep dive for developers.

The details reveal some interesting nuggets. While the Pascal GP100 GPU features smaller 16nm transistors than the Titan X, which was built on 28nm technology, its die is actually roughly the same size, at about 600 square millimeters. Pascal puts the space to more efficient use, though, stuffing up to 3840 CUDA cores and 240 texture units into 60 streaming multiprocessors (SMs) in its most technically capable configuration. By comparison, the most potent Maxwell GPU, found in the Titan X and Tesla M40, features 3072 CUDA cores. The version of GP100 found in the Tesla P100 has 56 SMs and 3584 CUDA cores enabled.
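For the arithmetically inclined, here's a quick sanity check of those core counts. It's a minimal sketch, not anything from Nvidia: it leans on the 64 FP32 CUDA cores per SM that Nvidia quotes for Pascal, and the function and constant names are made up for illustration.

```python
# Figures quoted in the article.
FP32_CORES_PER_SM = 64     # FP32 CUDA cores in each Pascal SM
FULL_GP100_CORES = 3840    # fully enabled GP100
FULL_GP100_TEXTURE = 240   # texture units in the fully enabled GP100
TESLA_P100_SMS = 56        # SMs enabled on the Tesla P100
TITAN_X_CORES = 3072       # biggest Maxwell GPU (Titan X, Tesla M40)


def cuda_cores(sm_count: int, cores_per_sm: int = FP32_CORES_PER_SM) -> int:
    """Total FP32 CUDA cores for a given number of enabled SMs."""
    return sm_count * cores_per_sm


# Work backward from the full chip's core count to its SM count.
full_gp100_sms = FULL_GP100_CORES // FP32_CORES_PER_SM
texture_units_per_sm = FULL_GP100_TEXTURE // full_gp100_sms
tesla_p100_cores = cuda_cores(TESLA_P100_SMS)

print(full_gp100_sms)          # 60
print(texture_units_per_sm)    # 4
print(tesla_p100_cores)        # 3584
print(f"{tesla_p100_cores / TITAN_X_CORES:.2f}x the Titan X's core count")  # 1.17x
```

In other words, the Tesla P100 ships with four of the full chip's SMs switched off, and still carries about 17 percent more CUDA cores than the Titan X.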

Here’s a block diagram of the overall Pascal GP100 architecture.

Figure: A block diagram of Nvidia’s Pascal GPU.

And here’s a closer look at the design of Pascal’s streaming multiprocessors. Each SM packs 64 single-precision (FP32) CUDA cores and 32 double-precision (FP64) units. Across the 56 SMs enabled in the Tesla P100, that’s good for 10.6 teraflops of single-precision floating-point performance and 5.3 teraflops of double-precision performance, respectively.

Figure: A block diagram of the Pascal GPU’s streaming multiprocessors.
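Those teraflops figures fall out of simple multiplication. The sketch below is a back-of-the-envelope estimate, not Nvidia's own math, and it rests on two assumptions that aren't spelled out above: the Tesla P100's 1480MHz boost clock from Nvidia's published specs, and the usual convention of counting a fused multiply-add as two floating-point operations per unit per clock. The function name is hypothetical.

```python
TESLA_P100_SMS = 56        # SMs enabled on the Tesla P100
FP32_CORES_PER_SM = 64     # single-precision CUDA cores per SM
FP64_UNITS_PER_SM = 32     # double-precision units per SM
BOOST_CLOCK_MHZ = 1480     # assumption: Tesla P100 boost clock, not stated in the article
FLOPS_PER_FMA = 2          # assumption: one fused multiply-add counts as two operations


def peak_tflops(sm_count: int, units_per_sm: int, clock_mhz: float) -> float:
    """Theoretical peak throughput in teraflops for one precision."""
    flops = sm_count * units_per_sm * FLOPS_PER_FMA * clock_mhz * 1e6
    return flops / 1e12


fp32 = peak_tflops(TESLA_P100_SMS, FP32_CORES_PER_SM, BOOST_CLOCK_MHZ)
fp64 = peak_tflops(TESLA_P100_SMS, FP64_UNITS_PER_SM, BOOST_CLOCK_MHZ)

print(f"FP32: {fp32:.1f} TFLOPS")  # FP32: 10.6 TFLOPS
print(f"FP64: {fp64:.1f} TFLOPS")  # FP64: 5.3 TFLOPS
```

Because each SM has exactly half as many FP64 units as FP32 cores, double-precision throughput lands at exactly half the single-precision number.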

Finally, here’s a full breakdown of the Tesla P100 GPU’s key tech specs, comparing it against the Maxwell-based Tesla M40 and Kepler-based Tesla K40.

All these numbers and diagrams are just the tip of the iceberg, though. Check out Nvidia’s Pascal GP100 introduction post for a far deeper dive into the new GPU’s capabilities (seriously, there’s a lot more). You’ll also want to check out PCWorld’s Pascal GPU coverage for more information about the rest of the chip, like its 16GB of second-gen high-bandwidth memory and ludicrously fast new NVLink interconnect technology. Remember: All these delicious goodies will trickle down to consumer graphics cards sooner rather than later, with the first 16nm GeForce models expected to land later this year.