Google says its AI chips smoke CPUs, GPUs in performance tests

The TPUs are faster at neural net inference, and excel at performance per watt

Four years ago, Google was faced with a conundrum: if all its users hit its voice recognition services for three minutes a day, the company would need to double the number of data centers just to handle all of the requests to the machine learning system powering those services.

Rather than buy a bunch of new real estate and servers just for that purpose, the company embarked on a journey to create dedicated hardware for running machine-learning applications like voice recognition.

The result was the Tensor Processing Unit (TPU), a chip designed to accelerate the inference stage of deep neural networks. Google published a paper on Wednesday laying out the performance gains the company saw over comparable CPUs and GPUs, both in raw speed and in performance per watt of power consumed.

A TPU was on average 15 to 30 times faster at the machine learning inference tasks tested than a comparable server-class Intel Haswell CPU or Nvidia K80 GPU. Importantly, the performance per watt of the TPU was 25 to 80 times better than what Google found with the CPU and GPU.
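A quick way to see why the performance-per-watt gain can be larger than the raw speedup: if a chip is both faster and draws less power, the two advantages multiply. The Python sketch below uses made-up throughput and power figures, chosen only for illustration and not taken from Google's paper; the function names are hypothetical as well.

```python
def perf_per_watt(inferences_per_sec, watts):
    """Throughput delivered per watt of power drawn."""
    return inferences_per_sec / watts


def relative_gain(new, baseline):
    """How many times larger `new` is than `baseline`."""
    return new / baseline


# Hypothetical figures for illustration only (not measurements from the paper).
baseline_throughput, baseline_watts = 1_000, 100
accel_throughput, accel_watts = 20_000, 50

# Raw speedup compares throughput alone.
speedup = relative_gain(accel_throughput, baseline_throughput)

# Efficiency gain also credits the lower power draw.
efficiency_gain = relative_gain(
    perf_per_watt(accel_throughput, accel_watts),
    perf_per_watt(baseline_throughput, baseline_watts),
)

print(f"speedup: {speedup:.0f}x, perf/watt gain: {efficiency_gain:.0f}x")
```

With these assumed numbers, a chip that is 20 times faster while drawing half the power comes out 40 times better on performance per watt.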

Driving this sort of performance increase is important for Google, considering the company’s emphasis on building machine learning applications. The gains validate the company’s focus on building machine learning hardware at a time when it’s harder to get massive performance boosts from traditional silicon.

This is more than just an academic exercise. Google has used TPUs in its data centers since 2015 and they’ve been put to use improving the performance of applications including translation and image recognition. The TPUs are particularly useful when it comes to energy efficiency, which is an important metric related to the cost of using hardware at massive scale.

One of the other key metrics for Google’s purposes is latency, which is where the TPUs excel compared to other silicon options. Norm Jouppi, a distinguished hardware engineer at Google, said that machine learning systems need to respond quickly in order to provide a good user experience.

“The point is, the internet takes time, so if you’re using an internet-based server, it takes time to get from your device to the cloud, it takes time to get back,” Jouppi said. “Networking and various things in the cloud, in the data center, they take some time. So that doesn’t leave a lot of [time] if you want near-instantaneous responses.”

Google tested the chips on six different neural network inference applications, representing 95 percent of all such applications in Google’s data centers. The applications tested include DeepMind AlphaGo, the system that defeated Lee Sedol at Go in a five-game match last year.

The company tested the TPUs against hardware that was released at roughly the same time to try to get an apples-to-apples performance comparison. It's possible that newer hardware would at least narrow the performance gap.

There’s still room for TPUs to improve, too. Pairing the TPU with the GDDR5 memory used in the Nvidia K80 GPU should improve performance over the configuration that Google tested. According to the company’s research, the performance of several applications was constrained by memory bandwidth.
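One common way to reason about a memory-bandwidth limit is a roofline-style estimate: throughput is capped either by the chip's peak compute rate or by how fast memory can feed it, whichever is lower. The Python sketch below illustrates that general idea and is not necessarily the analysis in Google's paper. The peak rate, bandwidths, and operations-per-byte figure are placeholder assumptions, not the TPU's or the K80's real specifications.

```python
def attainable_ops_per_sec(peak_ops_per_sec, mem_bytes_per_sec, ops_per_byte):
    """Roofline estimate: the lower of the compute ceiling and the memory ceiling."""
    memory_ceiling = mem_bytes_per_sec * ops_per_byte
    return min(peak_ops_per_sec, memory_ceiling)


def is_memory_bound(peak_ops_per_sec, mem_bytes_per_sec, ops_per_byte):
    """True when memory traffic, not compute, sets the limit."""
    return mem_bytes_per_sec * ops_per_byte < peak_ops_per_sec


# Placeholder numbers for illustration only (assumptions, not vendor specs).
PEAK = 100 * 10**12        # operations per second the chip could sustain
SLOW_MEMORY = 30 * 10**9   # bytes per second
FAST_MEMORY = 180 * 10**9  # bytes per second
INTENSITY = 200            # operations performed per byte fetched

slow = attainable_ops_per_sec(PEAK, SLOW_MEMORY, INTENSITY)
fast = attainable_ops_per_sec(PEAK, FAST_MEMORY, INTENSITY)

print(f"memory-bound with slow memory: {is_memory_bound(PEAK, SLOW_MEMORY, INTENSITY)}")
print(f"estimated gain from faster memory: {fast / slow:.1f}x")
```

Under these assumptions the workload is limited by memory in both cases, so the estimated throughput scales with bandwidth until it reaches the compute ceiling.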

Furthermore, the authors of Google’s paper claim that there’s room for additional software optimization to increase performance. The authors called out one of the tested convolutional neural network applications (referred to in the paper as CNN1) as a candidate. However, because of existing performance gains from the use of TPUs, it’s not clear if those optimizations will take place.

While neural networks mimic the way neurons transmit information in humans, CNNs are modeled specifically on how the brain processes visual information.

“As CNN1 currently runs more than 70 times faster on the TPU than the CPU, the CNN1 developers are already very happy, so it’s not clear whether or when such optimizations would be performed,” the authors wrote.

TPUs are what’s known in chip lingo as an application-specific integrated circuit (ASIC). They’re custom silicon built for one task, with an instruction set hard-coded into the chip itself. Jouppi said that he wasn’t overly concerned by that, and pointed out that the TPUs are flexible enough to handle changes in machine learning models.

“It’s not like it was designed for one model, and if someone comes up with a new model, we’d have to junk our chips or anything like that,” he said.

Google isn’t the only company focused on using dedicated hardware for machine learning. Jouppi said that he knows of several startups working in the space, and Microsoft has deployed a fleet of field-programmable gate arrays in its data centers to accelerate networking and machine learning applications.

Blair Hanley Frank

IDG News Service