USENIX: Flame graph shows system performance in a new light

A new form of data visualization created by a Joyent engineer can help administrators pinpoint CPU and memory resource hogs

Systems performance engineer Brendan Gregg

Systems performance engineer Brendan Gregg

The person who became known on the Internet for yelling at servers is now becoming famous for another, somewhat related, feat, creating a new type of data visualization for characterizing system performance.

Brendan Gregg, lead performance engineer at cloud provider Joyent, has developed a visualization technique called a flame graph that can be effective for charting how system resources such as CPUs and memory are used. It has subsequently been picked up by a number of engineers who have used it to enhance popular diagnostic tools such as DTrace and Windows XPerf.

Gregg explained how the flame graph works Thursday at the USENIX LISA (Large Installation System Administration) conference in Washington, D.C. Flame graphs could save hours of diagnostic time for system administrators, performance engineers, support staff and others trying to figure out why a system is running more slowly than expected.

"We've had stack traces for a long while, but what Brendan has done has given us a really fast way of seeing aspects that weren't easily visible before," said one attendee of the presentation, noting that flame graphs would have come in handy for him at work during a recent dispute with a software vendor over a performance issue.

The vendor might have been able to solve the problem in a few hours using a flame graph rather than the three weeks it ended up taking, he said.

Gregg's expertise lies in the area of measuring system performance. His book on the topic was published this year by Prentice Hall.

In 2008, Gregg, then an employee at Sun Microsystems, attracted attention for showing how disk I/O could be slowed by sudden loud noises, a fact he demonstrated by yelling, quite loudly, at a server. The resulting vibrations had slowed the disks.

Gregg created a YouTube video to demonstrate latency heat maps, a new type of visualization he created to chart system latency. The video went viral in the IT community.

The flame graph came about "under duress," Gregg said. A customer had voiced concern over an application that was running about 40 percent slower than expected. To investigate the problem, Gregg had to sort through 500,000 lines of diagnostic data. He quickly realized it was far too much data to easily comprehend.

Inspired by visualization guru Edward Tufte, Gregg brainstormed ways to visualize the entire data set within a single screen. What he came up with "merged and collapsed together the common elements," while preserving the relation among the elements in the amount of resources they consumed.

Flame graphs are composed of multiple stacks of vertical bars, with each row of bars representing a slice of time, the rows on the bottom being the oldest and the ones on the top of the graph being the newest. Each row might have multiple bars, with each bar representing a different function, and the length of each bar representing the percentage of resources that the function is using at that time.

For a flame graph representing CPU usage, the top bars show what software functions were being executed at the time the data was captured.

CPU flame graphs are built on stack traces, which list all the functions being executed by the CPU at any given time. But the flame graph's hierarchical presentation of the data encapsulates the flow of actions on a processor.

Examining a graph, an administrator can visually trace which functions are called by other functions. Scanning across different rows can reveal which functions of an individual program, or at a higher level which of a number of concurrently running programs on a machine, are gobbling up a disproportionate amount of the CPU's attention.

Other flame graphs can be constructed to show how resources are being divided up in memory or with disk I/O.

The program Gregg created to render flame graphs consists of about 300 lines of Perl code for interpreting the source data, along with a few SVG (Scalable Vector Graphics) functions for rendering the graphs and some JavaScript to add mouse-over capabilities to the Web interface.

Others have built programs that use flame graphs to visualize data created by popular performance tools, such as DTrace, Windows XPerf, OS X Instruments, Perl performance tools and Google Chrome Developer Tools.

Gregg said that Dave Pacheco's node.js implementation for DTrace may even become the canonical flame-graph application, given that it is more advanced than Gregg's own program.

Beyond flame graphs, Gregg is working on another visualization called frequency trails, an R-based data rendering that shows the characteristics of the outliers in a set of data, which can be useful in determining severe performance issues in cloud computing operations, he said.

Gregg is not a visual person by nature, he said in an interview after his presentation. He is most comfortable with the Unix command line. But the very nature of today's large distributed systems demands visual aids.

"On a cloud, I need to understand 1,000 servers and I need to understand them right now. Visualization is necessary to do our jobs these days," he said.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is

Join the PC World newsletter!

Error: Please check your email address.
Rocket to Success - Your 10 Tips for Smarter ERP System Selection

Tags applicationsUtilitiesSun Microsystemsdata miningsoftwareJoyentsystem management

Keep up with the latest tech news, reviews and previews by subscribing to the Good Gear Guide newsletter.

Joab Jackson

IDG News Service
Show Comments

Most Popular Reviews

Latest Articles


PCW Evaluation Team

Matthew Stivala

HP OfficeJet 250 Mobile Printer

The HP OfficeJet 250 Mobile Printer is a great device that fits perfectly into my fast paced and mobile lifestyle. My first impression of the printer itself was how incredibly compact and sleek the device was.

Armand Abogado

HP OfficeJet 250 Mobile Printer

Wireless printing from my iPhone was also a handy feature, the whole experience was quick and seamless with no setup requirements - accessed through the default iOS printing menu options.

Azadeh Williams

HP OfficeJet Pro 8730

A smarter way to print for busy small business owners, combining speedy printing with scanning and copying, making it easier to produce high quality documents and images at a touch of a button.

Andrew Grant

HP OfficeJet Pro 8730

I've had a multifunction printer in the office going on 10 years now. It was a neat bit of kit back in the day -- print, copy, scan, fax -- when printing over WiFi felt a bit like magic. It’s seen better days though and an upgrade’s well overdue. This HP OfficeJet Pro 8730 looks like it ticks all the same boxes: print, copy, scan, and fax. (Really? Does anyone fax anything any more? I guess it's good to know the facility’s there, just in case.) Printing over WiFi is more-or- less standard these days.

Ed Dawson

HP OfficeJet Pro 8730

As a freelance writer who is always on the go, I like my technology to be both efficient and effective so I can do my job well. The HP OfficeJet Pro 8730 Inkjet Printer ticks all the boxes in terms of form factor, performance and user interface.

Michael Hargreaves

Windows 10 for Business / Dell XPS 13

I’d happily recommend this touchscreen laptop and Windows 10 as a great way to get serious work done at a desk or on the road.

Featured Content

Latest Jobs

Don’t have an account? Sign up here

Don't have an account? Sign up now

Forgot password?