USENIX: Flame graph shows system performance in a new light

A new form of data visualization created by a Joyent engineer can help administrators pinpoint CPU and memory resource hogs

Systems performance engineer Brendan Gregg

The person who became known on the Internet for yelling at servers is now gaining fame for another, somewhat related feat: creating a new type of data visualization for characterizing system performance.

Brendan Gregg, lead performance engineer at cloud provider Joyent, has developed a visualization technique called a flame graph that can be effective for charting how system resources such as CPUs and memory are used. It has subsequently been picked up by a number of engineers who have used it to enhance popular diagnostic tools such as DTrace and Windows XPerf.

Gregg explained how the flame graph works Thursday at the USENIX LISA (Large Installation System Administration) conference in Washington, D.C. Flame graphs could save hours of diagnostic time for system administrators, performance engineers, support staff and others trying to figure out why a system is running more slowly than expected.

"We've had stack traces for a long while, but what Brendan has done has given us a really fast way of seeing aspects that weren't easily visible before," said one attendee of the presentation, noting that flame graphs would have come in handy for him at work during a recent dispute with a software vendor over a performance issue.

The vendor might have been able to solve the problem in a few hours using a flame graph rather than the three weeks it ended up taking, he said.

Gregg's expertise lies in measuring system performance. His book on the topic, Systems Performance: Enterprise and the Cloud, was published this year by Prentice Hall.

In 2008, Gregg, then an employee at Sun Microsystems, attracted attention for showing how disk I/O could be slowed by sudden loud noises, a fact he demonstrated by yelling, quite loudly, at a server. The resulting vibrations had slowed the disks.

Gregg created a YouTube video to demonstrate latency heat maps, a new type of visualization he created to chart system latency. The video went viral in the IT community.

The flame graph came about "under duress," Gregg said. A customer had voiced concern over an application that was running about 40 percent slower than expected. To investigate the problem, Gregg had to sort through 500,000 lines of diagnostic data. He quickly realized it was far too much data to easily comprehend.

Inspired by visualization guru Edward Tufte, Gregg brainstormed ways to visualize the entire data set within a single screen. What he came up with "merged and collapsed together the common elements," while preserving how much of the resources each element consumed relative to the others.
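
The "merge and collapse" step can be illustrated with a minimal Python sketch. It assumes each sample is a list of function names ordered from the outermost caller to the function running on the CPU, and it joins each call path with semicolons and counts duplicates. That folded-line style resembles the input format of Gregg's FlameGraph scripts, but the function name collapse_stacks and the sample data are hypothetical, and this is not Gregg's code.

```python
from collections import Counter


def collapse_stacks(samples):
    """Merge identical stack samples into one entry per unique call path.

    Each sample is a list of function names ordered from the root
    (outermost caller) to the leaf (the function running on-CPU).
    Returns a Counter mapping "root;...;leaf" strings to sample counts.
    """
    folded = Counter()
    for stack in samples:
        folded[";".join(stack)] += 1
    return folded


# Hypothetical samples: three identical call paths and one different one.
samples = [
    ["main", "parse", "read"],
    ["main", "parse", "read"],
    ["main", "compute"],
    ["main", "parse", "read"],
]
for path, count in sorted(collapse_stacks(samples).items()):
    print(path, count)
```

Four samples shrink to two lines, "main;compute 1" and "main;parse;read 3". Repeated call paths collapse into a single entry with a count, which is what lets a very large sample set fit on one screen.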

Flame graphs are composed of stacks of horizontal bars. Each bar represents a function, and each row represents one level of the call stack: the bottom row holds the outermost function, and each bar above another represents a function called by the one beneath it. The width of each bar shows the share of the measured resource, such as CPU time, consumed by that function and the functions it called. The horizontal axis does not represent the passage of time; the bars in each row are sorted alphabetically so that identical call paths merge into a single bar.
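
In Gregg's design, a bar's vertical position reflects its depth in the call stack, and its width reflects its share of the samples. The following minimal Python sketch computes bar positions from collapsed call paths under those assumptions: it builds a prefix tree so that common callers merge, then lays out siblings alphabetically. The function name layout and the (name, depth, x, width) tuple format are hypothetical choices for illustration.

```python
def layout(folded):
    """Compute flame graph bars from collapsed call paths.

    folded: dict mapping "root;...;leaf" strings to sample counts.
    Returns a list of (name, depth, x, width) tuples, where x and width
    are fractions of the total sample count and depth 0 is the bottom row.
    """
    # Prefix tree: each node is [sample_count, {child_name: child_node}].
    root = [0, {}]
    for stack, count in folded.items():
        node = root
        node[0] += count
        for name in stack.split(";"):
            node = node[1].setdefault(name, [0, {}])
            node[0] += count
    total = root[0]
    frames = []

    def visit(children, depth, x):
        # Siblings are ordered alphabetically, not by time.
        for name in sorted(children):
            count, grandchildren = children[name]
            frames.append((name, depth, x / total, count / total))
            # Callees start at their caller's left edge, one row higher.
            visit(grandchildren, depth + 1, x)
            x += count

    visit(root[1], 0, 0)
    return frames


for name, depth, x, width in layout({"main;parse;read": 3, "main;compute": 1}):
    print(f"{'  ' * depth}{name}: x={x:.2f} width={width:.0%}")
```

A parent's width includes any samples in which it was itself the running function, so its callees never extend past its edges.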

In a flame graph representing CPU usage, the bars along the top edge show which functions were running on the CPU when the data was captured; the bars beneath them show the chain of callers that led there.

CPU flame graphs are built from sampled stack traces, each of which records the function running on a CPU at a given moment along with the chain of functions that called it. By merging many such samples into a single hierarchy, the flame graph captures the flow of execution on a processor.
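
Because adjacent frames in a stack trace are a caller and the function it called, the hierarchy can be recovered directly from the samples. This minimal Python sketch counts how many samples pass through each caller-to-callee pair; it assumes the same hypothetical semicolon-joined call paths with sample counts, and the function name call_edges is illustrative only.

```python
from collections import defaultdict


def call_edges(folded):
    """Count samples passing through each (caller, callee) pair.

    folded: dict mapping "root;...;leaf" strings to sample counts.
    A recursive stack counts a repeated pair once per occurrence.
    """
    edges = defaultdict(int)
    for stack, count in folded.items():
        frames = stack.split(";")
        # Each adjacent pair of frames is a caller and its callee.
        for caller, callee in zip(frames, frames[1:]):
            edges[(caller, callee)] += count
    return dict(edges)


edges = call_edges({"main;parse;read": 3, "main;compute": 1})
print(sorted(callee for (caller, callee) in edges if caller == "main"))
```

Here main calls compute and parse, and parse calls read; in the graph, those relationships appear as bars resting on top of their callers.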

Examining a graph, an administrator can visually trace which functions are called by other functions. Scanning across the rows for unusually wide bars can reveal which functions of an individual program, or at a higher level which of a number of concurrently running programs on a machine, are gobbling up a disproportionate amount of the CPU's attention.
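
Spotting a resource hog amounts to asking two questions of the samples: how often a function appears anywhere in a stack (its inclusive share), and how often it was the function actually running (its self share). This minimal Python sketch computes both as percentages. It assumes the same hypothetical semicolon-joined call paths with sample counts, and the function name cpu_shares is illustrative, not part of any existing tool.

```python
from collections import Counter


def cpu_shares(folded):
    """Return (inclusive, self) percentages of samples for each function.

    folded: dict mapping "root;...;leaf" strings to sample counts.
    Inclusive: share of samples in which the function appears anywhere
    in the stack. Self: share of samples in which it was the leaf frame,
    i.e. the function running on-CPU.
    """
    total = sum(folded.values())
    inclusive, self_time = Counter(), Counter()
    for stack, count in folded.items():
        frames = stack.split(";")
        for name in set(frames):  # count a recursive function once per sample
            inclusive[name] += count
        self_time[frames[-1]] += count

    def as_percent(counts):
        return {name: 100.0 * n / total for name, n in counts.items()}

    return as_percent(inclusive), as_percent(self_time)


inclusive, self_pct = cpu_shares({"main;parse;read": 3, "main;compute": 1})
hog = max(self_pct, key=self_pct.get)
print(f"{hog} was on-CPU in {self_pct[hog]:.0f}% of samples")
```

A function with a large inclusive share but a small self share is spending its time in callees, so the bars above it are the place to look next.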

Other flame graphs can be constructed to show how resources are being divided up in memory or with disk I/O.

The program Gregg created to render flame graphs consists of about 300 lines of Perl code for interpreting the source data, along with a few SVG (Scalable Vector Graphics) functions for rendering the graphs and some JavaScript to add mouse-over capabilities to the Web interface.
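
To show what the rendering stage involves, here is a minimal Python sketch that turns precomputed bars into an SVG image. It is not a translation of Gregg's Perl program: for mouse-over labels it substitutes the SVG title element, which browsers display as a tooltip, in place of JavaScript. The function name render_svg, the (name, depth, x, width) input format, and the depth-based colors are hypothetical choices, and long labels are not truncated.

```python
from xml.sax.saxutils import escape


def render_svg(frames, width=800, row_height=16):
    """Render (name, depth, x, width) bars as a minimal SVG flame graph.

    x and width are fractions of the total; depth 0 is drawn at the bottom.
    Each bar carries a <title> element that browsers show on mouse-over.
    """
    max_depth = max((depth for _, depth, _, _ in frames), default=0)
    height = (max_depth + 1) * row_height
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    for name, depth, x, w in frames:
        y = height - (depth + 1) * row_height  # root row at the bottom
        label = escape(name)  # escape &, < and > in function names
        green = 100 + (depth * 30) % 120  # warm colors that vary by row
        parts.append(
            f'<g><title>{label} ({w:.1%})</title>'
            f'<rect x="{x * width:.1f}" y="{y}" width="{w * width:.1f}" '
            f'height="{row_height - 1}" fill="rgb(230,{green},50)"/>'
            f'<text x="{x * width + 2:.1f}" y="{y + row_height - 4}" '
            f'font-size="11">{label}</text></g>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Save the returned string to a .svg file and open it in a browser.
svg = render_svg([("main", 0, 0.0, 1.0), ("read", 1, 0.25, 0.75)])
```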

Others have built programs that use flame graphs to visualize data created by popular performance tools, such as DTrace, Windows XPerf, OS X Instruments, Perl performance tools and Google Chrome Developer Tools.

Gregg said that Dave Pacheco's node.js implementation for DTrace may even become the canonical flame-graph application, given that it is more advanced than Gregg's own program.

Beyond flame graphs, Gregg is working on another visualization called frequency trails, an R-based data rendering that shows the characteristics of the outliers in a set of data, which can be useful in identifying severe performance issues in cloud computing operations, he said.

Gregg is not a visual person by nature, he said in an interview after his presentation. He is most comfortable with the Unix command line. But the very nature of today's large distributed systems demands visual aids.

"On a cloud, I need to understand 1,000 servers and I need to understand them right now. Visualization is necessary to do our jobs these days," he said.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com