5 things you need to know about data exhaust

Pay close attention or your data lake may turn into a data swamp

Big data is now a familiar term in most of the business world, and companies large and small are scrambling to take advantage of it. Data exhaust, on the other hand, is less widely known, and in some ways it's an evil twin brother. Here are five things you should understand about data exhaust's pros and cons.

1. It's essentially all the big data that isn't core to your business.

The "data exhaust" term has actually been around for more than a decade, and it arose with the new streams of data coming from smartphones, said Tye Rattenbury, director of data science and solutions engineering at Trifacta, which makes software for data preparation. Today, more accessible data tools are bringing exhaust to the fore.

If big data is "primary" data that relates to the core function of your business, data exhaust is secondary data, or everything else that's created along the way, Rattenbury explained.

For instance, a bank would consider primary all the data about debits and credits to its customers' accounts. Secondary data might include information like what percentage of customers' transactions are done at an ATM instead of a physical branch.
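To make the bank example concrete, here is a minimal sketch of how such a secondary figure might be derived from records the bank already keeps. The record layout, the field names (`account`, `amount`, `channel`), and the `channel_share` helper are all assumptions made for illustration; nothing here comes from a real banking system.

```python
from collections import Counter

# Hypothetical transaction records. The amounts are the "primary" data;
# the channel field is the kind of by-product described as data exhaust.
transactions = [
    {"account": "A1", "amount": -40.00, "channel": "atm"},
    {"account": "A1", "amount": 1200.00, "channel": "branch"},
    {"account": "B2", "amount": -20.00, "channel": "atm"},
    {"account": "B2", "amount": -60.00, "channel": "atm"},
]


def channel_share(records):
    """Return the percentage of transactions made through each channel."""
    counts = Counter(r["channel"] for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {channel: 100.0 * n / total for channel, n in counts.items()}


shares = channel_share(transactions)
print(shares)  # {'atm': 75.0, 'branch': 25.0}
```

The debits and credits themselves are never touched; the percentage comes entirely from a field that was recorded along the way.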

There are no standard definitions or schemas for data exhaust, which tends to be raw and unstructured, but in many ways, it's equivalent to the byproducts associated with a company's machines and core online activities. It can include streams coming in from Web browsers, plug-ins, log files, Internet of Things (IoT) devices, and more.

2. It's typically bigger than 'big.'

"Big data" is itself a relative term, boiling down essentially to "anything that's so large that you couldn't manually inspect or work with it record by record," Rattenbury said. In general, data exhaust tends to be even bigger, primarily because there are few limits on what a company can collect.

"Google is the leader here," he said. "They literally collect everything, even before they know what they will do with it."

That brings up another interesting feature of data exhaust: It can become primary data once a use for it is found.

3. It has great potential.

Data exhaust can be enormously useful. In that bank example, for instance, knowing where consumers conduct most of their transactions can help the bank do a better job.

"It's not core to the transaction, but it can still be hugely relevant to servicing customers at a better level," Rattenbury said. "It provides a level of understanding and contextualization to that primary transaction or service that's increasingly desired by customers."

Data exhaust can contain important elements of information that you may not be looking for today but that could prove useful in the future, noted Mary Shacklett, president of research firm Transworld Data.

"A lot of exhaust data isn’t immediately valuable," agreed Nik Rouda, senior analyst with Enterprise Strategy Group. "The trick is figuring what is or could be."

4. Beware the 'swamp' -- and the legal baggage.

There can be risks associated with data exhaust.

"This is generally stuff customers may or may not be willing to have given you," Rattenbury explained. "So there are potential legal, marketing, and public-relations risks around leveraging that data. You could end up alienating your customer base or partners by knowing stuff about them that they didn't want you to know."

The implications can be subtle. If an insurance company were to make use of the fact that it can see the GPS location of everywhere you've recently parked your car, for instance, it could raise rates for customers who routinely park in higher-crime areas. Without intending to do so, it might build an algorithm that ends up discriminating racially, he pointed out.

Another potential risk is saving data that will never be useful.

"CIOs need to balance the value of data exhaust against the waste of keeping tons of useless data forever," Shacklett said. "This is very difficult to do right now."

The goal is to save data exhaust that can go beyond adding incremental insights and color and can actually transform business activities, Rouda said. "If there isn’t any business reason, this is where data lakes get a bad rap" and become data swamps.

5. You need to make some decisions.

The bottom line is that it's critical to be selective about what data exhaust gets saved.

"It is important to start making some executive decisions on what you are going to throw out," Shacklett said.

For instance, when it comes to smartphones and other devices, it's well known that much of the associated streaming data is "overhead" from device handshaking and extraneous "log data gibberish," she pointed out. "It is doubtful that this type of data will ever be useful."

Companies should also consult with lawyers, Rattenbury said.

In addition, they should get their employees closest to the core business in touch with the data. "They'll have immediate questions they can ask that will show the relevance right away," he explained.

From a technical perspective, companies need scalable storage technologies as well as tools for self-service data access.

One of the hardest parts of working with exhaust data is getting a single coherent view of it, Rattenbury said. Cleaning up and unifying that data can be a challenge.

"I might have signed up for service at one place and entered credit-card information at another," he explained. "You've recorded the same piece of data on me from a few different places."
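As a rough illustration of that unification problem, the sketch below merges records about the same person captured by two different systems. It assumes, purely for the example, that a normalized email address is a reliable matching key and that the first value seen for a field wins; the source names, field names, and the `unify` helper are hypothetical, and real-world matching is usually far messier.

```python
def normalize_email(email):
    """Trim whitespace and lowercase so trivially different spellings match."""
    return email.strip().lower()


def unify(records):
    """Merge records that share a normalized email into one profile each."""
    merged = {}
    for rec in records:
        key = normalize_email(rec["email"])
        profile = merged.setdefault(key, {"email": key, "sources": []})
        profile["sources"].append(rec["source"])
        for field, value in rec.items():
            if field in ("email", "source"):
                continue
            # Keep the first value seen for a field; a real pipeline would
            # need an explicit rule for resolving conflicting values.
            profile.setdefault(field, value)
    return merged


# Hypothetical records for one customer, captured in two places.
records = [
    {"source": "signup", "email": "Pat@Example.com ", "name": "Pat Lee"},
    {"source": "billing", "email": "pat@example.com", "card_last4": "4242"},
]

profiles = unify(records)
print(profiles["pat@example.com"])
```

Even this toy version has to decide what counts as "the same" customer and which value to trust, which is where much of the cleanup effort goes.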

With secondary data, companies don't typically worry at the time of collection about cleaning it up, Rattenbury added. So "you have to realize that it's not just a matter of saying, 'here's this great pile of data -- let's do something with it.'"

Katherine Noyes

IDG News Service