5 things you need to know about data exhaust

Pay close attention or your data lake may turn into a data swamp

Big data is now a familiar term in most of the business world, and companies large and small are scrambling to take advantage of it. Data exhaust, on the other hand, is less widely known, and in some ways it's big data's evil twin. Here are five things you should understand about data exhaust's pros and cons.

1. It's essentially all the big data that isn't core to your business.

The term "data exhaust" has been around for more than a decade, and it arose with the new streams of data coming from smartphones, said Tye Rattenbury, director of data science and solutions engineering at Trifacta, which makes software for data preparation. Today, more accessible data tools are bringing exhaust to the fore.

If big data is "primary" data that relates to the core function of your business, data exhaust is secondary data, or everything else that's created along the way, Rattenbury explained.

For instance, a bank would consider primary all the data about debits and credits to its customers' accounts. Secondary data might include information like what percentage of customers' transactions are done at an ATM instead of a physical branch.
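As a rough sketch of the bank example, the snippet below derives the ATM-versus-branch share from a handful of transaction records. The record layout, the field names, and the `channel_share` helper are hypothetical and chosen only for illustration; they are not taken from any real banking system.

```python
from collections import Counter

# Hypothetical transaction records: the amount is the primary data,
# while the channel is the kind of by-product the article calls exhaust.
transactions = [
    {"account": "A-1", "amount": -40.00, "channel": "atm"},
    {"account": "A-1", "amount": 1200.00, "channel": "branch"},
    {"account": "B-7", "amount": -60.00, "channel": "atm"},
    {"account": "C-3", "amount": -20.00, "channel": "atm"},
]


def channel_share(records):
    """Return each channel's share of transactions as a percentage."""
    counts = Counter(r["channel"] for r in records)
    total = sum(counts.values())
    return {channel: 100.0 * n / total for channel, n in counts.items()}


shares = channel_share(transactions)
print(shares)
```

The debit and credit amounts are never touched here; the secondary insight comes entirely from a field recorded alongside each transaction.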

There are no standard definitions or schemas for data exhaust, which tends to be raw and unstructured, but in many ways, it's equivalent to the byproducts associated with a company's machines and core online activities. It can include streams coming in from Web browsers, plug-ins, log files, Internet of Things (IoT) devices, and more.

2. It's typically bigger than 'big.'

"Big data" is itself a relative term, boiling down essentially to "anything that's so large that you couldn't manually inspect or work with it record by record," Rattenbury said. In general, data exhaust tends to be even bigger, primarily because there are few limits on what a company can collect.

"Google is the leader here," he said. "They literally collect everything, even before they know what they will do with it."

That brings up another interesting feature of data exhaust: It can become primary data once a use for it is found.

3. It has great potential.

Data exhaust can be enormously useful. In that bank example, for instance, knowing where consumers conduct most of their transactions can help the bank do a better job.

"It's not core to the transaction, but it can still be hugely relevant to servicing customers at a better level," Rattenbury said. "It provides a level of understanding and contextualization to that primary transaction or service that's increasingly desired by customers."

Data exhaust can contain important elements of information that you may not be looking for today but that could prove useful in the future, noted Mary Shacklett, president of research firm Transworld Data.

"A lot of exhaust data isn’t immediately valuable," agreed Nik Rouda, senior analyst with Enterprise Strategy Group. "The trick is figuring what is or could be."

4. Beware the 'swamp' -- and the legal baggage.

There can be risks associated with data exhaust.

"This is generally stuff customers may or may not be willing to have given you," Rattenbury explained. "So there are potential legal, marketing, and public-relations risks around leveraging that data. You could end up alienating your customer base or partners by knowing stuff about them that they didn't want you to know."

The implications can be subtle. If an insurance company were to make use of the fact that it can see the GPS location of everywhere you've recently parked your car, for instance, it could raise rates for customers who routinely park in higher-crime areas. Without intending to do so, it might build an algorithm that ends up discriminating racially, he pointed out.

Another potential risk is saving data that will never be useful.

"CIOs need to balance the value of data exhaust against the waste of keeping tons of useless data forever," Shacklett said. "This is very difficult to do right now."

The goal is to save data exhaust that does more than add incremental insights and color and can actually transform business activities, Rouda said. "If there isn’t any business reason, this is where data lakes get a bad rap" and become data swamps.

5. You need to make some decisions.

The bottom line is that it's critical to be selective about what data exhaust gets saved.

"It is important to start making some executive decisions on what you are going to throw out," Shacklett said.

For instance, when it comes to smartphones and other devices, it's well-known that much of the associated streaming data is "overhead" from device handshaking and extraneous "log data gibberish," she pointed out. "It is doubtful that this type of data will ever be useful."

Companies should also consult with lawyers, Rattenbury said.

In addition, they should get their employees closest to the core business in touch with the data. "They'll have immediate questions they can ask that will show the relevance right away," he explained.

From a technical perspective, companies need scalable storage technologies as well as tools for self-service data access.

One of the hardest pieces of working with exhaust data is getting a single coherent view around it, Rattenbury said. Cleaning up and unifying that data can be a challenge.

"I might have signed up for service at one place and entered credit-card information at another," he explained. "You've recorded the same piece of data on me from a few different places."

With secondary data, companies don't typically worry at the time of collection about cleaning it up, Rattenbury added. So "you have to realize that it's not just a matter of saying, 'here's this great pile of data -- let's do something with it.'"
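To give a sense of what that cleanup can involve, here is a minimal sketch that merges records about the same customer from two sources. It assumes the records share an email address that differs only in case and whitespace; the source names, the field names, and the `unify` helper are hypothetical, and real-world matching is usually far messier.

```python
from collections import defaultdict

# Hypothetical records of the same customer captured by two systems.
signup_records = [{"email": "Pat@Example.com ", "name": "Pat Lee"}]
billing_records = [{"email": "pat@example.com", "card_last4": "4242"}]


def normalize_email(email):
    """Lower-case and trim an email so trivially different spellings match."""
    return email.strip().lower()


def unify(*sources):
    """Merge records sharing a normalized email into one view per customer."""
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            key = normalize_email(record["email"])
            for field, value in record.items():
                if field != "email":
                    # Keep the first value seen; later sources only fill gaps.
                    merged[key].setdefault(field, value)
    return dict(merged)


customers = unify(signup_records, billing_records)
print(customers)
```

Keeping the first value seen is only one possible policy for conflicting fields; which source to trust is itself a decision someone close to the business has to make.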

Katherine Noyes

IDG News Service