Is Trump's unexpected victory a failure for big data? Not really

Polls go by the old IT maxim: 'Garbage in, garbage out'

Most election prediction shops and public polls in recent days foresaw Republican Donald Trump losing the U.S. presidential race to Democrat Hillary Clinton.

They got it wrong, bigly. And the failed predictions could cast doubts on some hot technology sectors, including big data and customer relationship management.

Not so fast, say some data experts. The problem with the polls and with forecasters like FiveThirtyEight may have more to do with data collection than data crunching, they say.

Data analysis worked well in the Moneyball model for the Oakland Athletics, but baseball stats are different from election polling, said CRM analyst Denis Pombriant, founder of Beagle Research Group. Statisticians have been collecting "highly reliable" baseball data for more than a century, while polling data is squishier, he said.

All data analysis depends "on good, clean, and valid data, and for that reason, should only be applied in its pure form outside of baseball with care," he wrote in a research note Wednesday. "Without care, you revert to the old IT maxim of GIGO, or 'garbage in, garbage out.'"

Polls aren't really big data, added Nik Rouda, senior analyst at the Enterprise Strategy Group. "The sample sizes were certainly good enough for a poll, but maybe didn’t meet the definitions around volumes of data, variety of data, [and] historical depth contrasted against real-time immediacy, machine learning, and other advanced analytics," he said by email. "If anything, I’d argue that more application of big data techniques would have given a better forecast."

While both presidential teams relied on voter profiles, more robust profiles and "analyzing cohorts by behavior would have shown a clear picture," Rouda said. "This was a failure of the traditional approach, not a failure of math or failure of big data."

Prediction shops may have more room to combine polling with social media data, he suggested, and number crunchers could have looked more closely at subtle financial indicators.

It's worth noting that most national polls had Clinton winning by only a small margin, with some polls within the margin of error. As of Wednesday morning, Clinton was winning the popular vote by about 185,000 votes in an election in which more than 118.5 million votes were cast. But Clinton lost at least 27 states, including several states the polls predicted she'd win, and will lose in the Electoral College.
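
To put that popular-vote margin in perspective, the quick calculation below turns the two figures quoted above into a percentage. It is only a back-of-the-envelope check: the article says "more than 118.5 million" votes were cast, so the 118.5 million used here is a lower bound and the true share would be slightly smaller.

```python
# Figures as quoted in the article (Wednesday-morning counts, not final results).
margin_votes = 185_000        # Clinton's approximate popular-vote lead
total_votes = 118_500_000     # "more than 118.5 million" votes, used as a lower bound

# Lead expressed as a percentage of all votes cast.
margin_pct = 100 * margin_votes / total_votes

print(f"Popular-vote lead: about {margin_pct:.2f}% of votes cast")
```

On these numbers the lead works out to roughly one-sixth of one percent of the votes cast.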

The incorrect forecasts, however, don't appear to be a problem with the margin of error, said Professor Samuel Wang, manager of the Princeton Election Consortium, which gave Clinton a 99 percent chance of winning as of Tuesday morning.

The polling resulted in "a systematic error," Wang said by email. "The entire group of polls was off, as a group. This was a really large error, around 4 points at presidential and Senate levels, up and down the ticket."
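
Wang's distinction between margin of error and systematic error can be illustrated with a small simulation. The sketch below is not his model: the candidate's true share, the number of polls, the sample size, and a shared 4-point miss applied to the candidate's share are all assumptions chosen for illustration. It shows that averaging many polls shrinks random sampling error but leaves untouched an error that every poll shares.

```python
import math
import random
import statistics

def margin_of_error(share, sample_size, z=1.96):
    """95% sampling margin of error for one poll (simple random sample)."""
    return z * math.sqrt(share * (1 - share) / sample_size)

def simulate_poll(true_share, bias, sample_size, rng):
    """One poll: each respondent backs the candidate with probability
    (true_share - bias). The bias stands in for voters every poll misses."""
    p = true_share - bias
    supporters = sum(1 for _ in range(sample_size) if rng.random() < p)
    return supporters / sample_size

def polling_average(true_share, bias, n_polls=50, sample_size=1000, seed=0):
    """Average of many independent polls that all share the same bias."""
    rng = random.Random(seed)
    return statistics.mean(
        simulate_poll(true_share, bias, sample_size, rng) for _ in range(n_polls)
    )

TRUE_SHARE = 0.48   # hypothetical true support for a candidate
SHARED_BIAS = 0.04  # hypothetical miss common to every poll

unbiased_avg = polling_average(TRUE_SHARE, bias=0.0)
biased_avg = polling_average(TRUE_SHARE, bias=SHARED_BIAS)

print(f"Single-poll margin of error: +/-{margin_of_error(0.5, 1000):.3f}")
print(f"Average error, no shared bias:   {abs(unbiased_avg - TRUE_SHARE):.3f}")
print(f"Average error, with shared bias: {abs(biased_avg - TRUE_SHARE):.3f}")
```

In this setup a single poll of 1,000 people carries a sampling margin of about 3 points, and averaging 50 such polls reduces the random error to a fraction of a point. The shared miss survives the averaging intact, which is why a whole group of polls can be wrong together.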

Wang said he's still evaluating the results. One reason for the incorrect predictions may have been late decisions by so-called undecided voters.

"Pollsters use the word 'undecided,' but what that really means is that they are unable to verbalize their preference -- they may not even know what it is themselves," he said.

Early in the campaign, about 20 percent of Republican-leaning voters were undecided, Wang said. "For them, voting for Trump was a battle between party loyalty and reaction to a fairly radical candidate," he added. "It may be that party loyalty has won."

Wang and Tony Baer, a big data analyst at Ovum, both suggested that polls may have undercounted hard-to-reach voters. With many U.S. residents dropping their landlines, it's also more difficult for pollsters to target the right people, Baer said.

In other cases, people responding to the polls may have lied. Pollsters may have gotten "the wrong signals," Baer added by email.

"When you have data sets that are large enough, you can find signals for just about anything," he added. "So this places a premium on identifying the right data sets and asking the right questions, and relentlessly testing out your hypothesis with test cases extending to more or different data sets."

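Baer's warning about finding signals in large data sets can be sketched in a few lines. The example below is an illustration under stated assumptions, not anything Baer described: it generates purely random "features" and a purely random "target" (all function names and sizes are hypothetical), picks the feature that correlates best with the target, and then checks what a noise feature looks like against a larger, fresh sample.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def best_spurious_correlation(n_features, n_obs, seed=0):
    """Strongest |correlation| between a random target and any of
    n_features random columns. Everything here is noise by construction."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(feature, target)))
    return best

def fresh_data_correlation(n_obs, seed=1):
    """|correlation| of a noise feature with a noise target on new data.
    A 'winning' noise feature has no real link to the target, so fresh
    draws of both are independent of each other."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    feature = [rng.gauss(0, 1) for _ in range(n_obs)]
    return abs(pearson(feature, target))

in_sample = best_spurious_correlation(n_features=1000, n_obs=20)
out_of_sample = fresh_data_correlation(n_obs=2000)

print(f"Best correlation found among 1,000 noise features: {in_sample:.2f}")
print(f"Correlation of a noise feature on fresh data:      {out_of_sample:.2f}")
```

Searching enough candidate columns turns up a strong-looking correlation even though nothing real is there, and it does not hold up on new data. That is the case for the repeated testing against more or different data sets that Baer recommends.
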
Asked for his reaction to Trump's victory, Baer said he was "just as perplexed as anybody."


Grant Gross

IDG News Service