Big data has potential but requires care

Both data and the tools to manage it are growing, but taking advantage of them requires planning

The proliferation of large-scale data sets is just beginning to change business and science around the world, but enterprises need to prepare in order to gain the most advantage from their information, panelists said at a Silicon Valley event this week.

So-called "big data" is both a challenge to manage and a tool for competitive advantage, according to speakers at a Churchill Club event on Wednesday night in Mountain View, California. The discussion at the Computer History Museum followed the launch of EMC Greenplum's Unified Analytics Platform, which lets business and IT staffs analyze both structured and unstructured data.

New networked devices and applications are collecting more data than ever and more organizations are holding on to it, creating huge demands for storage. In the second quarter of this year, storage companies shipped 5,429 petabytes of disk capacity, up 30.7 percent from last year's second quarter, IDC reported last week.

"Data growth is already faster than both Moore's Law and ... network growth," said Anand Rajaraman, senior vice president of Walmart Global E-Commerce and head of @WalmartLabs. His lab has developed tools for Walmart to take advantage of the new types of data being generated, including applications that collect and analyze information from sources such as Twitter and Facebook to gauge trends and individual consumer preferences.

The benefits of big data stretch beyond business to earth sciences, biology, psychology and other fields, Rajaraman said.

"Science has become more and more about collecting large amounts of data and doing analysis," he said.

Big data can be any volume of data that requires new tools to analyze, said Luke Lonergan, chief technology officer and co-founder of Greenplum, which EMC acquired last year. For example, it would take 27 hours to run a logistic regression algorithm, which can be used to predict the probability of an event, on 30G bytes of data, Lonergan said. If run on 32 computers, the process takes 60 seconds, he said.
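To illustrate why a job like this parallelizes well, here is a minimal sketch in Python. It is not Greenplum's implementation, and every name in it (`shard_gradient`, `train`, `make_data`) is hypothetical. It assumes a one-feature logistic regression fitted by plain gradient descent on synthetic data. The point is only that the gradient is a sum over records, so each machine can compute a partial sum on its own slice of the data and the results can then be added together.

```python
import math
import random


def sigmoid(z):
    """Logistic function: maps a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def shard_gradient(shard, w, b):
    """Sum of log-loss gradients over one shard (the work one machine would do)."""
    gw, gb = 0.0, 0.0
    for x, y in shard:
        err = sigmoid(w * x + b) - y
        gw += err * x
        gb += err
    return gw, gb


def train(shards, steps=500, lr=0.5):
    """Gradient descent in which each step adds up per-shard partial gradients."""
    n = sum(len(s) for s in shards)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # On a real cluster these calls would run concurrently on separate nodes;
        # here they run one after another purely for illustration.
        partials = [shard_gradient(s, w, b) for s in shards]
        gw = sum(p[0] for p in partials) / n
        gb = sum(p[1] for p in partials) / n
        w -= lr * gw
        b -= lr * gb
    return w, b


def make_data(n=400, seed=0):
    """Synthetic (x, y) pairs; the true weight 2.0 and bias -1.0 are assumptions."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-3, 3)
        y = 1 if rng.random() < sigmoid(2.0 * x - 1.0) else 0
        data.append((x, y))
    return data


data = make_data()
shards = [data[i::4] for i in range(4)]  # split the records across 4 "machines"
w, b = train(shards)
print("estimated probability of the event at x = 1.0:", sigmoid(w * 1.0 + b))
```

Splitting the records into four shards or keeping them in one gives the same fitted model up to floating-point rounding, which is why adding machines can cut the running time without changing the answer.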

"'Bigger than previous-generation, non-parallel infrastructure could handle' might be a useful definition. Anything that blows you out of the old way of doing things," Lonergan said.

Analyzing data has also gotten harder, not only because there is more of it but because it comes from new sources, panelists said. Blogs, Web comments and other information come in the form of unstructured data, which can't be crunched the way relational databases are. The need to mine different types of content has led to new data analysis platforms, most notably the open-source Hadoop framework, which builds on techniques published by Google and was developed largely at Yahoo, with Facebook among its early large-scale users.

The market for new tools to manage and exploit big data is still growing, said Ping Li, who heads the Big Data Fund at venture capital company Accel Partners.

"A lot of the applications that ride on top of these new data platforms have yet to be invented," Li said. Traditional business intelligence and ERP (enterprise resource planning) platforms are being adapted to deal with big data, but what's needed are native applications developed specifically for the new world, he said.

Developing countries are active participants in this process, sometimes because companies there have skipped over legacy systems that are ingrained in first-world enterprises, Li said.

Trying to get value out of big data today is like creating an online store in the early days of e-commerce, said Walmart's Rajaraman, who helped develop Amazon's marketplace business. Amazon had to invent its own systems for payment, fraud detection and other tasks, each of which later spawned independent vendors that specialize in that area, he said.

It's important for an enterprise to understand the implications of big data and how the new tools work before embarking on a big-data initiative, panelists warned.

"Those who are just standing up Hadoop as is, with no management framework, writing directly to it ... there's going to be some real disillusionment there," said Keith Collins, senior vice president and chief technology officer of SAS.

Big-data tools such as Hadoop can't create value out of information by themselves, Collins warned in an interview at the event. Enterprises have to know what they want to find out from their data and then deal with how to get that out of their data. "The data issues come after the question," he said.

Stephen Lawson covers mobile, storage and networking technologies for The IDG News Service. Follow Stephen on Twitter at @sdlawsonmedia.
