Google claims MapReduce sets data-sorting record

Google late last week said the results of in-house data-sorting tests bolster its claim that its MapReduce technology can manipulate more data faster than any conventional database.

According to a Friday afternoon blog post by Grzegorz Czajkowski, a member of Google's systems infrastructure team, MapReduce recently sorted 1 terabyte (TB) of data in 68 seconds, or about a third of the time Yahoo! achieved this year.

Sorting, or rearranging, data is one of the most basic functions of a spreadsheet, database or other data manipulation software.

Google used 1,000 servers running MapReduce in parallel to sort the data, versus 910 for Yahoo, according to Czajkowski.
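To illustrate how a sort can be spread across many machines, here is a minimal single-process sketch of a MapReduce-style sort. It is not Google's implementation, whose details the blog post summarized here does not give. The function names (`sample_boundaries`, `map_phase`, `reduce_phase`, `mapreduce_sort`) and the sampling approach are assumptions made for illustration: records are routed to key ranges, each range is sorted independently, and the sorted ranges are concatenated.

```python
import random
from bisect import bisect_right

def sample_boundaries(records, num_partitions, sample_size=100, seed=0):
    """Pick num_partitions - 1 split points from a random sample of the input."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(records, min(sample_size, len(records))))
    step = len(sample) / num_partitions
    return [sample[int(i * step)] for i in range(1, num_partitions)]

def map_phase(records, boundaries):
    """Route each record to the partition whose key range contains it."""
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for record in records:
        partitions[bisect_right(boundaries, record)].append(record)
    return partitions

def reduce_phase(partitions):
    """Sort each partition on its own (one partition per worker on a cluster)."""
    return [sorted(p) for p in partitions]

def mapreduce_sort(records, num_partitions=4):
    """Hypothetical driver: partition by key range, sort each part, concatenate."""
    if not records:
        return []
    boundaries = sample_boundaries(records, num_partitions)
    partitions = reduce_phase(map_phase(records, boundaries))
    # Partitions cover ascending key ranges, so concatenation is globally sorted.
    return [r for p in partitions for r in p]
```

In this sketch the per-partition sorts are the part that would run in parallel on separate servers; the sampling step only tries to keep the partitions roughly equal in size.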

Google also tested MapReduce's ability to sort 1 petabyte (PB), or 1,000 TB, of data. That is equivalent to 12 times the amount of archived Web data in the US Library of Congress as of May 2008, according to Google.

Using 4,000 servers, which is likely a small fraction of Google's entire worldwide server infrastructure, MapReduce took six hours and two minutes to sort 1 PB, according to Czajkowski.
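The two reported runs can be compared by average sort rate. The short calculation below uses only the figures cited above and assumes decimal units (1 TB = 10^12 bytes), consistent with the article's 1 PB = 1,000 TB; the `throughput` helper is a name introduced here for illustration.

```python
TB = 10**12      # decimal terabyte (assumption), matching 1 PB = 1,000 TB
PB = 1000 * TB

def throughput(total_bytes, seconds, servers):
    """Return (aggregate, per-server) average sort rate in bytes per second."""
    aggregate = total_bytes / seconds
    return aggregate, aggregate / servers

# (data size in bytes, elapsed seconds, server count) as reported in the article
runs = {
    "1 TB on 1,000 servers": (TB, 68, 1000),
    "1 PB on 4,000 servers": (PB, 6 * 3600 + 2 * 60, 4000),
}

for name, (size, secs, servers) in runs.items():
    agg, per = throughput(size, secs, servers)
    print(f"{name}: {agg / 1e9:.1f} GB/s aggregate, {per / 1e6:.1f} MB/s per server")
```

These are averages over the whole run: roughly 14.7 GB per second in aggregate for the 1 TB sort and roughly 46 GB per second for the 1 PB sort. They say nothing about the hardware or configuration used in either test.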

"We're not aware of any other sorting experiment at this scale and are obviously very excited to be able to process so much data so quickly," he wrote.

Czajkowski did not say when the tests were done. He did reveal that as of early January this year, Google was processing an average of 20 PB total per day.

By comparison, the largest publicly known data warehouses today store several petabytes of data in total and process only a tiny fraction of that amount each day.

Google's announcement appeared to be deliberately timed to coincide with a speech by a noted database expert and MapReduce critic, David DeWitt.

A former longtime University of Wisconsin-Madison computer science professor, DeWitt joined Microsoft this spring to run a new research lab being created on the Madison campus.

The lab will focus on helping Microsoft's SQL Server "scale out" to run on hundreds or thousands of servers at a time. That will allow customers to run parallel database clusters that are technically similar to Google's, though nowhere near the latter's scale.


Eric Lai

Computerworld