Amazon Elastic MapReduce
Based on Hadoop, Elastic MapReduce equips users with potent distributed data-processing tools
Have you got a few hundred gigabytes of data that need processing? Perhaps a dump of radio telescope data that could use some combing through by a squad of processors running Fourier transforms? Or maybe you're convinced some statistical analysis will reveal a pattern hidden in several years of stock market information? Unfortunately, you don't happen to have a grid of distributed processors to run your application, much less the time to construct a parallel processing infrastructure.
Pros
- Doesn't take long to get the hang of
Cons
- Currently available in the US region only
Bottom Line
You'll want to be familiar with the Apache Hadoop framework before you jump into Elastic MapReduce. It doesn't take long to get the hang of it, though. Most developers can have a MapReduce application running within a few hours.
Price
TBA (AUD)
Well, cheer up: Amazon has added Elastic MapReduce to its growing list of cloud-based Web services. Currently in beta, Elastic MapReduce uses Amazon's Elastic Compute Cloud (EC2) and Simple Storage Service (S3) to implement a virtualized distributed processing system based on Apache Hadoop.
At its core, Hadoop is an implementation of the MapReduce framework. The mechanics of MapReduce are well documented in a paper by J. Dean and S. Ghemawat, and a full treatment is beyond the scope of this article. Instead, I'll illustrate by example.
Suppose you have a set of 10 words and you want to count the number of times those words appear in a collection of e-books. Your input data is a set of key/value pairs: the value is a line of text from one of the books, and the key is the concatenation of the book's name and the line's number. This set might be a few megabytes in size, or a few gigabytes; MapReduce doesn't much care about size.
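To make that input format concrete, here is a minimal Python sketch that turns a hypothetical in-memory collection of books into such key/value pairs. The function name `make_input_pairs` and the colon separating book name from line number are my own assumptions; in a real job the input would be read from files (on Elastic MapReduce, files stored in S3).

```python
from typing import Dict, Iterator, List, Tuple


def make_input_pairs(books: Dict[str, List[str]]) -> Iterator[Tuple[str, str]]:
    """Yield (key, value) input pairs, one per line of text.

    key   = book name concatenated with the line number
    value = the text of that line
    """
    for name, lines in books.items():
        # Number lines from 1, as a reader would
        for number, line in enumerate(lines, start=1):
            # The ':' separator is an assumption; any unambiguous join works
            yield f"{name}:{number}", line


if __name__ == "__main__":
    # Hypothetical two-line book
    books = {"moby_dick": ["first line of text", "second line of text"]}
    for key, value in make_input_pairs(books):
        print(key, "->", value)
```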
You write a routine that reads this input, a pair at a time, and produces another key/value pair as output. The output key is a word (from the original set of 10) and the associated value is the number of times that word appears in the line. (Zero values are not emitted.) This routine is the map part of map/reduce. Its output is referred to as the intermediate key/value pairs.
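The map routine can be sketched in plain Python as a generator. This illustrates the logic only and is not Hadoop's own Java API; the ten target words and the lowercase, letters-only tokenization are hypothetical choices, since the article doesn't specify either.

```python
import re
from collections import Counter
from typing import Iterator, Tuple

# Hypothetical target set; the article doesn't say which 10 words are counted
TARGET_WORDS = {"whale", "sea", "ship", "captain", "storm",
                "harbor", "sail", "wind", "island", "crew"}


def map_fn(key: str, line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, occurrences in this line) for each target word in the line.

    The input key (book name + line number) isn't needed to count words,
    so it is ignored here.
    """
    # Lowercase and split on runs of letters (an assumed tokenization rule)
    words = re.findall(r"[a-z]+", line.lower())
    counts = Counter(w for w in words if w in TARGET_WORDS)
    # Counter holds only words that actually occurred, so no zero counts are emitted
    for word, count in counts.items():
        yield word, count


if __name__ == "__main__":
    print(list(map_fn("moby_dick:1", "The whale and the whale's sea")))
    # [('whale', 2), ('sea', 1)]
```

Each call sees a single line in isolation, which is what lets Hadoop spread the map work across as many machines as you care to pay for.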
The intermediate key/value pairs are fed to another function (another "step" in the parlance of MapReduce). For this step, you write a routine that iterates through the intermediate values for a word, sums them, and returns a single pair whose key is the word and whose value is the grand total. This routine is the reduce part. You don't have to worry about grouping the results of like keys (i.e., gathering all the intermediate key/values for a given word), because Hadoop does that grouping for you in the background.
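The reduce side might look like the following Python sketch, together with a stand-in for the grouping Hadoop performs between the two phases. The names `group_by_key`, `reduce_fn`, and `run_reduce` are hypothetical. `group_by_key` exists only so the sketch runs on its own; in a real Hadoop job you would write just the reduce routine.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_key(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Gather every intermediate value for each word.

    Hadoop does this grouping (and sorts the keys) between map and reduce;
    this function only imitates it so the sketch is self-contained.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return dict(groups)


def reduce_fn(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Sum one word's per-line counts into a single (word, grand total) pair."""
    return word, sum(counts)


def run_reduce(intermediate: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Group the intermediate pairs by word (in sorted key order), then reduce each group."""
    return dict(reduce_fn(word, counts)
                for word, counts in sorted(group_by_key(intermediate).items()))


if __name__ == "__main__":
    # Hypothetical intermediate pairs, as a map step might emit them
    intermediate = [("whale", 2), ("sea", 1), ("whale", 1)]
    print(run_reduce(intermediate))  # {'sea': 1, 'whale': 3}
```

Note that `reduce_fn` never sees more than one word at a time; that independence is what allows Hadoop to run many reducers in parallel.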