Amazon Elastic MapReduce
Based on Hadoop, MapReduce equips users with potent distributed data-processing tools
- Doesn't take long to get the hang of
- Currently available in the US region only
You'll want to be familiar with the Apache Hadoop framework before you jump into Elastic MapReduce. It doesn't take long to get the hang of it, though. Most developers can have a MapReduce application running within a few hours.
These two steps, the map function and the reduce function, comprise what Amazon MapReduce refers to as a "job flow." Admittedly, this is an oversimplification, because job flows involve other configuration parameters (such as where you get the input data and where you put the output), and you can define additional steps in the process, but that's the basic idea.
As a result, a programmer building a Hadoop-powered MapReduce system can focus on the comparatively simple job of crafting the individual functions that process one key/value pair at a time. Hadoop does the legwork of carving the input data into initial key/value pairs; starting multiple map function instances; feeding them input data; gathering, sorting, and ordering the intermediate key/value pairs; launching reduce instances; feeding them the properly arranged intermediate data; and -- finally -- delivering the output. And all the while, Hadoop monitors the progress of map and reduce tasks and restarts "dead" ones automatically. Whuf.
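To make that division of labor concrete, here is a minimal single-process sketch in Python. It is not Hadoop code: `run_job` is a hypothetical stand-in for the framework's legwork (mapping, grouping and sorting intermediate pairs, then reducing), and `map_fn` and `reduce_fn` are illustrative word-count functions chosen for this example only.

```python
from collections import defaultdict


def map_fn(key, value):
    """Emit (word, 1) for every word in one line of text; the key is unused."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Sum all the counts collected for a single word."""
    yield key, sum(values)


def run_job(records, map_fn, reduce_fn):
    """Toy stand-in for the framework: map, group by key, sort, reduce."""
    # Map phase: feed each input pair to the map function and
    # collect the intermediate values under their keys.
    intermediate = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)

    # Sort phase plus reduce phase: hand each key, in sorted order,
    # to the reduce function together with all of its values.
    output = []
    for key in sorted(intermediate):
        output.extend(reduce_fn(key, intermediate[key]))
    return output


if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog"]
    # Initial key/value pairs: (line number, line text).
    for word, count in run_job(enumerate(lines), map_fn, reduce_fn):
        print(word, count)
```

The programmer writes only the two small functions at the top. Everything inside `run_job` corresponds to work that Hadoop performs, and on a real cluster it is spread across many machines rather than run in one loop.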
Hadoop in the cloud
To access Amazon's Elastic MapReduce, your first stop is your Amazon Web Services account page (assuming you have an account with AWS), where you must sign up for the Elastic MapReduce service. Then, head on over to the AWS Management Console and log in. You'll find that the AWS Console -- which had been a control panel for Amazon's EC2 only -- displays a new Amazon Elastic MapReduce tab. Click the tab, and you are transferred to the Job Flows page, from which you can monitor the status of current job flows, as well as examine details of previous (terminated) job flows.
To define a new job flow, click the Create New Job Flow button. This sends you through a series of windows in step-by-step fashion. You fill in textboxes to define the location of your input data, where you want your output data, and paths to your map and reduce functions. All of these locations must exist in Amazon S3 buckets. For the output data, the location will exist once the job flow concludes. Consequently, it's a good idea to have a utility for transferring data to and from S3 on hand. I recommend the excellent S3Fox Organizer.
Amazon Elastic MapReduce allows for two kinds of job flows: custom jar and streaming. A custom jar-style job flow expects your map and reduce functions to be in compiled Java classes stored in Java JAR files. The Hadoop framework is Java-based, so a custom jar job flow provides better performance. On the other hand, a streaming-type job flow lets you write your map and reduce functions in non-Java languages such as Python, Ruby, Perl, and others. The functions of a streaming job flow read the input data from stdin and send the output to stdout. So, data flows in and out of the functions as strings, and -- by convention -- a tab separates the key and value of each input line.
Once you've specified the whereabouts of your job flow's components, you identify the quantity and processing power of the EC2 instances on which the job will execute. You can select up to 20 EC2 instances; any more than that, and you have to fill out a special request form. Your choice of compute instances ranges from Small to Extra Large High CPU. Check the Amazon documentation for a complete description of the computing power of each instance type.
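For the streaming style of job flow described above, the following is a minimal sketch in Python of a map and a reduce function that follow the stdin/stdout, tab-separated convention. The word-count task, the function names, and the single-script layout with a `mapper`/`reducer` argument are assumptions made for illustration; they are not taken from Amazon's documentation. The reducer relies on its input arriving sorted by key, which is the ordering work Hadoop does between the two steps.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Turn each line of text into 'word<TAB>1' output lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Sum the counts for each word; input must already be sorted by key."""
    # Split each non-blank 'key<TAB>value' line into a [key, value] pair.
    pairs = (
        line.rstrip("\n").split("\t", 1)
        for line in lines
        if line.strip()
    )
    # Consecutive lines sharing a key form one group.
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: "python wordcount.py mapper" or
    # "python wordcount.py reducer", reading stdin and writing stdout.
    step = mapper if sys.argv[1] == "mapper" else reducer
    for out_line in step(sys.stdin):
        print(out_line)
```

A quick local check, assuming the script is saved as `wordcount.py`, is to pipe a text file through the mapper, a sort, and the reducer in sequence; the sort plays the role that Hadoop plays between the map and reduce steps.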