Amazon Elastic MapReduce
Based on Hadoop, MapReduce equips users with potent distributed data-processing tools
- Doesn't take long to get the hang of
- Currently available in the US region only
You'll want to be familiar with the Apache Hadoop framework before you jump into Elastic MapReduce. It doesn't take long to get the hang of it, though. Most developers can have a MapReduce application running within a few hours.
These two steps, the map function and the reduce function, comprise what Amazon MapReduce refers to as a "job flow." Admittedly, this is an oversimplification, because job flows involve other configuration parameters (such as where you get the input data and where you put the output), and you can define additional steps in the process, but that's the basic idea.
As a result, a programmer building a Hadoop-powered MapReduce system can focus on the comparatively simple job of crafting the individual functions that process one key/value pair at a time. Hadoop does the legwork of carving the input data into initial key/value pairs; starting multiple map function instances; feeding them input data; gathering, sorting, and ordering the intermediate key/value pairs; launching reduce instances; feeding them the properly arranged intermediate data; and -- finally -- delivering the output. And all the while, Hadoop monitors the progress of the map and reduce tasks and automatically restarts "dead" ones. Whuf.
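To make that division of labor concrete, here is a minimal single-process sketch in Python. It is not Hadoop code and it uses no Hadoop API: the names map_fn, reduce_fn, and run_job are made up for illustration, and the word-count task is an assumed example. The two short functions stand in for what the programmer writes; run_job stands in, very roughly, for the splitting, grouping, sorting, and feeding that Hadoop handles across many machines.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Emit one (word, 1) pair for every word in a line of text."""
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Collapse all the counts collected for one word into a total."""
    yield word, sum(counts)


def run_job(lines):
    """Toy stand-in for the framework: split, map, group, sort, reduce."""
    # Carve the input into initial key/value pairs (line number, text).
    records = enumerate(lines)

    # Map phase: each record is handled independently of the others.
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)

    # Gather and sort the intermediate pairs, then feed each key's
    # values to the reduce function.
    results = []
    for out_key in sorted(grouped):
        results.extend(reduce_fn(out_key, grouped[out_key]))
    return results


if __name__ == "__main__":
    print(run_job(["the quick brown fox", "the lazy dog", "The end"]))
```

Everything outside map_fn and reduce_fn is the part a real Hadoop deployment takes off the programmer's hands, including the restarting of failed tasks, which this sketch does not attempt.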
Hadoop in the cloud
To access Amazon's Elastic MapReduce, your first stop is your Amazon Web Services account page (assuming you have an account with AWS), where you must sign up for the Elastic MapReduce service. Then, head on over to the AWS Management Console and log in. You'll find that the AWS Console -- which had been a control panel for Amazon's EC2 only -- displays a new Amazon Elastic MapReduce tab. Click the tab, and you are transferred to the Job Flows page, from which you can monitor the status of current job flows, as well as examine details of previous (terminated) job flows.
To define a new job flow, click the Create New Job Flow button. This sends you through a series of windows in step-by-step fashion. You fill in textboxes to define the location of your input data, where you want your output data, and the paths to your map and reduce functions. All of these locations must be in Amazon S3 buckets; the output location is the one exception to "must already exist," because it comes into being when the job flow concludes. Consequently, it's a good idea to have a utility for transferring data to and from S3 on hand. I recommend the excellent S3Fox Organizer.
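Because every location has to live in S3, it can help to sanity-check the four values before typing them into the console. The sketch below is only that, a local check: the JobFlowSpec class, its field names, the example bucket, and the assumption that locations are written with an s3:// prefix are all illustrative and are not taken from Amazon's console or API, which may expect a different notation.

```python
from dataclasses import dataclass, fields


@dataclass
class JobFlowSpec:
    # Hypothetical field names that mirror the console's textboxes.
    input_location: str
    output_location: str
    mapper_location: str
    reducer_location: str


def non_s3_locations(spec):
    """Return the names of fields whose value is not an s3:// location."""
    return [
        f.name
        for f in fields(spec)
        if not getattr(spec, f.name).startswith("s3://")
    ]


spec = JobFlowSpec(
    input_location="s3://example-bucket/input/",
    output_location="s3://example-bucket/output/",
    mapper_location="s3://example-bucket/code/mapper.py",
    reducer_location="/home/me/reducer.py",  # local path, so it is flagged
)

if __name__ == "__main__":
    print(non_s3_locations(spec))
```

A check like this catches only the obvious slip of leaving a script on the local disk; it does not confirm that the bucket or the objects actually exist.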
Amazon Elastic MapReduce allows for two kinds of job flows: custom jar and streaming. A custom jar-style job flow expects your map and reduce functions to be compiled Java classes stored in Java JAR files. The Hadoop framework is Java-based, so a custom jar job flow provides better performance. On the other hand, a streaming-type job flow lets you write your map and reduce functions in non-Java languages such as Python, Ruby, and Perl. The functions of a streaming job flow read the input data from stdin and send the output to stdout. So, data flows in and out of the functions as strings, and -- by convention -- a tab separates the key and value of each input line.
Once you've specified the whereabouts of your job flow's components, you choose the number and processing power of the EC2 instances on which the job will execute. You can select up to 20 EC2 instances; any more than that, and you have to fill out a special request form. Your choice of compute instances ranges from Small to Extra Large High CPU. Check the Amazon documentation for a complete description of the power of each instance type.
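For the streaming style of job flow, the stdin/stdout convention described above can be sketched in Python as follows. This is an assumed word-count example, not code from Amazon or Hadoop: the function names, the single-file layout, and the "map"/"reduce" command-line argument are choices made for illustration. The reducer also assumes its input lines arrive sorted by key, which is what the framework's sort step is described as providing.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Turn raw text lines into 'word<TAB>1' output lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Sum the counts for each key; input must already be sorted by key."""
    pairs = (
        line.rstrip("\n").split("\t", 1)
        for line in lines
        if line.strip()  # ignore blank lines
    )
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"


# Hypothetical usage: run with the argument "map" or "reduce" and pipe
# data through stdin, for example: python wordcount.py map < input.txt
if __name__ == "__main__" and len(sys.argv) > 1:
    step = mapper if sys.argv[1] == "map" else reducer
    for out_line in step(sys.stdin):
        print(out_line)
```

On a local machine, piping the mapper's output through a sort and then into the reducer gives a rough imitation of what the job flow does at scale. In practice the two functions would more likely be uploaded to S3 as two separate scripts.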