Amazon Elastic MapReduce
Based on Hadoop, MapReduce equips users with potent distributed data-processing tools
- Doesn't take long to get the hang of
- Currently available in the US region only
You'll want to be familiar with the Apache Hadoop framework before you jump into Elastic MapReduce. It doesn't take long to get the hang of it, though. Most developers can have a MapReduce application running within a few hours.
These two steps, the map function and the reduce function, make up what Amazon Elastic MapReduce refers to as a "job flow." Admittedly, this is an oversimplification: job flows involve other configuration parameters (such as where you get the input data and where you put the output), and you can define additional steps in the process. But that's the basic idea.
As a result, a programmer building a Hadoop-powered MapReduce system can focus on the comparatively simple job of crafting the individual functions that process one key/value pair at a time. Hadoop does the legwork of carving the input data into initial key/value pairs; starting multiple map function instances; feeding them input data; gathering, sorting, and ordering the intermediate key/value pairs; launching reduce instances; feeding them the properly arranged intermediate data; and -- finally -- delivering the output. And all the while, Hadoop monitors the progress of map and reduce tasks and automatically restarts "dead" ones. Whuf.
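To make that division of labor concrete, here is a minimal single-process sketch in Python of a word count. It is not Hadoop code: `map_fn`, `reduce_fn`, and `run_job` are hypothetical names invented for this illustration, and `run_job` merely stands in for the splitting, grouping, and sorting that the framework performs across many machines.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map step: emit (word, 1) for every word in one line of text."""
    for word in value.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce step: sum all counts collected for a single word."""
    yield key, sum(values)


def run_job(lines):
    """Hypothetical stand-in for the framework: split, map, group, sort, reduce."""
    # Carve the input into initial key/value pairs: (line number, line text).
    records = enumerate(lines)

    # Feed each record to the map function and gather intermediate pairs by key.
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)

    # Sort the intermediate keys, then hand each key and its values to reduce.
    output = []
    for key in sorted(grouped):
        output.extend(reduce_fn(key, grouped[key]))
    return output


result = run_job(["the quick brown fox", "the lazy dog", "the fox"])
```

The only parts a programmer would normally write are `map_fn` and `reduce_fn`; everything inside `run_job` corresponds to work the text above attributes to Hadoop.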
Hadoop in the cloud
To access Amazon's Elastic MapReduce, your first stop is your Amazon Web Services account page (assuming you have an account with AWS), where you must sign up for the Elastic MapReduce service. Then, head on over to the AWS Management Console and log in. You'll find that the AWS Console -- which had been a control panel for Amazon's EC2 only -- displays a new Amazon Elastic MapReduce tab. Click the tab, and you are transferred to the Job Flows page, from which you can monitor the status of current job flows, as well as examine details of previous (terminated) job flows.
To define a new job flow, click the Create New Job Flow button. This walks you through a series of windows, step by step. You fill in text boxes to define the location of your input data, where you want your output data, and the paths to your map and reduce functions. All of these locations must be in Amazon S3 buckets. (The output location will exist once the job flow concludes.) Consequently, it's a good idea to have a utility for transferring data to and from S3 on hand. I recommend the excellent S3Fox Organizer.
Amazon Elastic MapReduce allows for two kinds of job flows: custom jar and streaming. A custom jar job flow expects your map and reduce functions to be compiled Java classes stored in Java JAR files. Because the Hadoop framework is Java-based, a custom jar job flow provides better performance. A streaming job flow, on the other hand, lets you write your map and reduce functions in non-Java languages such as Python, Ruby, and Perl. The functions of a streaming job flow read the input data from stdin and send the output to stdout. So data flows in and out of the functions as strings, and -- by convention -- a tab separates the key and value of each input line.
Once you've specified the whereabouts of your job flow's components, you choose the number and processing power of the EC2 instances on which the job will execute. You can select up to 20 EC2 instances; for more than that, you have to fill out a special request form. Your choice of compute instances ranges from Small to Extra Large High CPU. Check the Amazon documentation for a complete description of the processing power of each instance type.
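The streaming convention described above (lines in on stdin, lines out on stdout, a tab between key and value) can be sketched in Python as follows. This is an illustrative word count, not code from Amazon or Hadoop: the names `mapper`, `reducer`, and `run` are hypothetical, and the reducer assumes its input lines arrive already sorted by key, which is the job the framework does between the two steps.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Sum the counts for each key; assumes lines are sorted by key."""
    # Split each 'key<TAB>value' line into a [key, value] pair.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    # Consecutive pairs with the same key belong to the same group.
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"


def run(role, stdin=sys.stdin, stdout=sys.stdout):
    """Hypothetical entry point: stream stdin through the chosen step."""
    step = mapper if role == "map" else reducer
    for out_line in step(stdin):
        stdout.write(out_line + "\n")
```

In a real streaming job flow, the map and reduce steps would typically live in two separate scripts, each ending with a call such as `run("map")` or `run("reduce")`. Locally, the same pipeline can be approximated by sorting the mapper's output before handing it to the reducer.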