Hortonworks previews a Hadoop of the future

Hadoop 2.0 moves beyond batch processing, offering the foundation for interactive queries and real-time analysis

The new Apache YARN scheduler replaces the MapReduce job scheduler with a more general-purpose resource management framework

Hortonworks has released a preview distribution of the next generation of Apache Hadoop, one that promises to broaden the kinds of analysis that can be carried out on the data processing platform.

"Hadoop 2.0 is truly a fundamental architecture change, one that makes Hadoop significantly more than just a batch platform," said Arun Murthy, a founder of Hortonworks and one of the core engineers developing Hadoop. The update "will fuel a whole new wave of innovation," he said.

The Hortonworks Data Platform 2.0 Community Preview contains a number of new components for the Hadoop environment, most notably YARN (Yet Another Resource Negotiator), a successor to Hadoop's MapReduce job scheduler.

Hadoop started as a "single application platform," one primarily built for crawling and indexing Web content, Murthy said. Organizations are now looking to use it for other kinds of jobs, such as interactive querying or analysis of real-time streams of data.

YARN improves on MapReduce by expanding the types of jobs that can be run on a Hadoop platform. MapReduce could handle little beyond batch processing jobs, which execute data analysis across any number of nodes and return the results only once the job has completed.

In contrast, YARN is a general-purpose resource management framework. It provides a foundation for running non-batch jobs, such as those that run indefinitely on live streams of data and those that involve interactive queries, in which users interrogate the data on the fly. "You can now have both the batch MapReduce jobs and interactive SQL queries running right next to each other in YARN," Murthy said.

Using YARN, "you have a cluster that is aware of all the different types of workloads and resource needs, so they can all cohabitate. You don't get one workload dominating or taking over all the resources of the cluster," said Shaun Connolly, Hortonworks vice president of corporate strategy. Previously, organizations would have to run separate clusters to execute different styles of jobs.

HDP 2.0 includes a number of other new components as well, including Apache Tez, an add-on to YARN for speeding up large, interactive jobs, and Stinger, a collection of technologies that provides the ability to run SQL queries against a Hadoop repository.

This preview of HDP 2.0, a full Hadoop distribution, runs in either Oracle VirtualBox or VMware virtual environments.

Hortonworks announced HDP 2.0 at the 2013 Hadoop Summit, being held this week in San Jose, California. Also at the conference, Rackspace announced it would offer Hadoop as a service, with analysis tools from Pentaho. Splunk released a new tool, called Hunk, to explore Hadoop repositories. Data warehouse systems provider Teradata unveiled new Hadoop appliances. And VMware updated its vSphere virtualization management software to support Hadoop clusters.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com
