Hortonworks previews a Hadoop of the future

Hadoop 2.0 moves beyond batch processing, offering the foundation for interactive queries and real-time analysis

The new Apache YARN scheduler replaces MapReduce's job scheduler with a more general-purpose resource management framework

Hortonworks has released a preview distribution of the next generation of Apache Hadoop, one that promises to broaden the kinds of analysis that can be carried out on the data processing platform.

"Hadoop 2.0 is truly a fundamental architecture change, one that makes Hadoop significantly more than just a batch platform," said Arun Murthy, a founder of Hortonworks and one of the core engineers developing Hadoop. The update "will fuel a whole new wave of innovation," he said.

The Hortonworks Data Platform 2.0 Community Preview contains a number of new components for the Hadoop environment, most notably YARN (Yet Another Resource Negotiator), a successor to Hadoop's MapReduce job scheduler.

Hadoop started as a "single application platform," one primarily built for crawling and indexing Web content, Murthy said. Organizations are now looking to use it for other kinds of jobs, such as interactive querying or analysis of real-time streams of data.

YARN improves on MapReduce by expanding the types of jobs that can be run on a Hadoop platform. MapReduce could manage little beyond batch processing jobs, which execute data analysis across any number of nodes and return the results only once they have completed.

In contrast, YARN is a general-purpose resource management framework. It provides a foundation to run nonbatch processing jobs, such as those that run indefinitely on live streams of data, and those that involve interactive queries, in which users interrogate the data on the fly. "You can now have both the batch MapReduce jobs and interactive SQL queries running right next to each other in YARN," Murthy said.

Using YARN, "you have a cluster that is aware of all the different types of workloads and resource needs, so they can all cohabitate. You don't get one workload dominating or taking over all the resources of the cluster," said Shaun Connolly, Hortonworks vice president of corporate strategy. Previously, organizations would have to run separate clusters to execute different styles of jobs.
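
To make the idea of workloads sharing one cluster concrete, here is a minimal Python sketch of max-min fair sharing, one simple way a resource manager could keep a large batch job from crowding out smaller interactive or streaming work. It is an illustration only, not YARN's actual scheduling algorithm; the function name `fair_share`, the workload names and the slot counts are all hypothetical.

```python
def fair_share(capacity, demands):
    """Split `capacity` slots among workloads using max-min fairness.

    `demands` maps a (hypothetical) workload name to the slots it requests.
    Illustrative toy only; this is not how YARN itself is implemented.
    """
    allocation = {name: 0 for name in demands}
    remaining = capacity
    # Workloads that still want more than they have been granted.
    unsatisfied = {name for name, wanted in demands.items() if wanted > 0}

    while remaining > 0 and unsatisfied:
        # Offer each unsatisfied workload an equal share of what is left.
        share = max(remaining // len(unsatisfied), 1)
        for name in sorted(unsatisfied):
            if remaining == 0:
                break
            grant = min(share, demands[name] - allocation[name], remaining)
            allocation[name] += grant
            remaining -= grant
        # Small workloads drop out once fully served; leftovers are re-split.
        unsatisfied = {n for n in unsatisfied if allocation[n] < demands[n]}

    return allocation


if __name__ == "__main__":
    # A 100-slot cluster shared by a large batch job and two smaller workloads.
    print(fair_share(100, {"batch": 500, "sql": 20, "stream": 30}))
```

In this toy run, the interactive and streaming workloads receive everything they ask for, and the batch job takes only the capacity left over rather than the whole cluster.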

HDP 2.0 includes a number of other new components as well, including Apache Tez, an add-on to YARN for speeding up large, interactive jobs, and Stinger, a collection of technologies that provides the ability to run SQL queries against a Hadoop repository.

This preview of HDP 2.0, a full Hadoop distribution, runs in either Oracle VirtualBox or VMware virtual environments.

Hortonworks announced HDP 2.0 at the 2013 Hadoop Summit, being held this week in San Jose, California. Also at the conference, Rackspace announced it would offer Hadoop as a service, with analysis tools from Pentaho. Splunk released a new tool, called Hunk, to explore Hadoop repositories. Data warehouse systems provider Teradata unveiled new Hadoop appliances. And VMware updated its vSphere virtualization management software to support Hadoop clusters.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com
