Hortonworks' Hadoop distro debuts

For its first Hadoop release, Hortonworks focused on making the data analysis software easy to deploy and monitor

For the first production release of what will be its flagship Apache Hadoop distribution, Hortonworks has focused on providing a set of tools to help deploy, manage and extend the data analysis platform.

"Hortonworks' goal is to make Hadoop easy to use and consume," said John Kreisa, Hortonworks vice president of marketing.

Version 1 of the Hortonworks Data Platform (HDP), to be released June 15, will be Hortonworks' first production-ready product release. Hortonworks was set up a year ago by Yahoo, along with Benchmark Capital, to provide enterprise support for Hadoop, the large-scale data analysis platform. Yahoo played a pivotal role in the early development of Hadoop.

Hortonworks now competes with a number of other companies also offering support packages, including Cloudera, MapR and IBM. Microsoft has chosen Hortonworks' Hadoop distribution for use on its Azure cloud service, though that service, promised by the end of 2011, has not debuted yet.

Like other commercial Hadoop distributions, HDP bundles a number of open-source Hadoop components, including the latest versions of the Pig scripting engine, the Hive data warehousing software and the HBase database.

Beyond these basic components, Hortonworks has added a number of management and interoperability tools to the package, all of them also based on open-source projects.

To aid in management, the package includes a customized version of Apache Ambari, a Hadoop monitoring and lifecycle management program. With this software, an administrator can set up a single Hadoop instance across a number of servers. Once Hadoop is installed, the software then monitors performance of the servers as well as the Hadoop jobs themselves, presenting the data on a dashboard.

"The dashboards are customizable and the APIs [application programming interfaces] allow the management and monitoring functionality to be tied into third-party dashboards like Hewlett-Packard's OpenView or Teradata's Viewpoint," Kreisa said.

With this release, the management tools will only be able to manage a single cluster, though future versions may be able to manage multiple clusters, said Ari Zilka, Hortonworks chief products officer. Specific metrics that are being captured include network utilization, throughput and latency, and usage of CPUs, memory and disks. Jobs in Hadoop are also measured, including the time it takes for a task to start, how many tasks there are on backlog, how many data blocks a task uses and where these data blocks are located.

For data interoperability, the package includes a metadata catalogue that should make it easier for business intelligence and other data analysis products to query Hadoop datasets. Based on Apache HCatalog, this metadata repository provides pointers to Hadoop data in a set of tables that can be easily queried by tools commonly used for relational databases, enterprise data warehouses and other structured data systems.

The package also includes a copy of Talend Open Studio, which provides a GUI (graphical user interface) for exploring, querying and applying logical workflows to Hadoop data sets.

Created in 2005 to analyze large amounts of Web traffic logs, Hadoop is increasingly being used for analyzing swaths of unstructured data too large and unwieldy to be crammed into a relational database or enterprise data warehouse -- data often referred to as big data. In survey results released Tuesday by IT consulting company Capgemini, 58 percent of 600 senior business and IT executives said they plan to invest in big data systems, such as Hadoop, over the next three years.

HDP uses version 1 of the Hadoop software, generally considered the first production-ready version of the software. HDP has been tested in beta for the past seven months.

Alongside the release, Hortonworks announced that it has teamed with VMware to provide a set of tools to run HDP in high-availability (HA) mode. VMware's vSphere can monitor Hadoop NameNode and JobTracker services. Should one of these services fail, vSphere can redirect operations to live backup services and keep the cluster running.
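The failover idea described above can be illustrated with a minimal Python sketch. This is not how vSphere or HDP implements high availability; the `Service` class, the `pick_active` helper and the service names are all hypothetical, invented here only to show the general pattern of routing work to the first healthy instance.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str       # hypothetical label, not a real HDP or vSphere identifier
    healthy: bool   # outcome of the most recent (simulated) health check


def pick_active(services):
    """Return the first healthy service in priority order, or None if all are down."""
    for service in services:
        if service.healthy:
            return service
    return None


# Primary is listed first, so it is preferred while it stays healthy.
primary = Service("namenode-primary", healthy=True)
backup = Service("namenode-backup", healthy=True)
candidates = [primary, backup]

active_before = pick_active(candidates).name

primary.healthy = False  # simulate the primary service failing
active_after = pick_active(candidates).name
```

In this toy model, operations go to the primary until its health check fails, after which the backup is selected. A real deployment would also have to detect failures and keep the backup's state in sync, which the sketch does not attempt.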

HDP itself will be available as a free download. Using a payment model similar to Red Hat's, Hortonworks will offer support subscriptions. Pricing is per cluster, starting at US$12,500 per year for 10 nodes.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com
