Cloudera positions Hadoop as an enterprise data hub

Hadoop can now serve as the single source of data for an enterprise, Cloudera says

Taking note of how customers have been working with its Hadoop distribution, Cloudera has expanded the scope of its software so that it can serve as a hub for all of an organization's data, not just data undergoing Hadoop MapReduce analysis.

Some of Cloudera's enterprise customers have "started to use our platform in a new way, as the center of their data centers," said Mike Olson, Cloudera's chairman and chief strategy officer.

"We think this is a very big deal. It will change the way the industry thinks about data," Olson said.

Cloudera has released a new beta of its commercial distribution, Cloudera Enterprise, that provides tools for managing an organization's data, as well as tools from Cloudera and third parties for data analysis.

Olson announced the beta of Cloudera Enterprise 5 at the O'Reilly Strata-Hadoop World conference, being held this week in New York.

"It used to be that an organization had lots of balkanized data silos," Olson said. "The stuff that you used to run on a data warehouse because you had no choice, now you can run on the hub."

Putting the data in a Hadoop-based storage repository has many advantages, Olson argued. Different types of analytical workloads can run against the data in the hub. The hub can easily feed data to other systems, such as content management systems. It can also serve as an archiving system.

An enterprise data hub, Olson said, can store data as it is generated, even if the organization isn't yet sure how the data will be used. Such data may prove valuable later for machine learning analysis or other uses not yet considered.

An enterprise hub also puts security and governance mechanisms in place to safeguard the data. Cloudera has been working on these tools for several releases, Olson said.

"Our ambition is to draw more workloads in and make the hub more valuable over time," he said.

Part of Hadoop's newfound ability to act as a data hub comes from software additions in the latest version of the open-source software, Apache Hadoop 2, on which Cloudera Enterprise is built.

The inclusion of YARN (Yet Another Resource Negotiator), for instance, allows Hadoop to handle multiple analysis applications, not just those that run on the batch-oriented MapReduce framework.

To facilitate the hub, Cloudera has also set up a management framework that third-party analysis applications can plug into. SAS, Revolution Analytics, Syncsort and other organizations have ported some of their software to the platform. Porting analysis software requires that the operations be executed in parallel, as data in Hadoop is typically distributed across multiple nodes, Olson said.
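As a rough illustration of what executing an operation in parallel over distributed data involves, the following Python sketch computes a mean over data split into partitions. It is not taken from Cloudera's or any vendor's software: the function names, the sample partitions and the use of a local thread pool as a stand-in for cluster nodes are all assumptions made for the example. The point is that each partition yields a small partial result that can be combined afterward, so no single worker needs to see all the data.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(partition):
    """Return (sum, count) for one partition, computed independently of the others."""
    return sum(partition), len(partition)


def combine(partials):
    """Merge the per-partition (sum, count) pairs into one overall pair."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total, count


def distributed_mean(partitions):
    """Compute the mean of all values without gathering them in one place.

    A thread pool stands in for cluster nodes here (an assumption of this sketch).
    """
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_stats, partitions))
    total, count = combine(partials)
    if count == 0:
        raise ValueError("cannot compute the mean of an empty dataset")
    return total / count


# Hypothetical data, already split into blocks as it might be across nodes.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
print(distributed_mean(partitions))
```

An operation such as a median does not decompose this simply, which is one reason porting existing analysis software to a distributed platform can require reworking its algorithms.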

Cloudera Enterprise 5 also adds the ability to cache HDFS (Hadoop Distributed File System) contents in the working memory of a server, which can boost query response and data processing times.

The company's Navigator auditor tool now allows analysts and data modelers to search, explore, define and tag datasets. Users can add customized queries to Cloudera's Impala SQL engine. And Cloudera Enterprise 5 can work with NFS (Network File System), which should make the process of loading data into HDFS much easier, Olson said.

The software can now also take snapshots of the data, providing a backup if the original data is lost or destroyed.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com.
