SAP wants to embrace all your data stores with Data Hub

SAP aims to leave your data where it finds it, centralizing only the processing, not the storage

Credit: Stephen Lawson

If data warehouses are for tidiness freaks (information packaged into neat inferences, sorted and stacked, the rest discarded) and data lakes are for hoarders (tip everything in, you never know what might be useful) then SAP's new Data Hub may be for the rest of us.

It's a new data management tool intended to process only the data you need -- and to go looking for it where it's created or stored, without requiring you to pull it all into one place. 

Data scientists will be able to use it to analyze data from multiple sources and systems.

"Data Hub is a strong data management umbrella layer that allows for data integration, data processing and data governance," said Irfan Khan, global head of SAP database and data management sales.

"It allows us to look across all the data that you own, and access all of the information. But it doesn't look to centralize all this data in a data lake of its own; it's looking at capturing data and accessing data exactly where it resides today," said Khan, speaking ahead of the product's launch Monday.

While the notion of an enterprise data hub has been around for a while, SAP is using the term a little differently from most: Where others such as MapR or Cloudera talk of importing all the data into a giant Hadoop cluster or other central repository before processing, SAP intends to leave data in situ until it's needed.

It will do that by creating data pipelines -- flows of data that are composed of reusable, configurable operations to process data pulled from a variety of sources, including CSV files, web services APIs, and commercial cloud services, as well as SAP's own data stores. The operations could be connectors to different file systems or APIs, analytics or machine learning libraries such as TensorFlow, or custom-coded tasks.
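To make the idea of a pipeline built from reusable, configurable operations concrete, here is a minimal Python sketch. It is not SAP Data Hub's API: every name in it (`read_csv`, `keep_where`, `select`, `run_pipeline`) and the sample data are hypothetical, chosen only to show how small operations can be configured once and chained over a source such as a CSV file.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]
Operation = Callable[[Iterable[Record]], Iterator[Record]]


def read_csv(text: str) -> Iterator[Record]:
    """Source operation (hypothetical): yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(text))


def keep_where(field: str, value: str) -> Operation:
    """Configurable filter: keep records whose field equals value."""
    def op(records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            if record.get(field) == value:
                yield record
    return op


def select(*fields: str) -> Operation:
    """Configurable projection: keep only the named fields."""
    def op(records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            yield {name: record[name] for name in fields}
    return op


def run_pipeline(source: Iterable[Record],
                 operations: List[Operation]) -> List[Record]:
    """Chain the operations lazily, then collect the final records."""
    stream: Iterable[Record] = source
    for operation in operations:
        stream = operation(stream)
    return list(stream)


# Made-up sample data standing in for a CSV file left where it lives.
SAMPLE = "region,product,units\nEMEA,widget,3\nAPAC,widget,5\nEMEA,gadget,2\n"

result = run_pipeline(
    read_csv(SAMPLE),
    [keep_where("region", "EMEA"), select("product", "units")],
)
print(result)
```

Because each operation only consumes and yields records, the same filter or projection could be reused against a different source, which is the property the pipeline description relies on.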

SAP provides a graphical tool for modeling workflows and pipelines, and an orchestration layer for invoking jobs and restarting or rolling back tasks in the event of failure. This can take the place of workflow scheduling systems such as Apache Oozie, Khan said. 
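The restart and rollback behavior of an orchestration layer can be illustrated with a short Python sketch. This is an assumption about how such a layer might behave in general, not a description of SAP's implementation or of Apache Oozie: the `run_job` function, the retry limit, and the task names are all hypothetical.

```python
from typing import Callable, List, Tuple

# A task is (name, run, undo). All three are hypothetical placeholders.
Task = Tuple[str, Callable[[], None], Callable[[], None]]


def run_job(tasks: List[Task], max_attempts: int = 2) -> List[str]:
    """Run tasks in order, restarting a failed task up to max_attempts times.

    If a task still fails, undo the completed tasks in reverse order.
    Returns a log of what happened.
    """
    log: List[str] = []
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, run, undo in tasks:
        for attempt in range(1, max_attempts + 1):
            try:
                run()
            except Exception:
                log.append(f"fail:{name}:{attempt}")
            else:
                log.append(f"ok:{name}")
                completed.append((name, undo))
                break
        else:
            # Every attempt failed: roll back earlier work, newest first.
            for done_name, done_undo in reversed(completed):
                done_undo()
                log.append(f"undo:{done_name}")
            return log
    return log


calls = {"extract": 0}


def flaky_extract() -> None:
    """Fails on the first call only, to simulate a transient error."""
    calls["extract"] += 1
    if calls["extract"] == 1:
        raise RuntimeError("transient failure")


def failing_load() -> None:
    """Always fails, to trigger the rollback path."""
    raise RuntimeError("permanent failure")


def noop() -> None:
    pass


log = run_job([("extract", flaky_extract, noop),
               ("load", failing_load, noop)])
print(log)
```

In this sketch the first task succeeds on its restart, while the second exhausts its attempts and causes the first to be undone.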

The execution of the pipeline can be pushed down to other platforms, such as SAP's Vora computing engine, he said.

Data Hub doesn't need a company to be built on SAP in order to work: It can also be integrated with third-party products, he said. "You don't need to be using SAP's ETL processing, you may be using Informatica," he said, or perhaps the open-source Kafka messaging layer.

SAP Data Hub is now generally available, but how much will it cost? Inevitably, as with most enterprise software, it depends.

Pricing is based on the total systems and computing nodes managed by SAP Data Hub, according to an SAP spokesman. It also requires a license for SAP's in-memory database engine, HANA. Customers with existing HANA licenses can use them, if they have sufficient capacity. Customers without a HANA license can buy a small amount of HANA capacity to ensure that Data Hub's runtime needs are met.


Peter Sayer

IDG News Service