SAP wants to embrace all your data stores with Data Hub

SAP aims to leave your data where it finds it, centralizing only the processing, not the storage

Credit: Stephen Lawson

If data warehouses are for tidiness freaks (information packaged into neat inferences, sorted and stacked, the rest discarded) and data lakes are for hoarders (tip everything in, you never know what might be useful) then SAP's new Data Hub may be for the rest of us.

It's a new data management tool intended to process only the data you need -- and to go looking for it where it's created or stored, without requiring you to pull it all into one place. 

Data scientists will be able to use it to analyze data from multiple sources and systems.

"Data Hub is a strong data management umbrella layer that allows for data integration, data processing and data governance," said Irfan Khan, global head of SAP database and data management sales.

"It allows us to look across all the data that you own, and access all of the information. But it doesn't look to centralize all this data in a data lake of its own; it's looking at capturing data and accessing data exactly where it resides today," said Khan, speaking ahead of the product's launch Monday.

While the notion of an enterprise data hub has been around for a while, SAP is using the term a little differently from most: Where others such as MapR or Cloudera talk of importing all the data into a giant Hadoop cluster or other central repository before processing, SAP intends to leave data in situ until it's needed.

It will do that by creating data pipelines -- flows of data that are composed of reusable, configurable operations to process data pulled from a variety of sources, including CSV files, web services APIs, and commercial cloud services, as well as SAP's own data stores. The operations could be connectors to different file systems or APIs, analytics or machine learning libraries such as TensorFlow, or custom-coded tasks.
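To make the idea of a pipeline built from reusable, configurable operations concrete, here is a minimal sketch in plain Python. It does not use or represent SAP Data Hub's actual interfaces; the names `read_csv`, `keep_where`, `select` and `pipeline`, and the sample data, are all hypothetical and chosen only for illustration.

```python
import csv
import io
from functools import reduce
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical types for this sketch: a record is a dict of strings,
# and an operation turns one stream of records into another.
Record = Dict[str, str]
Operation = Callable[[Iterable[Record]], Iterator[Record]]


def read_csv(text: str) -> Iterator[Record]:
    """Source connector: yield one record per CSV row."""
    yield from csv.DictReader(io.StringIO(text))


def keep_where(field: str, value: str) -> Operation:
    """Configurable operation: pass through records whose field equals value."""
    def op(records: Iterable[Record]) -> Iterator[Record]:
        return (r for r in records if r[field] == value)
    return op


def select(*fields: str) -> Operation:
    """Configurable operation: keep only the named fields of each record."""
    def op(records: Iterable[Record]) -> Iterator[Record]:
        return ({f: r[f] for f in fields} for r in records)
    return op


def pipeline(*ops: Operation) -> Operation:
    """Compose operations so each one consumes the previous one's output."""
    def run(source: Iterable[Record]) -> Iterator[Record]:
        return reduce(lambda data, op: op(data), ops, source)
    return run


# Made-up sample data standing in for a CSV file left where it was created.
SAMPLE_CSV = "region,product,units\nEMEA,widget,3\nAPAC,widget,5\nEMEA,gadget,7\n"

# The same operations can be reconfigured and reused in other pipelines.
emea_units = pipeline(keep_where("region", "EMEA"), select("product", "units"))
result = list(emea_units(read_csv(SAMPLE_CSV)))
print(result)
```

Because each operation is a generator over the source, records are read and filtered only when the pipeline runs, which loosely mirrors the idea of processing only the data that is needed rather than copying everything first.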

SAP provides a graphical tool for modeling workflows and pipelines, and an orchestration layer for invoking jobs and restarting or rolling back tasks in the event of failure. This can take the place of workflow scheduling systems such as Apache Oozie, Khan said. 
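What restarting or rolling back tasks can mean in practice is easier to see in a small sketch. The following plain-Python example is an assumption-laden illustration, not SAP's orchestration layer or Oozie's behavior: the function `run_job`, the `(name, run, undo)` task shape, the retry count and the demo tasks are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical task shape for this sketch: (name, run action, undo action).
Task = Tuple[str, Callable[[], None], Callable[[], None]]


def run_job(tasks: List[Task], max_attempts: int = 2) -> List[str]:
    """Run tasks in order, retrying each one up to max_attempts times.

    If a task still fails after its last attempt, undo the tasks that
    already completed, most recent first, and stop the job.
    """
    log: List[str] = []
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, run, undo in tasks:
        for attempt in range(1, max_attempts + 1):
            try:
                run()
                log.append(f"{name}: ok (attempt {attempt})")
                completed.append((name, undo))
                break
            except Exception as exc:
                log.append(f"{name}: failed (attempt {attempt}): {exc}")
        else:
            # No attempt succeeded: roll back completed work in reverse order.
            for done_name, done_undo in reversed(completed):
                done_undo()
                log.append(f"{done_name}: rolled back")
            return log
    return log


# Demo with made-up tasks that record their effects in a shared list.
state: List[str] = []
calls: Dict[str, int] = {"load": 0}


def extract() -> None:
    state.append("staged")


def undo_extract() -> None:
    state.remove("staged")


def flaky_load() -> None:
    calls["load"] += 1
    if calls["load"] == 1:
        raise RuntimeError("connection reset")  # first attempt fails
    state.append("loaded")


def undo_load() -> None:
    state.remove("loaded")


def bad_publish() -> None:
    raise RuntimeError("target unavailable")  # never succeeds


def noop() -> None:
    pass


log = run_job([
    ("extract", extract, undo_extract),
    ("load", flaky_load, undo_load),
    ("publish", bad_publish, noop),
])
for line in log:
    print(line)
```

In this demo the load step succeeds on its second attempt, the publish step exhausts its attempts, and the two earlier steps are then undone in reverse order, leaving `state` empty.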

The execution of the pipeline can be pushed down to other platforms, such as SAP's Vora computing engine, he said.

Data Hub doesn't require a company to be built on SAP in order to work: It can also be integrated with third-party products, Khan said. "You don't need to be using SAP's ETL processing, you may be using Informatica," he said, or perhaps the open-source Kafka messaging layer.

SAP Data Hub is now generally available, but how much will it cost? Inevitably, as with most enterprise software, it depends.

Pricing is based on the total systems and computing nodes managed by SAP Data Hub, according to an SAP spokesman. It also requires a license for SAP's in-memory database engine, HANA. Customers with existing HANA licenses can use them, if they have sufficient capacity. Customers without a HANA license can buy a small amount of HANA capacity to ensure that Data Hub's runtime needs are met.

Peter Sayer

IDG News Service