Databricks takes on Google data streaming analysis with Spark

Databricks Cloud will provide Spark-based streaming analysis as a service

Taking on Google, Databricks plans to offer its own cloud service for analyzing live data streams, one based on the Apache Spark software.

Databricks Cloud is designed to provide a platform for analyzing streaming data, much like the Google DataFlow service announced last week.

Like Google DataFlow, Databricks Cloud promises to offer a single programming model that cuts across different approaches to data analysis, including support for batch programming and live data streaming. And like Google DataFlow, Databricks Cloud will first be offered in preview mode, with full commercial support due by the end of the year.
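
To make the idea of a single programming model concrete, here is a minimal sketch in plain Python. It uses neither the Spark nor the Google DataFlow API; every function name in it (count_by_key, run_batch, run_streaming) is hypothetical and chosen only for illustration. It assumes a stream can be treated as a sequence of small batches, and shows the same analysis function serving both the batch path and the streaming path.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def count_by_key(records: Iterable[str]) -> Counter:
    """Shared analysis logic: count how often each record value occurs."""
    return Counter(records)


def run_batch(records: List[str]) -> Counter:
    """Batch mode: apply the analysis once to a complete, finite data set."""
    return count_by_key(records)


def run_streaming(micro_batches: Iterable[List[str]]) -> Iterator[Counter]:
    """Streaming mode (illustrative): apply the same analysis to each small
    batch as it arrives and yield a snapshot of the running totals."""
    running = Counter()
    for batch in micro_batches:
        running.update(count_by_key(batch))
        yield Counter(running)  # copy, so later updates do not alter it


# Hypothetical event data, used both as one batch and as two micro-batches.
events = ["login", "purchase", "login", "logout"]
batch_result = run_batch(events)
stream_results = list(run_streaming([events[:2], events[2:]]))
```

Once the stream has delivered every event, the last streaming snapshot matches the batch result, which is the practical appeal of writing the analysis logic only once.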

The two services are aimed at different markets, according to Ion Stoica, CEO of Databricks.

"Google DataFlow is really targeted to developers. We also have higher-level interfaces for data scientists and data engineers," Stoica said.

Databricks also guarantees application portability. Because the entire stack is based on open source software, users can move their workloads to other Apache Spark installations should they need to, Stoica said. "You can take your application and run it in another cloud," Stoica said.

Enterprises could use such a service for tasks such as churn analysis, which can determine why a customer stops using a product, or fraud detection, where malicious activity can be spotted while it is still taking place.

The University of California, Berkeley's AMP (Algorithms, Machines and People) Lab originally developed Spark as a unified processing engine, one able to provide a platform for a variety of data analysis tasks, including interactive queries, streaming data analysis, machine learning and graph computation.

A number of developers behind Spark went on to form Databricks. The software itself, designed to run on a cluster of servers, is now managed as an open source project under the guidance of the Apache Software Foundation.

Offering Spark as a service eliminates the arduous task of setting up and maintaining an in-house implementation of Spark, Stoica noted.

"Clusters are hard to set up and maintain. To build a data pipeline, you need to stitch together multiple tools, and the tools are still hard to use. So extracting value out of the data is still a struggle," Stoica said.

Initially, Databricks Cloud will be run on Amazon Web Services, though eventually it will also run on other cloud providers such as Google.

In addition to the Spark platform itself, Databricks will provide a set of built-in applications that can do common data analysis tasks. Users can build their own workflows, or issue queries and interact with the data directly. Output can be piped to a dashboard or a report.

Databricks is not the only company making use of Spark's capabilities. ClearStory offers an analytics software package based on Spark that allows organizations to aggregate dozens of unstructured data sources for analysis, far more than can be easily done through traditional business intelligence tools, said ClearStory CEO Sharmila Mulligan.

Databricks also announced Monday that it has received US$33 million in series B funding led by venture capital firm New Enterprise Associates, with follow-on investment from Andreessen Horowitz.

Marc Ferranti

IDG News Service