Big data gets a new open-source project: Apache Arrow

It offers performance improvements of more than 100x on analytical workloads, the foundation says

Hadoop, Spark and Kafka have already had a defining influence on the world of big data, and now there's yet another Apache project with the potential to shape the landscape even further: Apache Arrow.

The Apache Software Foundation on Wednesday launched Arrow as a top-level project designed to provide a high-performance data layer for columnar in-memory analytics across disparate systems.

Based on code from the related Apache Drill project, Apache Arrow can bring benefits including performance improvements of more than 100x on analytical workloads, the foundation said. In general, it enables multi-system workloads by eliminating cross-system communication overhead.

Code committers to the project include developers from other Apache big-data projects such as Calcite, Cassandra, Drill, Hadoop, HBase, Impala, Kudu, Parquet, Phoenix, Spark and Storm.

"The open-source community has joined forces on Apache Arrow," said Jacques Nadeau, vice president of both Apache Arrow and Apache Drill. "We anticipate the majority of the world's data will be processed through Arrow within the next few years."

In many workloads, between 70 percent and 80 percent of CPU cycles are spent serializing and deserializing data. Arrow alleviates that burden by enabling data to be shared among systems and processed with no serialization, deserialization or memory copies, the foundation said.
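To make the difference concrete, here is a minimal sketch in plain Python that contrasts a serialize-and-copy handoff with sharing a single in-memory buffer. It does not use Arrow itself: the standard-library `array`, `pickle` and `memoryview` stand in for a columnar buffer, a wire format and a zero-copy reference, and the `total` consumer is a hypothetical name chosen for illustration.

```python
import pickle
from array import array

# A column of 64-bit integers stored contiguously in memory.
column = array("q", range(1_000))

# Approach 1: serialize, hand over, deserialize. The consumer gets its own copy.
payload = pickle.dumps(column)
received_copy = pickle.loads(payload)

# Approach 2: hand the consumer a view over the producer's buffer. No copy is made.
shared_view = memoryview(column)


def total(values):
    """Hypothetical consumer: sums a column of integers."""
    return sum(values)


# Both consumers read the same values at this point.
assert total(received_copy) == total(shared_view)

# Only the view refers to the producer's memory, so a later write by the
# producer shows up through the view but not in the deserialized copy.
column[0] = 42
print(shared_view[0], received_copy[0])  # prints "42 0"
```

The copy costs an encode step, a decode step and a second allocation, while the view costs none of these. That is the kind of overhead the foundation says Arrow is meant to remove, although Arrow's actual mechanism is a standardized memory format shared across systems rather than a Python `memoryview`.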

"An industry-standard columnar in-memory data layer enables users to combine multiple systems, applications and programming languages in a single workload without the usual overhead," said Ted Dunning, vice president of the Apache Incubator and member of the Apache Arrow Project Management Committee.

Arrow also supports complex data with dynamic schemas in addition to traditional relational data. For instance, it can handle JSON data, which is commonly used in Internet-of-Things (IoT) workloads, modern applications and log files. Implementations are also available for a number of programming languages for greater interoperability.
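As a rough illustration of what it means to hold JSON records with a varying schema in columnar form, the sketch below pivots a few records into one list per field using only Python's standard library. The sample sensor records and the `to_columns` helper are made up for this example, and Arrow's real format is more elaborate than a dictionary of lists.

```python
import json

# Hypothetical IoT-style records; not every record carries every field.
records_json = """
[
  {"device": "sensor-1", "temp": 21.5},
  {"device": "sensor-2", "temp": 19.0, "humidity": 0.4},
  {"device": "sensor-3"}
]
"""


def to_columns(records):
    """Pivot a list of dicts into a dict of equal-length columns.

    Fields missing from a record are stored as None, so records whose
    schema varies still produce aligned columns.
    """
    names = []
    for record in records:
        for key in record:
            if key not in names:
                names.append(key)  # keep first-seen field order
    return {name: [record.get(name) for record in records] for name in names}


columns = to_columns(json.loads(records_json))
print(columns["temp"])  # [21.5, 19.0, None]
```

With the data arranged this way, an analytical query that needs only `temp` can scan one contiguous list instead of visiting every record.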

Apache Arrow software is available under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project.

Katherine Noyes

IDG News Service