Apache Tajo brings data warehousing to Hadoop

A relatively obscure open source Apache package provides a way to put Hadoop data in a data warehouse for further analysis

Tajo provides the ability to set up a distributed data warehouse accross multiple servers

Tajo provides the ability to set up a distributed data warehouse accross multiple servers

Organizations that want to extract more intelligence from their Hadoop deployments might find help from the relatively little known Tajo open source data warehouse software, which the Apache Software Foundation has pronounced as ready for commercial use.

The new version of Tajo, Apache software for running a data warehouse over Hadoop data sets, has been updated to provide greater connectivity to Java programs and third party databases such as Oracle and PostGreSQL.

While less well-known than other Apache big data projects such as Spark or Hive, Tajo could be a good fit for organizations outgrowing their commercial data warehouses. It could also be a good fit for companies wishing to analyze large sets of data stored on Hadoop data processing platforms using familiar commercial business intelligence tools instead of Hadoop's MapReduce framework.

Tajo performs the necessary ETL (extract-transform-load process) operations to summarize large data sets stored on an HDFS (Hadoop Distributed File System). Users and external programs can then query the data through SQL.

The latest version of the software, issued Monday, comes with a newly improved JDBC (Java Database Connectivity) driver that its project managers say makes Tajo as easy to use as a standard relational database management system. The driver has been tested against a variety of commercial business intelligence software packages and other SQL-based tools.

Other new features include catalogs of built-in SQL commands from both Oracle and PostgreSQL systems.

Like a growing number of database systems, Tajo now features full support for JSON (JavaScript Object Notation), easing the process for Web developers to work with Tajo. Tajo can also work directly with Amazon S3 (Simple Storage Service)

Gruter, a big data infrastructure startup in South Korea, is leading the charge to develop Tajo. Engineers from Intel, Etsy, NASA, Cloudera and Hortonworks also contribute to the project.

Perhaps because of its South Korean home base, the software is not very widely known elsewhere in the world, compared to other open-source SQL-based Hadoop packages such as Hive or Impala.

At least in one test of the software, conducted in 2013, Tajo appeared to possess a speed advantage, according to Gruter. Korea's SK Telecom telecommunications firm ran Tajo against 1.7 terabytes worth of data, and found it could complete queries with greater speed than either Hive or Impala, in most instances.

As with most benchmarks, results may vary according to the specific workload. New editions of Hive and Impala may have also closed the speed gap as well.

SK Telecom uses the software in production duties, as does Korea University and NASA's Jet Propulsion Laboratory. The Korean music streaming service Melon uses the software for analytical processing, and has found that Tajo executes ETL jobs 1.5 to 10 times faster than Hive.

The Apache Software Foundation provides support and oversight for more than 350 open source projects, including Hadoop, the Cassandra NoSQL database and the Apache HTTP server.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com

Join the PC World newsletter!

Error: Please check your email address.

Tags applicationsdata miningsoftwareApache Software Foundation

Our Back to Business guide highlights the best products for you to boost your productivity at home, on the road, at the office, or in the classroom.

Keep up with the latest tech news, reviews and previews by subscribing to the Good Gear Guide newsletter.

Joab Jackson

IDG News Service
Show Comments

Essentials

Microsoft L5V-00027 Sculpt Ergonomic Keyboard Desktop

Learn more >

Lexar® JumpDrive® S57 USB 3.0 flash drive

Learn more >

Mobile

Lexar® JumpDrive® S45 USB 3.0 flash drive 

Learn more >

Exec

Audio-Technica ATH-ANC70 Noise Cancelling Headphones

Learn more >

Lexar® Professional 1800x microSDHC™/microSDXC™ UHS-II cards 

Learn more >

Lexar® JumpDrive® C20c USB Type-C flash drive 

Learn more >

HD Pan/Tilt Wi-Fi Camera with Night Vision NC450

Learn more >

Budget

Back To Business Guide

Click for more ›

Most Popular Reviews

Latest News Articles

Resources

PCW Evaluation Team

Azadeh Williams

HP OfficeJet Pro 8730

A smarter way to print for busy small business owners, combining speedy printing with scanning and copying, making it easier to produce high quality documents and images at a touch of a button.

Andrew Grant

HP OfficeJet Pro 8730

I've had a multifunction printer in the office going on 10 years now. It was a neat bit of kit back in the day -- print, copy, scan, fax -- when printing over WiFi felt a bit like magic. It’s seen better days though and an upgrade’s well overdue. This HP OfficeJet Pro 8730 looks like it ticks all the same boxes: print, copy, scan, and fax. (Really? Does anyone fax anything any more? I guess it's good to know the facility’s there, just in case.) Printing over WiFi is more-or- less standard these days.

Ed Dawson

HP OfficeJet Pro 8730

As a freelance writer who is always on the go, I like my technology to be both efficient and effective so I can do my job well. The HP OfficeJet Pro 8730 Inkjet Printer ticks all the boxes in terms of form factor, performance and user interface.

Michael Hargreaves

Windows 10 for Business / Dell XPS 13

I’d happily recommend this touchscreen laptop and Windows 10 as a great way to get serious work done at a desk or on the road.

Aysha Strobbe

Windows 10 / HP Spectre x360

Ultimately, I think the Windows 10 environment is excellent for me as it caters for so many different uses. The inclusion of the Xbox app is also great for when you need some downtime too!

Mark Escubio

Windows 10 / Lenovo Yoga 910

For me, the Xbox Play Anywhere is a great new feature as it allows you to play your current Xbox games with higher resolutions and better graphics without forking out extra cash for another copy. Although available titles are still scarce, but I’m sure it will grow in time.

Featured Content

Latest Jobs

Don’t have an account? Sign up here

Don't have an account? Sign up now

Forgot password?