In Pictures: 18 essential Hadoop tools for crunching big data

Making the most of this powerful MapReduce platform means mastering a vibrant ecosystem of quickly evolving code

Lucene/Solr

There is but one tool for indexing large blocks of unstructured text, and it's a natural partner for Hadoop. Written in Java, Lucene integrates easily with Hadoop, creating one big tool for distributed text management. Lucene handles the indexing; Hadoop distributes queries across the cluster.

New Lucene-Hadoop integrations are emerging quickly as separate projects. Katta, for instance, is a version of Lucene that automatically shards an index across a cluster. Solr offers a more integrated solution for dynamic clustering, with the ability to parse standard file formats like XML. The illustration shows Luke, a GUI for browsing Lucene indexes. It now sports a plug-in for browsing indexes in a Hadoop cluster.
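To make the division of labor concrete, here is a minimal sketch in plain Python of a sharded inverted index with scatter-gather queries. It is a toy illustration only: the names `Shard`, `ShardedIndex` and `tokenize` are hypothetical, and none of this is the Lucene, Solr, Katta or Hadoop API. It assumes documents are assigned to shards by a hash of their id and that a query matches documents containing every query term.

```python
from collections import defaultdict
import re
import zlib


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Shard:
    """One node's slice of the index: maps each term to a set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents on this shard that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = None
        for term in terms:
            ids = self.postings.get(term, set())
            result = set(ids) if result is None else result & ids
        return result


class ShardedIndex:
    """Hypothetical coordinator: routes documents to shards and fans queries out."""

    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def add(self, doc_id, text):
        # Assumption: a stable hash of the id decides which shard owns a document.
        shard_no = zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)
        self.shards[shard_no].add(doc_id, text)

    def search(self, query):
        # Scatter the query to every shard, then gather and merge the hits.
        hits = set()
        for shard in self.shards:
            hits |= shard.search(query)
        return sorted(hits)


index = ShardedIndex(num_shards=3)
index.add("a", "Hadoop distributes queries across the cluster")
index.add("b", "Lucene handles the indexing")
index.add("c", "Lucene integrates easily with Hadoop")
print(index.search("lucene hadoop"))  # prints ['c']
```

In a real deployment each shard would live on a different machine and the fan-out would run in parallel; the loop here only stands in for that step.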

Lucene and many of its descendants are Apache Software Foundation projects, available from http://www.apache.org.
