Lucene is the go-to tool for indexing large blocks of unstructured text, and it's a natural partner for Hadoop. Written in Java, it integrates easily with Hadoop, and together they make one big tool for distributed text management. Lucene handles the indexing; Hadoop distributes queries across the cluster.
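That division of labor, with indexing done locally and queries fanned out across machines, can be sketched with a toy example. This is only an illustration under stated assumptions: it uses neither Lucene's nor Hadoop's actual APIs, the class names `ShardIndex` and `ShardedIndex` are hypothetical, and the "cluster" is just a list of in-memory shards chosen by document id.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ShardIndex:
    """Toy inverted index for one shard: term -> set of document ids.

    Hypothetical stand-in for a per-node index; not Lucene's API.
    """

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return the ids of documents that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = None
        for term in terms:
            hits = self.postings.get(term, set())
            result = set(hits) if result is None else result & hits
        return result


class ShardedIndex:
    """Routes documents to shards and fans each query out to all of them.

    Hypothetical stand-in for the cluster layer; shards live in one process.
    """

    def __init__(self, num_shards):
        self.shards = [ShardIndex() for _ in range(num_shards)]

    def add(self, doc_id, text):
        # Assumption: integer document ids, routed by simple modulo.
        self.shards[doc_id % len(self.shards)].add(doc_id, text)

    def search(self, query):
        # Ask every shard, then merge the partial results.
        merged = set()
        for shard in self.shards:
            merged |= shard.search(query)
        return sorted(merged)


index = ShardedIndex(num_shards=3)
index.add(0, "Hadoop distributes work across the cluster")
index.add(1, "Lucene builds an inverted index over text")
index.add(2, "Katta shards a Lucene index across a cluster")
index.add(3, "Luke browses a Lucene index")

print(index.search("lucene index"))  # [1, 2, 3]
print(index.search("cluster"))       # [0, 2]
```

Each shard answers only for the documents it holds, so the merge step is a plain union. A real deployment would also have to rank results and cope with shards that fail or move, which this sketch leaves out.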
New Lucene-Hadoop features are rapidly emerging as separate projects. Katta, for instance, is a version of Lucene that automatically shards an index across a cluster. Solr offers a more integrated solution for dynamic clustering, with the ability to parse standard file formats like XML. The illustration shows Luke, a GUI for browsing Lucene indexes. It now sports a plug-in for browsing indexes in a Hadoop cluster.
Lucene and many of its descendants are Apache projects, available from http://www.apache.org.