Getting the treasure trove of data stored in SQL databases into Hadoop requires a bit of massaging and manipulation. Sqoop moves large tables full of information out of traditional relational databases and into the control of tools like Hive or HBase.
Sqoop is a command-line tool that controls the mapping between the tables and the data storage layer, translating each table into a configurable form for HDFS, HBase, or Hive. The image from the Apache literature at left shows Sqoop sitting between the traditional repositories and the Hadoop structures living on the node.
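To make the table-to-destination mapping concrete, here is a minimal Python sketch that assembles a `sqoop import` command line for each of the three destinations. It only builds the argument list and does not run Sqoop. The helper name `build_sqoop_import`, the JDBC URL, the user name, and the table names are all hypothetical; the flags are the ones I understand the Sqoop 1.4.x import tool to accept, so check them against the user guide for your version. Password handling is left out on purpose.

```python
import shlex


def build_sqoop_import(jdbc_url, table, username, target="hdfs",
                       dest=None, column_family="cf", num_mappers=4):
    """Return the argv list for a `sqoop import` call (hypothetical helper).

    target is "hdfs", "hive", or "hbase"; dest names the HDFS directory,
    Hive table, or HBase table that receives the rows.
    """
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URL of the source database
        "--username", username,
        "--table", table,                   # source table to copy
        "--num-mappers", str(num_mappers),  # parallel map tasks
    ]
    if target == "hdfs":
        if dest:
            args += ["--target-dir", dest]  # plain files in HDFS
    elif target == "hive":
        args += ["--hive-import"]           # load the data into Hive
        if dest:
            args += ["--hive-table", dest]
    elif target == "hbase":
        if not dest:
            raise ValueError("an HBase import needs a destination table")
        args += ["--hbase-table", dest, "--column-family", column_family]
    else:
        raise ValueError(f"unknown target: {target!r}")
    return args


if __name__ == "__main__":
    # Assumed example values; replace with a real database and table.
    cmd = build_sqoop_import("jdbc:mysql://db.example.com/sales",
                             "orders", "etl", target="hive", dest="orders")
    print(shlex.join(cmd))  # shell-safe string you could paste into a terminal
```

Keeping the command as a list rather than a single string means it can be passed directly to `subprocess.run` without shell quoting problems, should you decide to launch Sqoop from a script.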
The latest stable version is 1.4.4, but version 2.0 is progressing well. Both are available from http://sqoop.apache.org/ under the Apache License.