Not all Hadoop clusters use HBase or HDFS. Some integrate with NoSQL data stores that have their own mechanisms for distributing data across a cluster of nodes. Users can then store and retrieve data with all the features of the NoSQL database and use Hadoop to schedule data analysis jobs on the same cluster.
Most commonly the data store is Cassandra, Riak, or MongoDB, and users are actively exploring the best ways to integrate these stores with Hadoop. 10gen, one of the main supporters of MongoDB, for instance, suggests that Hadoop can be used for offline analytics while MongoDB gathers statistics from the Web in real time. The illustration at left shows how a connector can migrate data between the two.
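
To make the division of labor concrete, here is a minimal sketch in plain Python of the offline half: a map and reduce pass over documents that were gathered earlier. It does not call MongoDB, Hadoop, or any connector. The `events` list, its field names, and the functions `map_event`, `reduce_counts`, and `run_batch_job` are all hypothetical stand-ins chosen for illustration.

```python
from collections import defaultdict

# Hypothetical page-view documents, standing in for statistics that a
# NoSQL store would gather from the Web in real time.
events = [
    {"page": "/home", "visitor": "a"},
    {"page": "/docs", "visitor": "b"},
    {"page": "/home", "visitor": "c"},
]


def map_event(doc):
    """Map step: emit one (page, 1) pair per document."""
    yield doc["page"], 1


def reduce_counts(key, values):
    """Reduce step: sum all the counts emitted for one page."""
    return key, sum(values)


def run_batch_job(docs):
    """Group mapped pairs by key, then reduce each group.

    This imitates, in a single process, what an offline analysis job
    would do across the nodes of a cluster.
    """
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_event(doc):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())


page_views = run_batch_job(events)
print(page_views)
```

In a real deployment the documents would be read from the data store by a connector and the grouping would be handled by the cluster, so only the map and reduce logic would resemble this sketch.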