Once Hadoop runs on more than a few machines, making sense of the cluster requires order, especially when some of the machines start checking out.
ZooKeeper imposes a file system-like hierarchy of nodes on the cluster and stores the coordination metadata (configuration, naming, and status) that the machines use to synchronize their work. The documentation shows how to implement many of the standard techniques for coordinating data processing, such as producer-consumer queues, so the data is chopped, cleaned, sifted, and sorted in the right order. The nodes use ZooKeeper to signal each other when they're done so the others can start in on the data.
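A ZooKeeper queue is conventionally built from sequentially numbered child nodes: each producer creates a child with the sequence flag, ZooKeeper appends a monotonically increasing counter to the name, and consumers take the child with the lowest counter first. A minimal sketch of that ordering logic, in pure Python with no live ZooKeeper connection (the node names below are illustrative, not taken from a real cluster):

```python
# Sketch of the ordering behind a ZooKeeper producer-consumer queue.
# Producers create children such as "/queue/item-0000000007" with the
# SEQUENTIAL flag, so ZooKeeper appends a 10-digit counter to each name.
# A consumer lists the children, sorts by that counter, and processes
# (then deletes) the lowest-numbered item first.

def next_item(children):
    """Return the child node a consumer should take next."""
    # Sort by the numeric sequence suffix ZooKeeper appended.
    return min(children, key=lambda name: int(name.rsplit("-", 1)[1]))

# Hypothetical listing, as a getChildren("/queue") call might return it
# (order is not guaranteed):
children = ["item-0000000012", "item-0000000003", "item-0000000007"]
print(next_item(children))  # item-0000000003
```

In a real deployment the client library handles the listing and the watch that signals when a new child appears; the sorting rule above is the part that gives the queue its first-in, first-out order.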
For more information, documentation, and the latest builds, turn to http://zookeeper.apache.org/.