In Pictures: 18 essential Hadoop tools for crunching big data
Although many people use the name Hadoop for the entire constellation of map and reduce tools, there is still one small pile of code at the center that is Hadoop itself. This Java-based code synchronizes worker nodes as they execute a function on data stored locally, and the results from those nodes are then aggregated and reported. The first step is known as "map"; the second, "reduce."
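As a rough illustration of the two steps, the sketch below runs a word count in plain Python. It does not use Hadoop's own API; the names `map_chunk` and `reduce_counts` and the sample `chunks` are hypothetical, and each string simply stands in for the data stored locally on one worker node.

```python
from collections import Counter


def map_chunk(chunk):
    """Map step: one worker counts the words in its own local chunk."""
    return Counter(chunk.split())


def reduce_counts(partials):
    """Reduce step: aggregate the per-worker results into one report."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical data: each string stands in for one node's local storage.
chunks = ["big data big tools", "data tools data"]
partials = [map_chunk(c) for c in chunks]
totals = reduce_counts(partials)
```

In a real cluster the map calls would run in parallel on separate machines; here they run one after another so that only the shape of the computation is visible.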
Hadoop offers a thin abstraction over local data storage and synchronization, allowing programmers to concentrate on writing code for analyzing the data. Hadoop handles the rest: it splits up the job and schedules the pieces. Errors or failures are expected, and Hadoop is designed to work around faults in individual machines.
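The idea of splitting a job into tasks and working around failures can be sketched in a few lines of Python. This is a simplified, sequential stand-in, not Hadoop's scheduler: `run_job`, `flaky_length`, and the `max_attempts` parameter are hypothetical names, and the failure is simulated rather than caused by a real machine fault.

```python
def run_job(chunks, task, max_attempts=3):
    """Split a job into one task per chunk and retry tasks that fail."""
    results = []
    for index, chunk in enumerate(chunks):
        for attempt in range(1, max_attempts + 1):
            try:
                results.append(task(index, chunk, attempt))
                break  # task succeeded, move on to the next chunk
            except RuntimeError:
                # Simulated machine fault: schedule the same task again.
                continue
        else:
            # Every attempt failed, so the whole job is reported as failed.
            raise RuntimeError(f"task {index} failed {max_attempts} times")
    return results


def flaky_length(index, chunk, attempt):
    """Hypothetical task: the task for chunk 1 fails on its first attempt."""
    if index == 1 and attempt == 1:
        raise RuntimeError("simulated node failure")
    return len(chunk.split())


word_counts = run_job(["big data", "more big data", "tools"], flaky_length)
```

The code that analyzes the data (`flaky_length` here) never deals with the retry itself, which is the division of labor the paragraph above describes.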