In Pictures: 18 essential Hadoop tools for crunching big data

Making the most of this powerful MapReduce platform means mastering a vibrant ecosystem of quickly evolving code


HDFS (Hadoop Distributed File System) The Hadoop Distributed File System offers a basic framework for splitting up data collections between multiple nodes while using replication to recover from node failure. Large files are broken into blocks, and copies of each block are spread across several nodes, so a single file's blocks may live on many machines at once, as the diagram in the Apache documentation illustrates.
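One way to see that layout in practice is to ask the NameNode where a file's blocks live. Here is a minimal sketch using Hadoop's Java FileSystem API; the path /data/sample.log is a hypothetical placeholder, and the client is assumed to pick up fs.defaultFS from a core-site.xml on its classpath.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsBlockReport {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // reads core-site.xml from the classpath
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/data/sample.log"); // hypothetical example path
            FileStatus status = fs.getFileStatus(file);

            // The NameNode reports each block's offset, length, and the nodes holding it
            for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        block.getOffset(), block.getLength(),
                        String.join(",", block.getHosts()));
            }
            fs.close();
        }
    }

Run against a large file, this prints one line per block, each typically listing several hosts: the replicas HDFS falls back on when a node fails.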

The file system is designed to mix fault tolerance with high throughput. Blocks are read as steady streams rather than cached, a trade-off that favors throughput over low latency. The default model imagines long jobs processing lots of locally stored data, which fits the project's motto that “moving computation is cheaper than moving data.”
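Replication and block size, the two levers behind the fault tolerance and streaming behavior described above, can also be set per file at creation time. A minimal sketch, again using the stock Java FileSystem API; the path and the values (three replicas, 128MB blocks) are illustrative, not recommendations.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // create() accepts a per-file replication factor and block size;
            // the path /data/out.bin is a hypothetical placeholder
            FSDataOutputStream out = fs.create(new Path("/data/out.bin"),
                    true,                // overwrite if the file exists
                    4096,                // I/O buffer size in bytes
                    (short) 3,           // keep three copies of every block
                    128L * 1024 * 1024); // split the file into 128MB blocks
            out.writeUTF("hello hdfs");
            out.close();
            fs.close();
        }
    }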

Like the rest of Hadoop, HDFS is distributed under the Apache license and is available from http://hadoop.apache.org/.
