Once data is stored in nodes in a way Hadoop can find it, the fun begins. Apache's Pig plows through the data, running code written in its own language, called Pig Latin, filled with abstractions for handling the data. This structure steers users toward algorithms that are easy to run in parallel across the cluster.
Pig comes with standard functions for common tasks like averaging data, working with dates, or finding differences between strings. When those aren't enough -- as they often aren't -- you can write your own functions. The image at left shows one elaborate example from Apache's documentation of how you can mix your own code with Pig's to mine the data.
The latest version can be found at http://pig.apache.org.