Pig is used to analyze large data sets, featuring parallelization and a high-level language for data analysis algorithms. Developers can use Pig instead of writing Java code when using Hadoop.
“You can think of Pig as an abstraction layer on top of Hadoop,” says Daniel Dai, a committer on the project. Pig is so-named because of its ability to eat everything data-wise, Dai says. “It consumes all kinds of data.”
Users can build their own functions for special-purpose processing. The forthcoming upgrade, Pig 11.0, will feature performance enhancements and operators cube, for calculating multiple dimension aggregates, and rank, for ranking. Pig developers would like Pig to eventually be independent of Hadoop, but right now it is Hadoop-dependent, Dai says.