I’ve been reading Pig Design Patterns lately. You can find it here:
I got interested in Pig because it promises to be an abstraction layer over MapReduce in Hadoop. Unlike Hive, Pig does not try to look or behave like SQL. Its language, Pig Latin, is a separate DSL for working with data and describing MapReduce jobs without having to write MapReduce code.
The book is one of the best-structured books I’ve read. It’s almost impossible to get lost in it, because each term it uses is explained well. For instance, the author describes precisely what a Pattern stands for in the scope of this book.
After an introduction to what Hadoop and Pig are (perhaps too long an introduction, but you can always skip it), you get to the point where several common patterns are explained in detail. They are use cases in which an enterprise user may want to use Pig for everyday tasks such as log processing.
The author is clearly a Big Data veteran, and you will find a wide-ranging set of use cases: basic processing of Apache logs, processing of JSON and XML data, text processing patterns, common statistical tasks, data cleansing and so on.
A lot of stuff. What I really like about it is that each pattern is preceded by a description of exactly what it is and why you would want to use it.
For each pattern, code examples show how the data is expected to be read into Pig and how you should save it back to one of the supported storages once the job is done. You don’t get (or rarely get) a full Pig code example, though. You are only shown how to get data into Pig and how to save it back. The processing part is up to you. I don’t find that to be a huge drawback, since a Pig script is usually a series of steps for grouping and summing data, and the online docs are enough for that.
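To give an idea of the shape I mean, here is a minimal sketch of such a script. It is not taken from the book; the file paths, the relation names and the two fields (host, bytes) are made up for illustration, and it assumes a tab-separated input file.

```pig
-- Load step: read tab-separated records (hypothetical path and schema).
logs = LOAD 'input/access_log.tsv' USING PigStorage('\t')
       AS (host:chararray, bytes:long);

-- Processing step: group the records by host, then sum the bytes per group.
by_host = GROUP logs BY host;
totals  = FOREACH by_host GENERATE group AS host, SUM(logs.bytes) AS total_bytes;

-- Store step: write the result back out (hypothetical output directory).
STORE totals INTO 'output/bytes_by_host' USING PigStorage('\t');
```

The first and last statements are the kind of thing the book covers for each pattern; the two lines in the middle are the part you would normally write yourself.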
In summary, it’s a tremendously well-written book. I was able to get up and running with Pig quickly on AWS EMR, I really enjoyed playing with it, and I will definitely keep studying and come back to this book for reference.