Hadoop & MapReduce Tutorial | Grunt and Pig Scripts

Grunt and Pig Scripts

Grunt

Grunt is Pig’s interactive shell. It enables users to enter Pig Latin interactively and provides a shell for interacting with HDFS. To enter Grunt, invoke Pig with no script or command to run. Typing pig -x local will result in the prompt grunt>

This gives you a Grunt shell to interact with your local filesystem. If you omit the -x local and have a cluster configuration set in PIG_CLASSPATH, this will put you in a Grunt shell that will interact with HDFS on your cluster.
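For example, the two modes are entered as follows (the cluster the second command connects to is whatever your configuration specifies):

$ pig -x local     # Grunt shell against the local filesystem
grunt>

$ pig              # Grunt shell against HDFS on the configured cluster
grunt>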

As you would expect with a shell, Grunt provides command-line history and editing, as well as Tab completion of commands. It does not provide filename completion. That is, if you type kil and then press the Tab key, it will complete the command as kill. But if you have a file foo in your local directory and type ls fo and then press Tab, it will not complete it as ls foo. This is because the response time of connecting to HDFS to check whether the file exists would be too slow to be useful.

Although Grunt is a useful shell, remember that it is not a full-featured one. It does not provide a number of commands found in standard Unix shells, such as pipes, redirection, and background execution. To exit Grunt you can type quit or press Ctrl-D.
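A short interactive session might look like the following (this sketch assumes a tab-delimited file named student exists on the filesystem Grunt is connected to):

grunt> A = LOAD 'student' AS (name:chararray, age:int, gpa:float);
grunt> B = FILTER A BY gpa >= 3.0;
grunt> DUMP B;
grunt> quit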

Besides entering Pig Latin interactively, Grunt’s other major use is to act as a shell for HDFS. In Pig 0.5 and later, all hadoop fs shell commands are available. They are accessed using the keyword fs. The dash (-) used in hadoop fs commands is also required:

grunt> fs -ls

A number of the commands come directly from Unix shells and will operate in ways that are familiar: chgrp, chmod, chown, cp, du, ls, mkdir, mv, rm, and stat. A few of them either look like Unix commands you are used to but behave slightly differently, or are unfamiliar. These include the following (a sample session appears after the list):

  • cat filename – Print the contents of a file to stdout. You can apply this command to a directory and it will apply itself in turn to each file in the directory.
  • copyFromLocal localfile hdfsfile – Copy a file from your local disk to HDFS. This is done serially, not in parallel.
  • copyToLocal hdfsfile localfile – Copy a file from HDFS to your local disk. This is done serially, not in parallel.
  • rmr filename – Remove files recursively. This is equivalent to rm -r in Unix. Use this with caution.
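A Grunt session using these commands might look like this (the file and directory names are placeholders):

grunt> fs -copyFromLocal data/student.txt student   # local disk -> HDFS, copied serially
grunt> fs -ls                                       # confirm the file arrived
grunt> fs -cat student                              # print the file to stdout
grunt> fs -copyToLocal student /tmp/student.txt     # HDFS -> local disk, copied serially
grunt> fs -rmr old_output                           # recursive remove; use with caution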

Pig Scripts

Use Pig scripts to place Pig Latin statements and Pig commands in a single file. While not required, it is good practice to identify the file using the *.pig extension. You can run Pig scripts from the command line and from the Grunt shell. Pig scripts allow you to pass values to parameters using parameter substitution.
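As a minimal sketch of parameter substitution (the script name, parameter names, and input file here are illustrative): the script references $input and $min_gpa, and their values are supplied on the command line with -param.

-- filter.pig: $input and $min_gpa are replaced before the script runs
A = LOAD '$input' AS (name:chararray, age:int, gpa:float);
B = FILTER A BY gpa >= $min_gpa;
DUMP B;

$ pig -param input=student -param min_gpa=3.0 filter.pig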

Comments in Scripts – You can include comments in Pig scripts:

For multi-line comments use /* ... */

For single-line comments use --

/* myscript.pig
My script is simple.
It includes three Pig Latin statements.
*/

A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float); -- loading data
B = FOREACH A GENERATE name;  -- transforming data
DUMP B;  -- retrieving results
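To run this script from the command line (here in local mode, assuming myscript.pig and the student file are in the current directory):

$ pig -x local myscript.pig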

Scripts and Distributed File Systems – Pig supports running scripts (and Jar files) that are stored in HDFS, Amazon S3, and other distributed file systems. The script’s full location URI is required. For example, to run a Pig script on HDFS, do the following:

$ pig hdfs://nn.mydomain.com:9020/myscripts/script.pig
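A script stored in Amazon S3 can be run the same way by giving its full URI; for example (the bucket and path are placeholders, and the exact URI scheme depends on your Hadoop S3 configuration):

$ pig s3://mybucket/myscripts/script.pig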

