Hadoop & Mapreduce Tutorial | Job Formats

Job Formats

In Hadoop 1, JobClient is the primary interface through which a user job interacts with the JobTracker. JobClient provides facilities to submit jobs, track their progress, access component-tasks’ reports and logs, get the MapReduce cluster’s status information, and so on. The job submission process involves:

  • Checking the input and output specifications of the job.
  • Computing the InputSplit values for the job.
  • Setting up the requisite accounting information for the DistributedCache of the job, if necessary.
  • Copying the job’s jar and configuration to the MapReduce system directory on the FileSystem.
  • Submitting the job to the JobTracker and optionally monitoring its status.
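The flow above can be sketched with the classic Hadoop 1 `org.apache.hadoop.mapred` API. This is a minimal sketch, not a complete application: the mapper and reducer classes are omitted (placeholders in comments), and the input/output paths come from the command line.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class SubmitExample {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SubmitExample.class);
    conf.setJobName("example-job");

    // Input and output specifications are checked at submission time;
    // the job fails fast if, e.g., the output directory already exists.
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    // Hypothetical user classes (not shown here):
    // conf.setMapperClass(MyMapper.class);
    // conf.setReducerClass(MyReducer.class);

    // runJob() submits the job to the JobTracker and monitors it,
    // printing progress to the console until completion.
    RunningJob job = JobClient.runJob(conf);
    System.out.println("Job succeeded: " + job.isSuccessful());
  }
}
```

Running this requires a Hadoop 1 cluster (or local runner) on the classpath; `JobClient.runJob` performs the split computation, jar/configuration copy, and submission steps listed above on the caller's behalf.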

Job history files are also logged to the user-specified directory hadoop.job.history.user.location, which defaults to the job output directory. The files are stored under “_logs/history/” in the specified directory; hence, by default, they will be in mapred.output.dir/_logs/history. Users can disable this logging by setting hadoop.job.history.user.location to the value none.
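For example, the history location can be set (or disabled) in the job configuration; the path below is a placeholder:

```xml
<!-- mapred-site.xml or a per-job configuration -->
<property>
  <name>hadoop.job.history.user.location</name>
  <!-- placeholder path; set the value to "none" to disable history logging -->
  <value>/user/alice/job-history</value>
</property>
```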

Users can view a summary of the history logs in the specified directory using the following command:

$ bin/hadoop job -history output-dir

This command will print job details, plus details of failed and killed tasks (TIPs). More details about the job, such as successful tasks and the task attempts made for each task, can be viewed using the following command:

$ bin/hadoop job -history all output-dir

Users can use OutputLogFilter to filter log files out of the output directory listing. Normally, the user creates the application, describes the various facets of the job via JobConf, and then uses the JobClient to submit the job and monitor its progress.
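For example, OutputLogFilter can be passed to FileSystem.listStatus so that only the job's data files are returned, skipping the _logs directory. This is a sketch: `fs` and `outputDir` are assumed to be an already-initialized FileSystem and the job's output path.

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.OutputLogFilter;

// ... inside application code, with fs and outputDir already set up:
// List only the data files in the job output, filtering out log files.
Path[] outputFiles = FileUtil.stat2Paths(
    fs.listStatus(outputDir, new OutputLogFilter()));
for (Path p : outputFiles) {
  System.out.println(p);
}
```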

Job Authorization – In Hadoop 1, job-level authorization and queue-level authorization are enabled on the cluster if the configuration property mapred.acls.enabled is set to true. When enabled, access control checks are done (a) by the JobTracker, before allowing users to submit jobs to queues and administer those jobs, and (b) by the JobTracker and the TaskTracker, before allowing users to view job details or to modify a job using the MapReduce APIs, CLI, or web user interfaces.

A job submitter can specify access control lists for viewing or modifying a job via the configuration properties mapreduce.job.acl-view-job and mapreduce.job.acl-modify-job respectively. By default, nobody is given access in these properties.
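For example, enabling ACL checks and granting per-job access might look like the fragment below. The ACL value format is a comma-separated list of users, then a space, then a comma-separated list of groups; the names here are placeholders:

```xml
<!-- mapred-site.xml: turn on ACL checks cluster-wide -->
<property>
  <name>mapred.acls.enabled</name>
  <value>true</value>
</property>

<!-- Per-job ACLs, e.g. in the submitted job's configuration -->
<property>
  <name>mapreduce.job.acl-view-job</name>
  <!-- users alice and bob, plus members of group opsgroup (placeholders) -->
  <value>alice,bob opsgroup</value>
</property>
<property>
  <name>mapreduce.job.acl-modify-job</name>
  <value>alice</value>
</property>
```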

However, irrespective of the job ACLs configured, a job’s owner, the superuser, cluster administrators (mapreduce.cluster.administrators), and queue administrators of the queue to which the job was submitted (mapred.queue.queue-name.acl-administer-jobs) always have access to view and modify a job.

A job view ACL authorizes users against the configured mapreduce.job.acl-view-job before returning possibly sensitive information about a job, like:

  • job-level counters
  • task-level counters
  • tasks’ diagnostic information
  • task logs displayed on the TaskTracker web UI
  • job.xml shown by the JobTracker’s web UI

Other information about a job, like its status and its profile, is accessible to all users, without requiring authorization.

A job modification ACL authorizes users against the configured mapreduce.job.acl-modify-job before allowing modifications to jobs, like:

  • killing a job
  • killing/failing a task of a job
  • setting the priority of a job

These operations are also permitted by the queue-level ACL, mapred.queue.queue-name.acl-administer-jobs, configured via mapred-queue-acls.xml. A caller can perform an operation if they are part of either the queue administrators ACL or the job modification ACL.
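A queue-level administration ACL for a queue named "default" could be declared as follows (the user and group names are placeholders):

```xml
<!-- mapred-queue-acls.xml -->
<property>
  <name>mapred.queue.default.acl-administer-jobs</name>
  <!-- user qadmin and members of group adminsgroup may kill jobs,
       fail tasks, and change job priority in this queue -->
  <value>qadmin adminsgroup</value>
</property>
```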
