Hadoop administration involves managing the Hadoop setup, using the built-in admin UI, taking system backups, and managing nodes.
dfsadmin, fsck and balancer
dfsadmin
The hadoop dfsadmin command runs an HDFS dfsadmin client and supports a few HDFS administration operations. Running bin/hadoop dfsadmin -help lists all the commands currently supported. For example:
- -report : Reports basic statistics of HDFS. Some of this information is also available on the NameNode front page.
- -safemode : Though usually not required, an administrator can manually enter or leave safe mode.
- -finalizeUpgrade : Removes the previous backup of the cluster made during the last upgrade.
- -refreshNodes : Updates the set of hosts allowed to connect to the NameNode. Re-reads the config file to update the values defined by dfs.hosts and dfs.hosts.exclude and reads the entries (hostnames) in those files. Each entry not defined in dfs.hosts but in dfs.hosts.exclude is decommissioned. Each entry defined in dfs.hosts and also in dfs.hosts.exclude is stopped from decommissioning if it has already been marked for decommission. Entries not present in either list are decommissioned.
- -printTopology : Prints the topology of the cluster. Displays a tree of racks and the datanodes attached to the racks, as viewed by the NameNode.
In Hadoop 1, the usage is:
Usage: hadoop dfsadmin [GENERIC_OPTIONS] [-report] [-safemode enter | leave | get | wait] [-refreshNodes] [-finalizeUpgrade] [-upgradeProgress status | details | force] [-metasave filename] [-setQuota <quota> <dirname>…<dirname>] [-clrQuota <dirname>…<dirname>] [-help [cmd]]
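The usage line above can be sketched as a small Python helper that checks an option and its arguments before calling the command. This is only an illustration: the names `OPTIONS`, `dfsadmin_argv` and `run_dfsadmin` are hypothetical, `[GENERIC_OPTIONS]` is not handled, and the argument counts are transcribed from the Hadoop 1 usage text rather than from the Hadoop source.

```python
import subprocess

# Argument shapes transcribed from the Hadoop 1 usage line.
# Each entry maps an option to (min_args, max_args, allowed_first_arg).
# max_args of None means "one or more further values are accepted".
OPTIONS = {
    "-report": (0, 0, None),
    "-safemode": (1, 1, {"enter", "leave", "get", "wait"}),
    "-refreshNodes": (0, 0, None),
    "-finalizeUpgrade": (0, 0, None),
    "-upgradeProgress": (1, 1, {"status", "details", "force"}),
    "-metasave": (1, 1, None),
    "-setQuota": (2, None, None),  # <quota> plus at least one <dirname>
    "-clrQuota": (1, None, None),  # at least one <dirname>
    "-help": (0, 1, None),
}


def dfsadmin_argv(option, *args, hadoop_bin="hadoop"):
    """Build the argument list for one `hadoop dfsadmin` invocation."""
    if option not in OPTIONS:
        raise ValueError(f"unknown dfsadmin option: {option}")
    min_args, max_args, choices = OPTIONS[option]
    if len(args) < min_args or (max_args is not None and len(args) > max_args):
        raise ValueError(f"wrong number of arguments for {option}")
    if choices is not None and args[0] not in choices:
        raise ValueError(f"{option} expects one of {sorted(choices)}")
    return [hadoop_bin, "dfsadmin", option] + [str(a) for a in args]


def run_dfsadmin(option, *args):
    """Run the command and return its standard output.

    Assumes a Hadoop 1 installation with `hadoop` on the PATH.
    """
    result = subprocess.run(
        dfsadmin_argv(option, *args),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On a machine with Hadoop installed, `run_dfsadmin("-safemode", "get")` would run `hadoop dfsadmin -safemode get`; `dfsadmin_argv` alone can be used to inspect the command without executing it.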
| COMMAND_OPTION | Description | 
| --- | --- | 
| -report | Reports basic filesystem information and statistics. | 
| -safemode enter \| leave \| get \| wait | Safe mode maintenance command. Safe mode is a Namenode state in which it (1) does not accept changes to the name space (read-only) and (2) does not replicate or delete blocks. The Namenode enters safe mode automatically at startup and leaves it automatically when the configured minimum percentage of blocks satisfies the minimum replication condition. Safe mode can also be entered manually, but then it can only be turned off manually as well. | 
| -refreshNodes | Re-read the hosts and exclude files to update the set of Datanodes that are allowed to connect to the Namenode and those that should be decommissioned or recommissioned. | 
| -finalizeUpgrade | Finalize upgrade of HDFS. Datanodes delete their previous version working directories, followed by Namenode doing the same. This completes the upgrade process. | 
| -upgradeProgress status \| details \| force | Request the current distributed upgrade status, request a detailed status, or force the upgrade to proceed. | 
| -metasave filename | Save Namenode’s primary data structures to <filename> in the directory specified by the hadoop.log.dir property. <filename> will contain one line for each of the following: (1) Datanodes heartbeating with the Namenode, (2) blocks waiting to be replicated, (3) blocks currently being replicated, (4) blocks waiting to be deleted. | 
| -setQuota <quota> <dirname>…<dirname> | Set the quota <quota> for each directory <dirname>. The directory quota is a long integer that puts a hard limit on the number of names in the directory tree. Best effort for each directory, with a fault reported if (1) the quota is not a positive integer, or (2) the user is not an administrator, or (3) the directory does not exist or is a file, or (4) the directory would immediately exceed the new quota. | 
| -clrQuota <dirname>…<dirname> | Clear the quota for each directory <dirname>. Best effort for each directory, with a fault reported if (1) the directory does not exist or is a file, or (2) the user is not an administrator. It does not fault if the directory has no quota. | 
| -help [cmd] | Displays help for the given command or all commands if none is specified. | 
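The fault conditions listed for -setQuota in the table can be read as a sequence of checks applied to each directory independently. The Python sketch below mirrors those four conditions only; it is not the NameNode implementation, and the function names and the dictionary fields (`exists`, `is_directory`, `names_in_tree`) are hypothetical stand-ins for state that HDFS tracks internally.

```python
def set_quota_fault(quota, is_admin, exists, is_directory, names_in_tree):
    """Return a fault message for one directory, or None if the quota can be set."""
    if not isinstance(quota, int) or quota <= 0:
        return "quota is not a positive integer"
    if not is_admin:
        return "user is not an administrator"
    if not exists or not is_directory:
        return "directory does not exist or is a file"
    if names_in_tree > quota:
        return "directory would immediately exceed the new quota"
    return None


def set_quota_all(quota, is_admin, directories):
    """Best effort: check every directory and collect faults per path.

    `directories` maps a path to a dict with the hypothetical fields
    `exists`, `is_directory` and `names_in_tree`.
    """
    faults = {}
    for path, info in directories.items():
        fault = set_quota_fault(
            quota,
            is_admin,
            info["exists"],
            info["is_directory"],
            info["names_in_tree"],
        )
        if fault is not None:
            faults[path] = fault
    return faults
```

Because each directory is checked on its own, a fault on one path does not stop the quota from being applied to the others, which is one way to read "best effort" in the table.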
