Configuration

Permissions for both HDFS and local filesystem paths

The following table lists various paths on HDFS and local filesystems (on all nodes) and recommended permissions:

Filesystem | Path | User:Group | Permissions
local | dfs.namenode.name.dir | hdfs:hadoop | drwx------
local | dfs.datanode.data.dir | hdfs:hadoop | drwx------
local | $HADOOP_LOG_DIR | hdfs:hadoop | drwxrwxr-x
local | $YARN_LOG_DIR | yarn:hadoop | drwxrwxr-x
local | yarn.nodemanager.local-dirs | yarn:hadoop | drwxr-xr-x
local | yarn.nodemanager.log-dirs | yarn:hadoop | drwxr-xr-x
local | container-executor | root:hadoop | --Sr-s--*
local | conf/container-executor.cfg | root:hadoop | r-------*
hdfs | / | hdfs:hadoop | drwxr-xr-x
hdfs | /tmp | hdfs:hadoop | drwxrwxrwxt
hdfs | /user | hdfs:hadoop | drwxr-xr-x
hdfs | yarn.nodemanager.remote-app-log-dir | yarn:hadoop | drwxrwxrwxt
hdfs | mapreduce.jobhistory.intermediate-done-dir | mapred:hadoop | drwxrwxrwxt
hdfs | mapreduce.jobhistory.done-dir | mapred:hadoop | drwxr-x---

Common Configurations

To turn on RPC authentication in Hadoop, set the hadoop.security.authentication property to “kerberos” and set the security-related properties listed below appropriately. These properties should be in the core-site.xml of every node in the cluster.

Parameter | Value | Notes
hadoop.security.authentication | kerberos | simple: no authentication (default). kerberos: enable authentication by Kerberos.
hadoop.security.authorization | true | Enable RPC service-level authorization.
hadoop.rpc.protection | authentication | authentication: authentication only (default). integrity: integrity check in addition to authentication. privacy: data encryption in addition to integrity.
hadoop.security.auth_to_local | RULE:exp1 RULE:exp2 … DEFAULT | The value is a string containing newline characters.
hadoop.proxyuser.superuser.hosts | | Comma-separated hosts from which superuser is allowed to impersonate other users. * means wildcard.
hadoop.proxyuser.superuser.groups | | Comma-separated groups to which the users impersonated by superuser belong. * means wildcard.
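
As a rough illustration, the properties in the table above might appear in core-site.xml as sketched below. The realm REALM.TLD, the single auth_to_local rule, the proxy user name oozie, and the host and group names are all hypothetical placeholders chosen for this sketch; substitute values from your own cluster.

```xml
<?xml version="1.0"?>
<!-- core-site.xml: minimal sketch; merge these properties into your existing file -->
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.rpc.protection</name>
    <value>authentication</value>
  </property>
  <property>
    <!-- One rule per line, ending with DEFAULT. The rule shown is only an
         illustration: it maps the principal nn/anyhost@REALM.TLD to the
         local user hdfs. -->
    <name>hadoop.security.auth_to_local</name>
    <value>
      RULE:[2:$1@$0](nn@REALM\.TLD)s/.*/hdfs/
      DEFAULT
    </value>
  </property>
  <property>
    <!-- "oozie" stands in for the superuser name (hypothetical) -->
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>host1.example.com,host2.example.com</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>group1,group2</value>
  </property>
</configuration>
```

Note that the word superuser in the two hadoop.proxyuser property names is replaced by the actual name of the impersonating user.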

NameNode

Parameter | Value | Notes
dfs.block.access.token.enable | true | Enable HDFS block access tokens for secure operations.
dfs.https.enable | true | This value is deprecated. Use dfs.http.policy.
dfs.http.policy | HTTP_ONLY or HTTPS_ONLY or HTTP_AND_HTTPS | HTTPS_ONLY turns off HTTP access. This option takes precedence over the deprecated configurations dfs.https.enable and hadoop.ssl.enabled. If using SASL to authenticate the data transfer protocol instead of running the DataNode as root and using privileged ports, this property must be set to HTTPS_ONLY to guarantee authentication of HTTP servers.
dfs.namenode.keytab.file | /etc/security/keytab/nn.service.keytab | Kerberos keytab file for the NameNode.
dfs.namenode.kerberos.principal | nn/_HOST@REALM.TLD | Kerberos principal name for the NameNode.
dfs.namenode.kerberos.internal.spnego.principal | HTTP/_HOST@REALM.TLD | HTTP Kerberos principal name for the NameNode.
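
The NameNode settings above could be written in hdfs-site.xml roughly as follows. This is a sketch only: REALM.TLD is a placeholder realm, the keytab path is the example path from the table, and HTTPS_ONLY is just one of the three permitted dfs.http.policy values.

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml (NameNode portion): minimal sketch under the assumptions stated above -->
<configuration>
  <property>
    <name>dfs.block.access.token.enable</name>
    <value>true</value>
  </property>
  <property>
    <!-- Replaces the deprecated dfs.https.enable -->
    <name>dfs.http.policy</name>
    <value>HTTPS_ONLY</value>
  </property>
  <property>
    <name>dfs.namenode.keytab.file</name>
    <value>/etc/security/keytab/nn.service.keytab</value>
  </property>
  <property>
    <!-- REALM.TLD is a placeholder for your Kerberos realm -->
    <name>dfs.namenode.kerberos.principal</name>
    <value>nn/_HOST@REALM.TLD</value>
  </property>
  <property>
    <name>dfs.namenode.kerberos.internal.spnego.principal</name>
    <value>HTTP/_HOST@REALM.TLD</value>
  </property>
</configuration>
```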

Secondary NameNode

Parameter | Value | Notes
dfs.namenode.secondary.http-address | c_nn_host_fqdn:50090 |
dfs.namenode.secondary.https-port | 50470 |
dfs.secondary.namenode.keytab.file | /etc/security/keytab/sn.service.keytab | Kerberos keytab file for the Secondary NameNode.
dfs.secondary.namenode.kerberos.principal | sn/_HOST@REALM.TLD | Kerberos principal name for the Secondary NameNode.
dfs.secondary.namenode.kerberos.internal.spnego.principal | HTTP/_HOST@REALM.TLD | HTTP Kerberos principal name for the Secondary NameNode.

DataNode

Parameter | Value | Notes
dfs.datanode.data.dir.perm | 700 |
dfs.datanode.address | 0.0.0.0:1004 | A secure DataNode must use a privileged port to ensure that the server was started securely. This means that the server must be started via jsvc. Alternatively, this must be set to a non-privileged port if using SASL to authenticate the data transfer protocol.
dfs.datanode.http.address | 0.0.0.0:1006 | A secure DataNode must use a privileged port to ensure that the server was started securely. This means that the server must be started via jsvc.
dfs.datanode.https.address | 0.0.0.0:50470 |
dfs.datanode.keytab.file | /etc/security/keytab/dn.service.keytab | Kerberos keytab file for the DataNode.
dfs.datanode.kerberos.principal | dn/_HOST@REALM.TLD | Kerberos principal name for the DataNode.
dfs.encrypt.data.transfer | false | Set to true when using data encryption.
dfs.encrypt.data.transfer.algorithm | | Optionally set to 3des or rc4 when using data encryption, to control the encryption algorithm.
dfs.encrypt.data.transfer.cipher.suites | | Optionally set to AES/CTR/NoPadding to activate AES encryption when using data encryption.
dfs.encrypt.data.transfer.cipher.key.bitlength | | Optionally set to 128, 192 or 256 to control the key bit length when using AES with data encryption.
dfs.data.transfer.protection | | authentication: authentication only. integrity: integrity check in addition to authentication. privacy: data encryption in addition to integrity. This property is unspecified by default. Setting it enables SASL for authentication of the data transfer protocol. If it is enabled, dfs.datanode.address must use a non-privileged port, dfs.http.policy must be set to HTTPS_ONLY, and the HADOOP_SECURE_DN_USER environment variable must be undefined when starting the DataNode process.
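
To make the SASL alternative described in the dfs.data.transfer.protection row concrete, here is a hedged sketch of the DataNode portion of hdfs-site.xml. The port 10019 is an arbitrary non-privileged port picked for illustration, REALM.TLD is a placeholder realm, and the choice of authentication (rather than integrity or privacy) is likewise only an example.

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml (DataNode portion, SASL variant): minimal sketch -->
<configuration>
  <property>
    <name>dfs.datanode.data.dir.perm</name>
    <value>700</value>
  </property>
  <property>
    <!-- Setting this property enables SASL on the data transfer protocol -->
    <name>dfs.data.transfer.protection</name>
    <value>authentication</value>
  </property>
  <property>
    <!-- Must be a non-privileged port when SASL is used; 10019 is hypothetical -->
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:10019</value>
  </property>
  <property>
    <!-- Required to be HTTPS_ONLY when SASL is used -->
    <name>dfs.http.policy</name>
    <value>HTTPS_ONLY</value>
  </property>
  <property>
    <name>dfs.datanode.https.address</name>
    <value>0.0.0.0:50470</value>
  </property>
  <property>
    <name>dfs.datanode.keytab.file</name>
    <value>/etc/security/keytab/dn.service.keytab</value>
  </property>
  <property>
    <name>dfs.datanode.kerberos.principal</name>
    <value>dn/_HOST@REALM.TLD</value>
  </property>
</configuration>
```

With this variant, HADOOP_SECURE_DN_USER must not be defined in the environment of the DataNode process. The privileged-port variant would instead keep the addresses 0.0.0.0:1004 and 0.0.0.0:1006 from the table and start the DataNode via jsvc.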

WebHDFS

Parameter | Value | Notes
dfs.web.authentication.kerberos.principal | http/_HOST@REALM.TLD | Kerberos principal name for WebHDFS.
dfs.web.authentication.kerberos.keytab | /etc/security/keytab/http.service.keytab | Kerberos keytab file for WebHDFS.

ResourceManager

Parameter | Value | Notes
yarn.resourcemanager.keytab | /etc/security/keytab/rm.service.keytab | Kerberos keytab file for the ResourceManager.
yarn.resourcemanager.principal | rm/_HOST@REALM.TLD | Kerberos principal name for the ResourceManager.

NodeManager

Parameter | Value | Notes
yarn.nodemanager.keytab | /etc/security/keytab/nm.service.keytab | Kerberos keytab file for the NodeManager.
yarn.nodemanager.principal | nm/_HOST@REALM.TLD | Kerberos principal name for the NodeManager.
yarn.nodemanager.container-executor.class | org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor | Use LinuxContainerExecutor.
yarn.nodemanager.linux-container-executor.group | hadoop | Unix group of the NodeManager.
yarn.nodemanager.linux-container-executor.path | /path/to/bin/container-executor | The path to the executable of the Linux container executor.
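
The NodeManager properties above might be expressed in yarn-site.xml along these lines. Treat it as a sketch: REALM.TLD is a placeholder realm, and /path/to/bin/container-executor is the unexpanded example path from the table, to be replaced with the real location of the binary.

```xml
<?xml version="1.0"?>
<!-- yarn-site.xml (NodeManager portion): minimal sketch -->
<configuration>
  <property>
    <name>yarn.nodemanager.keytab</name>
    <value>/etc/security/keytab/nm.service.keytab</value>
  </property>
  <property>
    <name>yarn.nodemanager.principal</name>
    <value>nm/_HOST@REALM.TLD</value>
  </property>
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <property>
    <!-- Must match the group configured in container-executor.cfg -->
    <name>yarn.nodemanager.linux-container-executor.group</name>
    <value>hadoop</value>
  </property>
  <property>
    <!-- Placeholder path taken verbatim from the table -->
    <name>yarn.nodemanager.linux-container-executor.path</name>
    <value>/path/to/bin/container-executor</value>
  </property>
</configuration>
```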

Configuration for WebAppProxy

The WebAppProxy provides a proxy between the web applications exported by an application and an end user. If security is enabled, it warns users before they access a potentially unsafe web application. Authentication and authorization through the proxy are handled just as for any other privileged web application.

Parameter | Value | Notes
yarn.web-proxy.address | WebAppProxy host:port for proxy to AM web apps. | Specified as host:port. If this is the same as yarn.resourcemanager.webapp.address, or is not defined, the ResourceManager runs the proxy; otherwise a standalone proxy server must be launched.
yarn.web-proxy.keytab | /etc/security/keytab/web-app.service.keytab | Kerberos keytab file for the WebAppProxy.
yarn.web-proxy.principal | wap/_HOST@REALM.TLD | Kerberos principal name for the WebAppProxy.
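
A standalone WebAppProxy could be configured in yarn-site.xml roughly as shown here. The address proxy.example.com:8089 is a made-up host and port used only for illustration, and REALM.TLD is a placeholder realm.

```xml
<?xml version="1.0"?>
<!-- yarn-site.xml (WebAppProxy portion): minimal sketch -->
<configuration>
  <property>
    <!-- Hypothetical host:port. Leaving this undefined, or equal to
         yarn.resourcemanager.webapp.address, makes the ResourceManager
         run the proxy itself. -->
    <name>yarn.web-proxy.address</name>
    <value>proxy.example.com:8089</value>
  </property>
  <property>
    <name>yarn.web-proxy.keytab</name>
    <value>/etc/security/keytab/web-app.service.keytab</value>
  </property>
  <property>
    <name>yarn.web-proxy.principal</name>
    <value>wap/_HOST@REALM.TLD</value>
  </property>
</configuration>
```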

LinuxContainerExecutor

A ContainerExecutor is used by the YARN framework to define how containers are launched and controlled.

The following ContainerExecutors are available in Hadoop YARN:

ContainerExecutor | Description
DefaultContainerExecutor | The default executor which YARN uses to manage container execution. The container process has the same Unix user as the NodeManager.
LinuxContainerExecutor | Supported only on GNU/Linux, this executor runs the containers as either the YARN user who submitted the application (when full security is enabled) or as a dedicated user (defaults to nobody) when full security is not enabled. When full security is enabled, this executor requires all user accounts to be created on the cluster nodes where the containers are launched. It uses a setuid executable that is included in the Hadoop distribution. The NodeManager uses this executable to launch and kill containers. The setuid executable switches to the user who submitted the application and launches or kills the containers. For maximum security, this executor sets up restricted permissions and user/group ownership of the local files and directories used by the containers, such as the shared objects, jars, intermediate files and log files. Note in particular that, because of this, no user other than the application owner and the NodeManager can access any of the local files or directories, including those localized as part of the distributed cache.

To build the LinuxContainerExecutor executable run:

$ mvn package -Dcontainer-executor.conf.dir=/etc/hadoop/

The path passed in -Dcontainer-executor.conf.dir should be the path on the cluster nodes where a configuration file for the setuid executable should be located. The executable should be installed in $HADOOP_YARN_HOME/bin.

The executable must have specific permissions: 6050 or --Sr-s---, user-owned by root (super-user) and group-owned by a special group (e.g. hadoop) of which the NodeManager Unix user is a member and no ordinary application user is. If any application user belongs to this special group, security will be compromised. This special group name should be specified for the configuration property yarn.nodemanager.linux-container-executor.group in both conf/yarn-site.xml and conf/container-executor.cfg.

For example, suppose the NodeManager runs as user yarn, who belongs to the groups users and hadoop, either of which may be its primary group. Suppose also that users has both yarn and another user (the application submitter) alice as members, and that alice does not belong to hadoop. Going by the above description, the setuid/setgid executable should be set to 6050 or --Sr-s---, user-owned by root and group-owned by hadoop, which has yarn as a member (and not by users, which has alice as a member besides yarn).

The LinuxContainerExecutor requires that the directories specified in yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs, and every path leading up to them, have 755 permissions, as described in the table of directory permissions above.

conf/container-executor.cfg

The executable requires a configuration file called container-executor.cfg to be present in the configuration directory passed to the mvn target mentioned above. The configuration file must be owned by root, may be group-owned by any group, and should have the permissions 0400 or r--------.

The executable requires the following configuration items to be present in the conf/container-executor.cfg file. The items should be given as simple key=value pairs, one per line:

Parameter | Value | Notes
yarn.nodemanager.linux-container-executor.group | hadoop | Unix group of the NodeManager. The group owner of the container-executor binary should be this group. Should be the same as the value with which the NodeManager is configured. This configuration is required for validating the secure access of the container-executor binary.
banned.users | hdfs,yarn,mapred,bin | Banned users.
allowed.system.users | foo,bar | Allowed system users.
min.user.id | 1000 | Prevent other super-users.

To recap, here are the local filesystem permissions required for the various paths related to the LinuxContainerExecutor:

Filesystem | Path | User:Group | Permissions
local | container-executor | root:hadoop | --Sr-s--*
local | conf/container-executor.cfg | root:hadoop | r-------*
local | yarn.nodemanager.local-dirs | yarn:hadoop | drwxr-xr-x
local | yarn.nodemanager.log-dirs | yarn:hadoop | drwxr-xr-x

MapReduce JobHistory Server

Parameter | Value | Notes
mapreduce.jobhistory.address | MapReduce JobHistory Server host:port | Default port is 10020.
mapreduce.jobhistory.keytab | /etc/security/keytab/jhs.service.keytab | Kerberos keytab file for the MapReduce JobHistory Server.
mapreduce.jobhistory.principal | jhs/_HOST@REALM.TLD | Kerberos principal name for the MapReduce JobHistory Server.
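
For completeness, the JobHistory Server properties could be placed in mapred-site.xml as in the following sketch. The host name jhs.example.com is hypothetical, the port is the default mentioned in the table, and REALM.TLD is a placeholder realm.

```xml
<?xml version="1.0"?>
<!-- mapred-site.xml (JobHistory Server portion): minimal sketch -->
<configuration>
  <property>
    <!-- Hypothetical host; 10020 is the default port -->
    <name>mapreduce.jobhistory.address</name>
    <value>jhs.example.com:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.keytab</name>
    <value>/etc/security/keytab/jhs.service.keytab</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.principal</name>
    <value>jhs/_HOST@REALM.TLD</value>
  </property>
</configuration>
```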
