Configuration

Permissions for both HDFS and local filesystem paths

The following table lists various paths on HDFS and local filesystems (on all nodes) and recommended permissions:

Filesystem	Path	User:Group	Permissions
local	dfs.namenode.name.dir	hdfs:hadoop	drwx------
local	dfs.datanode.data.dir	hdfs:hadoop	drwx------
local	$HADOOP_LOG_DIR	hdfs:hadoop	drwxrwxr-x
local	$YARN_LOG_DIR	yarn:hadoop	drwxrwxr-x
local	yarn.nodemanager.local-dirs	yarn:hadoop	drwxr-xr-x
local	yarn.nodemanager.log-dirs	yarn:hadoop	drwxr-xr-x
local	container-executor	root:hadoop	--Sr-s--*
local	conf/container-executor.cfg	root:hadoop	r-------*
hdfs	/	hdfs:hadoop	drwxr-xr-x
hdfs	/tmp	hdfs:hadoop	drwxrwxrwxt
hdfs	/user	hdfs:hadoop	drwxr-xr-x
hdfs	yarn.nodemanager.remote-app-log-dir	yarn:hadoop	drwxrwxrwxt
hdfs	mapreduce.jobhistory.intermediate-done-dir	mapred:hadoop	drwxrwxrwxt
hdfs	mapreduce.jobhistory.done-dir	mapred:hadoop	drwxr-x---
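
As a rough illustration of the local-filesystem rows in the table, the following shell sketch creates stand-in directories under a scratch location and applies the recommended modes. The scratch root and the sub-directory names are assumptions made for illustration; on a real node the paths come from the named properties, and ownership would also have to be set by root (shown only as comments here).

```shell
set -eu

# Scratch root standing in for the node's real disks (hypothetical layout).
ROOT="$(mktemp -d)"
NAME_DIR="$ROOT/dfs/name"        # stand-in for dfs.namenode.name.dir
DATA_DIR="$ROOT/dfs/data"        # stand-in for dfs.datanode.data.dir
NM_LOCAL_DIR="$ROOT/yarn/local"  # stand-in for yarn.nodemanager.local-dirs
NM_LOG_DIR="$ROOT/yarn/log"      # stand-in for yarn.nodemanager.log-dirs

mkdir -p "$NAME_DIR" "$DATA_DIR" "$NM_LOCAL_DIR" "$NM_LOG_DIR"

# drwx------ for the HDFS storage directories
chmod 700 "$NAME_DIR" "$DATA_DIR"
# drwxr-xr-x for the NodeManager directories
chmod 755 "$NM_LOCAL_DIR" "$NM_LOG_DIR"

# On a real node (as root, with the accounts already created),
# ownership would follow the table:
#   chown hdfs:hadoop "$NAME_DIR" "$DATA_DIR"
#   chown yarn:hadoop "$NM_LOCAL_DIR" "$NM_LOG_DIR"

# Print the resulting modes for a quick visual check.
ls -ld "$NAME_DIR" "$DATA_DIR" "$NM_LOCAL_DIR" "$NM_LOG_DIR"
```

The HDFS rows would be applied on a running cluster with the hdfs dfs -chown and hdfs dfs -chmod commands instead, for example mode 1777 for the sticky, world-writable /tmp entry.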

Common Configurations

To turn on RPC authentication in Hadoop, set the hadoop.security.authentication property to “kerberos” and set the security-related properties listed below as appropriate. These properties belong in the core-site.xml of every node in the cluster.

Parameter	Value	Notes
hadoop.security.authentication	kerberos	simple: no authentication (default); kerberos: enable authentication by Kerberos.
hadoop.security.authorization	true	Enable RPC service-level authorization.
hadoop.rpc.protection	authentication	authentication: authentication only (default); integrity: integrity check in addition to authentication; privacy: data encryption in addition to integrity.
hadoop.security.auth_to_local	RULE:exp1 RULE:exp2 … DEFAULT	The value is a string containing newline characters.
hadoop.proxyuser.superuser.hosts		Comma-separated list of hosts from which superuser is allowed to impersonate other users. * means wildcard.
hadoop.proxyuser.superuser.groups		Comma-separated list of groups to which the users impersonated by superuser belong. * means wildcard.
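
The first three rows of the table can be sketched as a core-site.xml fragment. The shell snippet below writes such a fragment to a scratch file; the scratch location is an assumption for illustration, and the values are simply the ones shown in the table. The auth_to_local and proxyuser properties are left out because their values are site-specific.

```shell
set -eu

# Hypothetical scratch file; on a cluster this content belongs in core-site.xml.
CORE_SITE="$(mktemp -d)/core-site.xml"

cat > "$CORE_SITE" <<'EOF'
<configuration>
  <!-- Switch from the default "simple" mode to Kerberos authentication. -->
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <!-- Enable RPC service-level authorization. -->
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
  <!-- One of: authentication (default), integrity, privacy. -->
  <property>
    <name>hadoop.rpc.protection</name>
    <value>authentication</value>
  </property>
</configuration>
EOF

cat "$CORE_SITE"
```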

NameNode

Parameter	Value	Notes
dfs.block.access.token.enable	true	Enable HDFS block access tokens for secure operations.
dfs.https.enable	true	This value is deprecated. Use dfs.http.policy instead.
dfs.http.policy	HTTP_ONLY or HTTPS_ONLY or HTTP_AND_HTTPS	HTTPS_ONLY turns off HTTP access. This option takes precedence over the deprecated configurations dfs.https.enable and hadoop.ssl.enabled. If using SASL to authenticate the data transfer protocol instead of running the DataNode as root and using privileged ports, this property must be set to HTTPS_ONLY to guarantee authentication of HTTP servers.
dfs.namenode.keytab.file	/etc/security/keytab/nn.service.keytab	Kerberos keytab file for the NameNode.
dfs.namenode.kerberos.principal	nn/_HOST@REALM.TLD	Kerberos principal name for the NameNode.
dfs.namenode.kerberos.internal.spnego.principal	HTTP/_HOST@REALM.TLD	HTTP Kerberos principal name for the NameNode.

Secondary NameNode

Parameter	Value	Notes
dfs.namenode.secondary.http-address	c_nn_host_fqdn:50090
dfs.namenode.secondary.https-port	50470
dfs.secondary.namenode.keytab.file	/etc/security/keytab/sn.service.keytab	Kerberos keytab file for the Secondary NameNode.
dfs.secondary.namenode.kerberos.principal	sn/_HOST@REALM.TLD	Kerberos principal name for the Secondary NameNode.
dfs.secondary.namenode.kerberos.internal.spnego.principal	HTTP/_HOST@REALM.TLD	HTTP Kerberos principal name for the Secondary NameNode.

DataNode

Parameter	Value	Notes
dfs.datanode.data.dir.perm	700
dfs.datanode.address	0.0.0.0:1004	A secure DataNode must use a privileged port to ensure that the server was started securely. This means that the server must be started via jsvc. Alternatively, this must be set to a non-privileged port if SASL is used to authenticate the data transfer protocol.
dfs.datanode.http.address	0.0.0.0:1006	A secure DataNode must use a privileged port to ensure that the server was started securely. This means that the server must be started via jsvc.
dfs.datanode.https.address	0.0.0.0:50470
dfs.datanode.keytab.file	/etc/security/keytab/dn.service.keytab	Kerberos keytab file for the DataNode.
dfs.datanode.kerberos.principal	dn/_HOST@REALM.TLD	Kerberos principal name for the DataNode.
dfs.encrypt.data.transfer	false	Set to true when using data encryption.
dfs.encrypt.data.transfer.algorithm		Optionally set to 3des or rc4 when using data encryption to control the encryption algorithm.
dfs.encrypt.data.transfer.cipher.suites		Optionally set to AES/CTR/NoPadding to activate AES encryption when using data encryption.
dfs.encrypt.data.transfer.cipher.key.bitlength		Optionally set to 128, 192 or 256 to control the key bit length when using AES with data encryption.
dfs.data.transfer.protection		authentication: authentication only; integrity: integrity check in addition to authentication; privacy: data encryption in addition to integrity. This property is unspecified by default. Setting this property enables SASL for authentication of the data transfer protocol. If this is enabled, then dfs.datanode.address must use a non-privileged port, dfs.http.policy must be set to HTTPS_ONLY, and the HADOOP_SECURE_DN_USER environment variable must be undefined when starting the DataNode process.
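
The three conditions attached to dfs.data.transfer.protection can be sketched together as follows. This is only an illustration under stated assumptions: the scratch file location and the port number 10019 are chosen for the example (any free port of 1024 or above would do), and the protection level shown is just one of the three allowed values.

```shell
set -eu

# Hypothetical scratch file; on a cluster this content belongs in hdfs-site.xml.
HDFS_SITE="$(mktemp -d)/hdfs-site.xml"
DN_PORT=10019   # assumed non-privileged port for dfs.datanode.address

cat > "$HDFS_SITE" <<EOF
<configuration>
  <!-- Setting this property enables SASL on the data transfer protocol. -->
  <property>
    <name>dfs.data.transfer.protection</name>
    <value>authentication</value>
  </property>
  <!-- Must be a non-privileged port when SASL is used. -->
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:${DN_PORT}</value>
  </property>
  <!-- Required so that the HTTP servers are authenticated. -->
  <property>
    <name>dfs.http.policy</name>
    <value>HTTPS_ONLY</value>
  </property>
</configuration>
EOF

# Guard: ports below 1024 are privileged and do not fit the SASL setup.
if [ "$DN_PORT" -lt 1024 ]; then
  echo "dfs.datanode.address must use a non-privileged port" >&2
  exit 1
fi

# HADOOP_SECURE_DN_USER must be undefined when the DataNode is started.
unset HADOOP_SECURE_DN_USER

cat "$HDFS_SITE"
```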

WebHDFS

Parameter	Value	Notes
dfs.web.authentication.kerberos.principal	http/_HOST@REALM.TLD	Kerberos principal name for WebHDFS.
dfs.web.authentication.kerberos.keytab	/etc/security/keytab/http.service.keytab	Kerberos keytab file for WebHDFS.

ResourceManager

Parameter	Value	Notes
yarn.resourcemanager.keytab	/etc/security/keytab/rm.service.keytab	Kerberos keytab file for the ResourceManager.
yarn.resourcemanager.principal	rm/_HOST@REALM.TLD	Kerberos principal name for the ResourceManager.

NodeManager

Parameter	Value	Notes
yarn.nodemanager.keytab	/etc/security/keytab/nm.service.keytab	Kerberos keytab file for the NodeManager.
yarn.nodemanager.principal	nm/_HOST@REALM.TLD	Kerberos principal name for the NodeManager.
yarn.nodemanager.container-executor.class	org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor	Use LinuxContainerExecutor.
yarn.nodemanager.linux-container-executor.group	hadoop	Unix group of the NodeManager.
yarn.nodemanager.linux-container-executor.path	/path/to/bin/container-executor	The path to the executable of Linux container executor.

Configuration for WebAppProxy

The WebAppProxy provides a proxy between the web applications exported by an application and an end user. If security is enabled, it warns users before they access a potentially unsafe web application. Authentication and authorization through the proxy are handled just like any other privileged web application.

Parameter	Value	Notes
yarn.web-proxy.address	WebAppProxy host:port for proxy to AM web apps.	If this is the same as yarn.resourcemanager.webapp.address, or is not defined, the ResourceManager runs the proxy; otherwise a standalone proxy server must be launched.
yarn.web-proxy.keytab	/etc/security/keytab/web-app.service.keytab	Kerberos keytab file for the WebAppProxy.
yarn.web-proxy.principal	wap/_HOST@REALM.TLD	Kerberos principal name for the WebAppProxy.

LinuxContainerExecutor

A ContainerExecutor is used by the YARN framework to define how containers are launched and controlled.

The following ContainerExecutors are available in Hadoop YARN:

ContainerExecutor	Description
DefaultContainerExecutor	The default executor which YARN uses to manage container execution. The container process has the same Unix user as the NodeManager.
LinuxContainerExecutor	Supported only on GNU/Linux, this executor runs the containers as either the YARN user who submitted the application (when full security is enabled) or as a dedicated user (defaults to nobody) when full security is not enabled. When full security is enabled, this executor requires all user accounts to be created on the cluster nodes where the containers are launched. It uses a setuid executable that is included in the Hadoop distribution. The NodeManager uses this executable to launch and kill containers. The setuid executable switches to the user who has submitted the application and launches or kills the containers. For maximum security, this executor sets up restricted permissions and user/group ownership of local files and directories used by the containers, such as shared objects, jars, intermediate files and log files. Note in particular that, because of this, no user other than the application owner and the NodeManager can access any of the local files or directories, including those localized as part of the distributed cache.

To build the LinuxContainerExecutor executable run:

$ mvn package -Dcontainer-executor.conf.dir=/etc/hadoop/

The path passed in -Dcontainer-executor.conf.dir should be the path on the cluster nodes where a configuration file for the setuid executable should be located. The executable should be installed in $HADOOP_YARN_HOME/bin.

The executable must have specific permissions: 6050 or --Sr-s---, user-owned by root (super-user) and group-owned by a special group (e.g. hadoop) of which the NodeManager Unix user is a member and no ordinary application user is. If any application user belongs to this special group, security will be compromised. This special group name should be specified for the configuration property yarn.nodemanager.linux-container-executor.group in both conf/yarn-site.xml and conf/container-executor.cfg.

For example, suppose the NodeManager runs as user yarn, who is part of the groups users and hadoop, either of which may be the primary group. Suppose also that users has both yarn and another user (application submitter) alice as its members, and that alice does not belong to hadoop. Going by the above description, the setuid/setgid executable should be set 6050 or --Sr-s--- with user-owner as root and group-owner as hadoop, which has yarn as its member (and not users, which has alice as a member besides yarn).
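
To make the 6050 mode concrete, here is a small shell sketch that applies it to an empty stand-in file and prints the resulting permission string. The scratch path is an assumption for illustration; the real binary comes from the Hadoop build, and the ownership change needs root, so it appears only as a comment.

```shell
set -eu

# Empty stand-in file; the real executable is produced by the Hadoop build.
BIN="$(mktemp -d)/container-executor"
: > "$BIN"

# 6050 = setuid + setgid, no owner bits, group read and execute, nothing for others.
chmod 6050 "$BIN"

# On a real node (as root), ownership would also be set:
#   chown root:hadoop "$BIN"

# The first column shows the symbolic form of the mode.
ls -l "$BIN"
```

Note that ls prints a leading file-type character, so the string it shows has one more leading dash than the shorthand used in the text.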

The LinuxContainerExecutor requires that the directories specified in yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs, and all paths leading up to them, have 755 permissions, as described above in the table on directory permissions.
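
One way to check the "including and leading up to" requirement is to walk from the configured directory towards the root and inspect each level. The sketch below does this inside a scratch tree; the function name check_chain and the directory layout are hypothetical, and on a real node the starting point would be a directory from yarn.nodemanager.local-dirs or yarn.nodemanager.log-dirs.

```shell
set -eu

# Hypothetical layout under a scratch root.
TOP="$(mktemp -d)"
LEAF="$TOP/hadoop/yarn/local"
mkdir -p "$LEAF"
chmod 755 "$TOP" "$TOP/hadoop" "$TOP/hadoop/yarn" "$LEAF"

# Walk from a directory up to (and including) a stop directory,
# requiring drwxr-xr-x at every level.
check_chain() {
  dir="$1"
  stop="$2"
  while true; do
    mode="$(ls -ld "$dir" | cut -c1-10)"
    if [ "$mode" != "drwxr-xr-x" ]; then
      echo "unexpected mode $mode on $dir"
      return 1
    fi
    # Stop at the requested directory, or at / as a safety net.
    if [ "$dir" = "$stop" ] || [ "$dir" = "/" ]; then
      return 0
    fi
    dir="$(dirname "$dir")"
  done
}

check_chain "$LEAF" "$TOP" && echo "all levels are 755"
```

Passing / as the stop directory would check the full chain on a real node; the scratch root is used here only so the example does not depend on the host's own directory modes.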

conf/container-executor.cfg

The executable requires a configuration file called container-executor.cfg to be present in the configuration directory passed to the mvn target mentioned above. The configuration file must be owned by root (as listed in the permissions tables), may be group-owned by any group, and should have the permissions 0400 or r--------.

The executable requires the following configuration items to be present in the conf/container-executor.cfg file. The items should be given as simple key=value pairs, one per line:

Parameter	Value	Notes
yarn.nodemanager.linux-container-executor.group	hadoop	Unix group of the NodeManager. The group owner of the container-executor binary should be this group. Should be the same as the value with which the NodeManager is configured. This configuration is required for validating the secure access of the container-executor binary.
banned.users	hdfs,yarn,mapred,bin	Banned users.
allowed.system.users	foo,bar	Allowed system users.
min.user.id	1000	Prevent other super-users.
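
Put together, the items in the table might look like the file written by the following shell sketch. The scratch directory is an assumption for illustration, and the values are just the example values from the table (foo and bar are placeholders, not real accounts). The ownership change requires root and is shown only as a comment.

```shell
set -eu

# Hypothetical scratch directory standing in for the configuration directory.
CONF_DIR="$(mktemp -d)"
CFG="$CONF_DIR/container-executor.cfg"

# One key=value pair per line, using the example values from the table.
cat > "$CFG" <<'EOF'
yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=hdfs,yarn,mapred,bin
allowed.system.users=foo,bar
min.user.id=1000
EOF

# r-------- : readable by the owner only.
chmod 0400 "$CFG"

# On a real node (as root):
#   chown root:hadoop "$CFG"

# Sanity check: fail if any line is not a simple key=value pair.
if grep -qvE '^[A-Za-z0-9._-]+=.+$' "$CFG"; then
  echo "malformed line in $CFG" >&2
  exit 1
fi

ls -l "$CFG"
```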

To recap, here are the local filesystem permissions required for the various paths related to the LinuxContainerExecutor:

Filesystem	Path	User:Group	Permissions
local	container-executor	root:hadoop	--Sr-s--*
local	conf/container-executor.cfg	root:hadoop	r-------*
local	yarn.nodemanager.local-dirs	yarn:hadoop	drwxr-xr-x
local	yarn.nodemanager.log-dirs	yarn:hadoop	drwxr-xr-x

MapReduce JobHistory Server

Parameter	Value	Notes
mapreduce.jobhistory.address	MapReduce JobHistory Server host:port	Default port is 10020.
mapreduce.jobhistory.keytab	/etc/security/keytab/jhs.service.keytab	Kerberos keytab file for the MapReduce JobHistory Server.
mapreduce.jobhistory.principal	jhs/_HOST@REALM.TLD	Kerberos principal name for the MapReduce JobHistory Server.

Team Vskills
