HBase provides mechanisms to secure its own components and its interactions with the rest of the Hadoop infrastructure, as well as with clients and resources outside Hadoop. It offers several strategies for securing your data:
- Role-based Access Control (RBAC) controls which users or groups can read and write to a given HBase resource or execute a coprocessor endpoint, using the familiar paradigm of roles.
- Visibility Labels allow you to label cells and control access to labelled cells, further restricting who can read or write certain subsets of your data. Visibility labels are stored as tags.
- Transparent encryption of data at rest on the underlying filesystem, both in HFiles and in the WAL. This protects your data at rest from an attacker who has access to the underlying filesystem, without the need to change the implementation of the client. It can also protect against data leakage from improperly disposed disks, which can be important for legal and regulatory compliance.
Using HTTPS
A default HBase install uses insecure HTTP connections for the Web UIs of the master and region servers. To enable secure HTTP (HTTPS) connections instead, set hbase.ssl.enabled to true in hbase-site.xml. This does not change the port used by the Web UI. To change the Web UI port for a given HBase component, configure that component’s port setting in hbase-site.xml. The settings are hbase.master.info.port and hbase.regionserver.info.port.
If you enable HTTPS, clients should avoid the non-secure HTTP connection and connect to HBase using the https:// URL. Clients using the http:// URL will receive an HTTP response of 200, but will not receive any data. The following exception is logged:
javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
This is because the same port is used for HTTP and HTTPS. HBase uses Jetty for the Web UI. Without modifying Jetty itself, it does not seem possible to configure Jetty to redirect one port to another on the same host.
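As a minimal sketch of the point above (only the scheme changes, never the port), the following JDK-only helper builds the Web UI address a client should use. The class name, host name, and port number are hypothetical and chosen only for illustration; substitute the values of hbase.master.info.port or hbase.regionserver.info.port from your own hbase-site.xml.

```java
import java.net.URI;

public class WebUiUrl {
    /**
     * Builds the base URL of an HBase Web UI.
     * sslEnabled mirrors hbase.ssl.enabled; infoPort mirrors
     * hbase.master.info.port or hbase.regionserver.info.port.
     */
    public static URI webUiUrl(String host, int infoPort, boolean sslEnabled) {
        // Enabling HTTPS changes the scheme only; the port stays the same.
        String scheme = sslEnabled ? "https" : "http";
        return URI.create(scheme + "://" + host + ":" + infoPort + "/");
    }

    public static void main(String[] args) {
        // Hypothetical host and port, for illustration only.
        System.out.println(webUiUrl("master.example.com", 16010, true));
        System.out.println(webUiUrl("master.example.com", 16010, false));
    }
}
```

Deriving the scheme from the same flag that the servers use helps clients avoid the plain http:// URL, which would otherwise return an empty 200 response.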
Simple User Access
Simple user access is not a secure method of operating HBase. It exists to prevent users from making mistakes, and it can be used to mimic access control on a development system without having to set up Kerberos. It does not protect against malicious users or intrusion attempts.
- Server-side Configuration – Add the following to the hbase-site.xml file on every server machine in the cluster:
<property>
<name>hbase.security.authentication</name>
<value>simple</value>
</property>
<property>
<name>hbase.security.authorization</name>
<value>true</value>
</property>
<property>
<name>hbase.coprocessor.master.classes</name>
<value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
<name>hbase.coprocessor.regionserver.classes</name>
<value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
For 0.94, add the following to the hbase-site.xml file on every server machine in the cluster:
<property>
<name>hbase.rpc.engine</name>
<value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>
<property>
<name>hbase.coprocessor.master.classes</name>
<value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
A full shutdown and restart of HBase service is required when deploying these configuration changes.
- Client-side Configuration – Add the following to the hbase-site.xml file on every client:
<property>
<name>hbase.security.authentication</name>
<value>simple</value>
</property>
For 0.94, add the following to the hbase-site.xml file on every client:
<property>
<name>hbase.rpc.engine</name>
<value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>
Be advised that if the hbase.security.authentication in the client- and server-side site files do not match, the client will not be able to communicate with the cluster.
- Client-side Configuration – REST Gateway – The REST gateway authenticates with HBase using the supplied credential. The REST gateway itself performs no authentication. All client access via the REST gateway uses the REST gateway’s credential and has its privileges.
The REST gateway user will need access. For example, to give the REST API user, rest_server, administrative access, a command such as this one will suffice:
grant 'rest_server', 'RWCA'
It should be possible for clients to authenticate with the HBase cluster through the REST gateway in a pass-through manner via SPNEGO HTTP authentication. This is future work.
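The warning above about mismatched hbase.security.authentication values can be checked before deployment. The following is a rough sketch, not an HBase tool: it uses only the JDK XML parser, the class and method names are hypothetical, and it assumes each site file is well-formed and sets the property explicitly.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class AuthSettingCheck {
    /** Returns the value of the named property in a Hadoop-style site file, or null if absent. */
    public static String property(String siteXml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(siteXml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String n = p.getElementsByTagName("name").item(0).getTextContent().trim();
                if (n.equals(name)) {
                    return p.getElementsByTagName("value").item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse site file", e);
        }
    }

    /** True only when both files explicitly set hbase.security.authentication to the same value. */
    public static boolean authenticationMatches(String serverXml, String clientXml) {
        String key = "hbase.security.authentication";
        String server = property(serverXml, key);
        return server != null && server.equals(property(clientXml, key));
    }

    public static void main(String[] args) {
        String simple = "<configuration><property><name>hbase.security.authentication</name>"
            + "<value>simple</value></property></configuration>";
        String kerberos = "<configuration><property><name>hbase.security.authentication</name>"
            + "<value>kerberos</value></property></configuration>";
        System.out.println(authenticationMatches(simple, simple));   // same value on both sides
        System.out.println(authenticationMatches(simple, kerberos)); // mismatch
    }
}
```

In practice the two strings would be read from the server and client copies of hbase-site.xml; they are inlined here so the sketch runs on its own.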
Secure Client Access to Apache HBase
Newer releases of Apache HBase (>= 0.92) support optional SASL authentication of clients.
Prerequisites
- Hadoop Authentication Configuration – To run HBase RPC with strong authentication, you must set hbase.security.authentication to kerberos. In this case, you must also set hadoop.security.authentication to kerberos in core-site.xml. Otherwise, you would be using strong authentication for HBase but not for the underlying HDFS, which would cancel out any benefit.
- Kerberos KDC – You need to have a working Kerberos KDC.
- Server-side Configuration – First, refer to Prerequisites and ensure that your underlying HDFS configuration is secure. Add the following to the hbase-site.xml file on every server machine in the cluster:
<property>
<name>hbase.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hbase.security.authorization</name>
<value>true</value>
</property>
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.security.token.TokenProvider</value>
</property>
A full shutdown and restart of HBase service is required when deploying these configuration changes.
- Client-side Configuration – First, refer to Prerequisites and ensure that your underlying HDFS configuration is secure. Add the following to the hbase-site.xml file on every client:
<property>
<name>hbase.security.authentication</name>
<value>kerberos</value>
</property>
The client environment must be logged in to Kerberos from KDC or keytab via the kinit command before communication with the HBase cluster will be possible.
Be advised that if the hbase.security.authentication in the client- and server-side site files do not match, the client will not be able to communicate with the cluster.
Once HBase is configured for secure RPC it is possible to optionally configure encrypted communication. To do so, add the following to the hbase-site.xml file on every client:
<property>
<name>hbase.rpc.protection</name>
<value>privacy</value>
</property>
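This configuration property can also be set on a per-connection basis. Set it in the Configuration used to create the Connection, before the Connection is created:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

Configuration conf = HBaseConfiguration.create();
// Set the property first so that the new connection picks it up.
conf.set("hbase.rpc.protection", "privacy");
// tablename is assumed to be defined by the caller.
try (Connection connection = ConnectionFactory.createConnection(conf);
     Table table = connection.getTable(TableName.valueOf(tablename))) {
  // ... do your stuff
}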
Expect a ~10% performance penalty for encrypted communication.
- Client-side Configuration – REST Gateway – Add the following to the hbase-site.xml file for every REST gateway:
<property>
<name>hbase.rest.keytab.file</name>
<value>$KEYTAB</value>
</property>
<property>
<name>hbase.rest.kerberos.principal</name>
<value>$USER/[email protected]</value>
</property>
Substitute the appropriate credential and keytab for $USER and $KEYTAB respectively. The REST gateway will authenticate with HBase using the supplied credential. In order to use the REST API principal to interact with HBase, it is also necessary to add the hbase.rest.kerberos.principal to the acl table. For example, to give the REST API principal, rest_server, administrative access, a command such as this one will suffice:
grant 'rest_server', 'RWCA'
HBase REST gateway supports SPNEGO HTTP authentication for client access to the gateway. To enable REST gateway Kerberos authentication for client access, add the following to the hbase-site.xml file for every REST gateway.
<property>
<name>hbase.rest.support.proxyuser</name>
<value>true</value>
</property>
<property>
<name>hbase.rest.authentication.type</name>
<value>kerberos</value>
</property>
<property>
<name>hbase.rest.authentication.kerberos.principal</name>
<value>HTTP/[email protected]</value>
</property>
<property>
<name>hbase.rest.authentication.kerberos.keytab</name>
<value>$KEYTAB</value>
</property>
<!-- Add these if you need to configure a different DNS interface from the default -->
<property>
<name>hbase.rest.dns.interface</name>
<value>default</value>
</property>
<property>
<name>hbase.rest.dns.nameserver</name>
<value>default</value>
</property>
Substitute the keytab for HTTP for $KEYTAB.
The HBase REST gateway supports two built-in values for hbase.rest.authentication.type: simple and kerberos. You can also provide custom authentication by implementing the Hadoop AuthenticationHandler interface and specifying the full class name as the value of hbase.rest.authentication.type.
Securing Access to HDFS and ZooKeeper
Secure HBase requires secure ZooKeeper and HDFS so that users cannot access and/or modify the metadata and data from under HBase. HBase uses HDFS (or configured file system) to keep its data files as well as write ahead logs (WALs) and other data. HBase uses ZooKeeper to store some metadata for operations (master address, table locks, recovery state, etc).
Securing ZooKeeper Data – ZooKeeper has a pluggable authentication mechanism to enable access from clients using different methods. ZooKeeper even allows authenticated and un-authenticated clients at the same time. The access to znodes can be restricted by providing Access Control Lists (ACLs) per znode. An ACL contains two components, the authentication method and the principal. ACLs are NOT enforced hierarchically.
HBase daemons authenticate to ZooKeeper via SASL and Kerberos. HBase sets up the znode ACLs so that only the HBase user and the configured HBase superuser (hbase.superuser) can access and modify the data. Where ZooKeeper is used for service discovery or for sharing state with the client, the znodes created by HBase (clusterId, master address, meta location, etc) can also be read by anyone, regardless of authentication, but only the HBase user can modify them.
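The two properties described above (ACLs apply per znode rather than hierarchically, and some znodes are world-readable but writable only by the HBase user) can be illustrated with a toy model. This sketch does not use the ZooKeeper API; the class, the "scheme:id" identity strings, the operation names, and the znode layout are all hypothetical and serve only to show the access rule.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ZnodeAclModel {
    // znode path -> (identity in "scheme:id" form -> permitted operations)
    private final Map<String, Map<String, Set<String>>> acls = new HashMap<>();

    /** Grants the listed operations on exactly one znode to one identity. */
    public void grant(String path, String identity, String... ops) {
        acls.computeIfAbsent(path, p -> new HashMap<>())
            .computeIfAbsent(identity, i -> new HashSet<>())
            .addAll(Arrays.asList(ops));
    }

    /** Consults only the ACL stored on the exact znode; parent ACLs are never inherited. */
    public boolean allowed(String path, String identity, String op) {
        Map<String, Set<String>> acl = acls.getOrDefault(path, Collections.emptyMap());
        return acl.getOrDefault(identity, Collections.emptySet()).contains(op)
            || acl.getOrDefault("world:anyone", Collections.emptySet()).contains(op);
    }

    /** Illustrative layout: a private parent znode and a world-readable child. */
    public static ZnodeAclModel example() {
        ZnodeAclModel zk = new ZnodeAclModel();
        zk.grant("/hbase", "sasl:hbase", "read", "write");
        zk.grant("/hbase/master", "sasl:hbase", "read", "write");
        zk.grant("/hbase/master", "world:anyone", "read");
        return zk;
    }

    public static void main(String[] args) {
        ZnodeAclModel zk = example();
        System.out.println(zk.allowed("/hbase/master", "sasl:alice", "read"));  // readable by anyone
        System.out.println(zk.allowed("/hbase/master", "sasl:alice", "write")); // only hbase may modify
        System.out.println(zk.allowed("/hbase", "sasl:alice", "read"));         // parent ACL is separate
    }
}
```

Because each znode is checked in isolation, the read grant on the child says nothing about the parent, and tightening the parent would not protect the child. Each znode needs its own ACL.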
Securing HDFS Data – All of the data under management is kept under the root directory in the file system (hbase.rootdir). Access to the data and WAL files in the filesystem should be restricted so that users cannot bypass the HBase layer, and peek at the underlying data files from the file system. HBase assumes the filesystem used (HDFS or other) enforces permissions hierarchically. If sufficient protection from the file system (both authorization and authentication) is not provided, HBase level authorization control (ACLs, visibility labels, etc) is meaningless since the user can always access the data from the file system.
HBase enforces POSIX-like permissions 700 (rwx------) on its root directory. This means that only the HBase user can read or write the files in the filesystem. The default can be changed by configuring hbase.rootdir.perms in hbase-site.xml. The active master must be restarted for the new permissions to take effect. For versions before 1.2.0, check whether HBASE-13780 is committed; if it is not, you can manually set the permissions for the root directory if needed. Using HDFS, the command would be:
sudo -u hdfs hadoop fs -chmod 700 /hbase
You should change /hbase if you are using a different hbase.rootdir.
In secure mode, SecureBulkLoadEndpoint should be configured and used to properly hand over user files created by MR jobs to the HBase daemons and the HBase user. The staging directory in the distributed file system used for bulk load (hbase.bulkload.staging.dir, defaults to /tmp/hbase-staging) should have mode 711 (rwx--x--x) so that users can access the staging directory created under that parent directory, but cannot perform any other operation.
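To make the two octal modes mentioned in this section concrete, here is a small JDK-only sketch that expands an octal mode into its symbolic form and inspects the resulting permission set. The class and method names are hypothetical; the sketch does not touch HDFS or any real directory.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class PermModes {
    /** Converts an octal mode such as 0700 into its symbolic form, such as "rwx------". */
    public static String symbolic(int mode) {
        String bits = "rwxrwxrwx";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 9; i++) {
            // Bit 8 is owner-read and bit 0 is other-execute.
            sb.append((mode & (1 << (8 - i))) != 0 ? bits.charAt(i) : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Root directory: only the owner (the HBase user) has any access.
        System.out.println(symbolic(0700));
        // Bulk load staging directory: others may traverse it but not list or write.
        System.out.println(symbolic(0711));

        Set<PosixFilePermission> staging = PosixFilePermissions.fromString(symbolic(0711));
        System.out.println(staging.contains(PosixFilePermission.OTHERS_EXECUTE));
        System.out.println(staging.contains(PosixFilePermission.OTHERS_READ));
    }
}
```

The execute-only bits for group and others in 711 are what let a user reach their own subdirectory under the staging directory without being able to list or modify its siblings.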