Requirements and Installing on Linux

Apache HBase can be installed in three modes:

  • Standalone mode
  • Pseudo-Distributed mode
  • Fully Distributed mode

The prerequisites for installing HBase are Java and Hadoop, both installed on your Linux machine.

Java

The following table summarizes the HBase community's recommendations for deploying on various Java versions.

HBase Version   JDK 7   JDK 8   JDK 9 (Non-LTS)   JDK 10 (Non-LTS)   JDK 11
2.0+            No      Yes     HBASE-20264*      HBASE-20264*       HBASE-21110*
1.2+            Yes     Yes     HBASE-20264*      HBASE-20264*       HBASE-21110*

An entry of "Yes" indicates a base level of testing and a willingness to help diagnose and address issues you might run into. Similarly, an entry of "No" or ‘*’ generally means that, should you run into an issue, the community is likely to ask you to change the Java environment before proceeding to help. In some cases, specific guidance on limitations (e.g. whether compiling or unit tests work, specific operational issues, etc.) will also be noted.
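To compare an installed JDK against the table above, the major version can be pulled out of the version string. The following is a minimal sketch, not part of HBase: the helper names java_major and installed_java_version are hypothetical, and it assumes the usual `java -version` output format (a quoted version string on the first line, printed to stderr).

```bash
# Extract the major Java version from a version string such as
# "1.8.0_181" (JDK 8 and earlier) or "11.0.2" (JDK 9 and later).
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;                # legacy "1.x" scheme: major is the second field
    *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;  # newer scheme: major is the first field
  esac
}

# Read the version string of the java binary on PATH (hypothetical helper).
# `java -version` writes to stderr, e.g.: java version "1.8.0_181"
installed_java_version() {
  java -version 2>&1 | sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

# Usage:
#   java_major "$(installed_java_version)"
```

With the result in hand, look up the matching JDK column for your HBase version in the table.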

Hadoop

The following table summarizes the versions of Hadoop supported with each version of HBase. Older versions not appearing in this table are considered unsupported and likely missing necessary features, while newer versions are untested but may be suitable.

Hadoop 2.x is recommended. Hadoop 2.x is faster and includes features, such as short-circuit reads (see Leveraging local data), which will help improve your HBase random read profile. Hadoop 2.x also includes important bug fixes that will improve your overall HBase experience. HBase does not support running with earlier versions of Hadoop. See the table below for requirements specific to different HBase versions.

Based on the version of HBase, you should select the most appropriate version of Hadoop. You can use Apache Hadoop, or a vendor’s distribution of Hadoop.

Hadoop version support matrix

Y = Tested to be fully-functional

N = Known to not be fully-functional

M = Not tested, may/may-not function

                   HBase-1.2.x, HBase-1.3.x   HBase-1.4.x   HBase-2.0.x   HBase-2.1.x
Hadoop-2.4.x       Y                          N             N             N
Hadoop-2.5.x       Y                          N             N             N
Hadoop-2.6.0       N                          N             N             N
Hadoop-2.6.1+      Y                          N             Y             N
Hadoop-2.7.0       N                          N             N             N
Hadoop-2.7.1+      Y                          Y             Y             Y
Hadoop-2.8.[0-1]   M                          N             N             N
Hadoop-2.8.2       M                          M             M             M
Hadoop-2.8.3+      M                          M             Y             Y
Hadoop-2.9.0       N                          N             N             N
Hadoop-2.9.1+      M                          M             M             M
Hadoop-3.0.[0-2]   N                          N             N             N
Hadoop-3.0.3+      N                          N             Y             Y
Hadoop-3.1.0       N                          N             N             N
Hadoop-3.1.1+      N                          N             Y             Y
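As an illustration of how to read the matrix, the HBase-2.1.x column can be written out as a lookup. This is only a sketch: the function name hbase21_hadoop_support is hypothetical, it encodes that one column exactly as listed above, and it prints "?" for any Hadoop version the table does not cover.

```bash
# Look up the support status (Y, N or M) of a Hadoop version for HBase-2.1.x,
# following the HBase-2.1.x column of the matrix. Prints "?" when the version
# is not in the table.
hbase21_hadoop_support() {
  case "$1" in
    2.4.*|2.5.*|2.6.*) echo N ;;
    2.7.0)             echo N ;;
    2.7.*)             echo Y ;;   # Hadoop-2.7.1+
    2.8.0|2.8.1)       echo N ;;
    2.8.2)             echo M ;;
    2.8.*)             echo Y ;;   # Hadoop-2.8.3+
    2.9.0)             echo N ;;
    2.9.*)             echo M ;;   # Hadoop-2.9.1+
    3.0.[0-2])         echo N ;;
    3.0.*)             echo Y ;;   # Hadoop-3.0.3+
    3.1.0)             echo N ;;
    3.1.*)             echo Y ;;   # Hadoop-3.1.1+
    *)                 echo "?" ;;
  esac
}

# Usage:
#   hbase21_hadoop_support 2.7.3
```

The order of the patterns matters: each exact version is listed before the wildcard that covers the later releases of the same line.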

ZooKeeper

ZooKeeper 3.4.x is required.

Operating System Utilities

ssh – HBase uses the Secure Shell (ssh) command and utilities extensively to communicate between cluster nodes. Each server in the cluster must be running ssh so that the Hadoop and HBase daemons can be managed. You must be able to connect to all nodes via SSH, including the local node, from the Master as well as any backup Master, using a shared key rather than a password.
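One way to confirm the key-based SSH requirement is to try a non-interactive login to every node. The sketch below assumes OpenSSH; the function name unreachable_nodes and the host names in the usage lines are placeholders, not values from this guide.

```bash
# Print every host that cannot be reached over SSH without a password.
# BatchMode=yes makes ssh fail instead of prompting, so a host that still
# needs a password is reported rather than blocking the script.
unreachable_nodes() {
  for host in "$@"; do
    ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null || echo "$host"
  done
}

# Typical key setup, run once on the Master (host names are placeholders):
#   ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
#   ssh-copy-id hduser@node1.example.org
#
# Usage:
#   unreachable_nodes localhost node1.example.org node2.example.org
```

An empty result means every listed node, including the local one, accepts the key.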

DNS – HBase uses the local hostname to self-report its IP address.

NTP – The clocks on cluster nodes should be synchronized. A small amount of variation is acceptable, but larger amounts of skew can cause erratic and unexpected behavior. Time synchronization is one of the first things to check if you see unexplained problems in your cluster. It is recommended that you run a Network Time Protocol (NTP) service, or another time-synchronization mechanism on your cluster and that all nodes look to the same service for time synchronization.
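A rough skew check can be scripted by comparing epoch timestamps from two nodes. This is a minimal sketch under stated assumptions: clock_skew and skew_ok are hypothetical helpers, the 30-second threshold in the usage line is only an example value, and reading the remote clock over SSH adds some network latency, so small differences are not meaningful.

```bash
# Absolute difference, in seconds, between two epoch timestamps.
clock_skew() {
  diff=$(( $1 - $2 ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( 0 - diff ))
  fi
  echo "$diff"
}

# Succeeds when the skew between two timestamps is at most max_skew seconds.
# Arguments: local_epoch remote_epoch max_skew
skew_ok() {
  [ "$(clock_skew "$1" "$2")" -le "$3" ]
}

# Usage (node1.example.org is a placeholder; relies on key-based SSH):
#   skew_ok "$(date +%s)" "$(ssh node1.example.org date +%s)" 30 \
#     || echo "clock skew too large"
```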

Installation

HBase – Standalone mode installation:

Installation is performed on Ubuntu with Hadoop already installed.

Step 1) Place hbase-1.1.2-bin.tar.gz in /home/hduser

Step 2) Extract it by running $ tar -xvf hbase-1.1.2-bin.tar.gz. This unpacks the archive and creates the directory hbase-1.1.2 in /home/hduser

Step 3) Open hbase-env.sh (in the conf directory of the HBase installation) and set JAVA_HOME to the path of your Java installation.

Step 4) Open the ~/.bashrc file and set the HBASE_HOME path as shown here

export HBASE_HOME=/home/hduser/hbase-1.1.2
export PATH=$PATH:$HBASE_HOME/bin
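If you prefer to script this step rather than edit ~/.bashrc by hand, the lines can be appended idempotently. The helper append_once below is a hypothetical sketch, and the usage lines assume the hbase-1.1.2 directory created in Step 2.

```bash
# Append a line to a file only if that exact line is not already present,
# so running the setup twice does not duplicate the exports.
append_once() {
  line=$1
  file=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (single quotes keep $PATH and $HBASE_HOME unexpanded in the file):
#   append_once 'export HBASE_HOME=/home/hduser/hbase-1.1.2' ~/.bashrc
#   append_once 'export PATH=$PATH:$HBASE_HOME/bin' ~/.bashrc
#   . ~/.bashrc
```

Note that there must be no space after "PATH="; with a space, the shell assigns an empty PATH and treats the rest of the line as a separate word.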

Step 5) Open hbase-site.xml and place the following properties inside the <configuration> element of the file

hduser@ubuntu$ gedit hbase-site.xml

<property>
  <name>hbase.rootdir</name>
  <value>file:///home/hduser/HBASE/hbase</value>
</property>

<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/home/hduser/HBASE/zookeeper</value>
</property>

Here we set two properties:

  • hbase.rootdir, the HBase root directory, and
  • hbase.zookeeper.property.dataDir, the data directory used by ZooKeeper.

HMaster and ZooKeeper both read their settings from this hbase-site.xml.
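For reference, the two properties sit inside the <configuration> root element of hbase-site.xml. The sketch below writes a complete standalone file using the values from Step 5; the function name write_standalone_site is hypothetical, and it overwrites any existing hbase-site.xml in the target directory.

```bash
# Write a standalone-mode hbase-site.xml into the given conf directory.
# WARNING: replaces an existing hbase-site.xml in that directory.
write_standalone_site() {
  conf_dir=$1
  cat > "$conf_dir/hbase-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/hduser/HBASE/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hduser/HBASE/zookeeper</value>
  </property>
</configuration>
EOF
}

# Usage:
#   write_standalone_site "$HBASE_HOME/conf"
```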

Step 6) Open the /etc/hosts file and add the IP addresses and hostnames for your machine.

Step 7) Now run start-hbase.sh from the hbase-1.1.2/bin directory.

Use the jps command to check whether HMaster is running.

Step 8) Start the HBase shell with “hbase shell”, which opens an interactive shell. Once in shell mode, you can run all types of HBase commands.

The standalone mode does not require Hadoop daemons to start. HBase can run independently.

HBase – Pseudo-Distributed mode installation:

Pseudo-distributed mode is another way to install Apache HBase. The steps are as follows.

Step 1) Place hbase-1.1.2-bin.tar.gz in /home/hduser

Step 2) Extract it by running $ tar -xvf hbase-1.1.2-bin.tar.gz. This unpacks the archive and creates the directory hbase-1.1.2 in /home/hduser

Step 3) Open hbase-env.sh and set the JAVA_HOME path and the region servers’ path, exporting each variable.

Step 4) Open the ~/.bashrc file and set the HBASE_HOME path.

Step 5) Open hbase-site.xml and add the following properties to the file.

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:9000/hbase</value>
</property>

<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>

<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/home/hduser/hbase/zookeeper</value>
</property>

These properties do the following:

  • hbase.rootdir sets the HBase root directory, here a location in HDFS.
  • hbase.cluster.distributed must be set to true for a distributed setup.
  • hbase.zookeeper.quorum names the host or hosts that make up the ZooKeeper quorum.
  • dfs.replication sets the replication factor. Here it is set to 1.
  • In fully distributed mode, multiple data nodes are present, so replication can be increased by setting dfs.replication to a value greater than 1.
  • hbase.zookeeper.property.clientPort sets the port that clients use to connect to ZooKeeper.
  • hbase.zookeeper.property.dataDir sets the ZooKeeper data directory.

Step 6) Start the Hadoop daemons first, and then start the HBase daemons.

First start the Hadoop daemons with the “./start-all.sh” command.

Then start the HBase daemons with start-hbase.sh.

Finally, run jps to check that the daemons are running.
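The jps check can be scripted by comparing its output against the daemons you expect. This is a sketch only: missing_daemons is a hypothetical helper, and the daemon names in the usage line are an assumption for a pseudo-distributed setup in which HBase manages its own ZooKeeper (HQuorumPeer); adjust the list to your setup.

```bash
# Read `jps` output on stdin and print each expected daemon name that is
# absent. Whole-word matching keeps "NameNode" from being satisfied by
# "SecondaryNameNode".
missing_daemons() {
  out=$(cat)
  for name in "$@"; do
    printf '%s\n' "$out" | grep -qw -- "$name" || echo "$name"
  done
}

# Usage (daemon list is an assumption, see above):
#   jps | missing_daemons NameNode DataNode HMaster HRegionServer HQuorumPeer
```

An empty result means every listed daemon was found.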

HBase – Fully Distributed mode installation:

This setup runs on a Hadoop cluster, with the daemons spread across multiple nodes. The installation is the same as in pseudo-distributed mode; the only difference is that it spans multiple nodes.

The configuration in hbase-site.xml and hbase-env.sh is the same as described for pseudo-distributed mode.

Confirming Your Installation

Make sure HDFS is running first. Start and stop the Hadoop HDFS daemons by running sbin/start-dfs.sh and sbin/stop-dfs.sh in the HADOOP_HOME directory. You can ensure it started properly by testing the put and get of files into the Hadoop filesystem. HBase does not normally use the MapReduce or YARN daemons, so these do not need to be started.

If you are managing your own ZooKeeper, start it and confirm it’s running; otherwise HBase will start up ZooKeeper for you as part of its start process.

Start HBase with the following command:

bin/start-hbase.sh

Run the above from the HBASE_HOME directory.

You should now have a running HBase instance. HBase logs can be found in the logs subdirectory. Check them out especially if HBase had trouble starting.

HBase also puts up a UI listing vital attributes. By default it’s deployed on the Master host at port 16010 (HBase RegionServers listen on port 16020 by default and put up an informational HTTP server at port 16030). If the Master is running on a host named master.example.org on the default port, point your browser at http://master.example.org:16010 to see the web interface.

Once HBase has started, see the shell exercises section for how to create tables, add data, scan your insertions, and finally disable and drop your tables.

To stop HBase after exiting the HBase shell, enter:

$ ./bin/stop-hbase.sh

stopping hbase……………

Shutdown can take a moment to complete. It can take longer if your cluster is comprised of many machines. If you are running a distributed operation, be sure to wait until HBase has shut down completely before stopping the Hadoop daemons.
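The wait before stopping Hadoop can be sketched as a polling loop. The helper wait_until_gone is hypothetical, the 120-second timeout is an example value, and the sbin/stop-dfs.sh path in the usage lines assumes a Hadoop 2.x layout. The loop only watches processes on the local node.

```bash
# Poll until no process with the given name is listed, or until the timeout
# (in seconds) expires. Returns 0 when the process is gone, 1 on timeout.
# Arguments: name timeout [lister], where lister defaults to jps.
wait_until_gone() {
  name=$1
  timeout=$2
  lister=${3:-jps}
  elapsed=0
  while $lister | grep -qw -- "$name"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Usage:
#   "$HBASE_HOME/bin/stop-hbase.sh"
#   wait_until_gone HMaster 120 && "$HADOOP_HOME/sbin/stop-dfs.sh"
```

The optional third argument exists so the loop can be exercised with a stand-in command instead of a live jps.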
