Connection handling

HTablePool is deprecated. Previous versions of this guide discussed HTablePool, which was deprecated in HBase 0.94, 0.95, and 0.96 and removed in 0.98.1 by HBASE-6580, and HConnection, which was deprecated in HBase 1.0 in favor of Connection.

HBase Connection

A connection represents a relationship between two objects, established so that something can be transferred between them. In daily life we rely on electricity, water, gas and broadband connections, each binding different parties and carrying something different. In computer science the term is used in many contexts; HTTP and WebSocket connections, for example, are commonplace in web development. This article explains what an HBase connection is, why connections should be shared, and how to implement a simple connection pool.

An HBase connection differs from most connections in the number of objects bound to the HBase client (the connection initiator). It is a cluster connection: it encapsulates lower-level individual connections to all region servers and a connection to ZooKeeper.

The connection object contains the logic to find the master and locate regions on the cluster; it keeps a cache of region locations and knows how to recalibrate after regions move. The individual connections to servers, the meta cache, the ZooKeeper connection, and so on are all shared by the Table and Admin instances obtained from the connection.

Reusing connection

A Connection is a heavyweight object that encapsulates a lot of information and many lower-level connections, so creating one is an expensive operation. The time it takes depends on the number of region servers in the cluster. Connection implementations are thread-safe, so a client can create a connection once and share it across threads. Table and Admin instances, on the other hand, are lightweight and are not thread-safe. Typically, a single connection is instantiated per client application and every thread obtains its own Table instance. Because Table and Admin instances are cheap to create, caching or pooling them is not recommended.

If your application performs heavy reads or writes, reusing the connection is strongly advised for better performance.
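The pattern described above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the object name SharedConnectionExample and the helpers withTable and readRow are hypothetical names introduced here, and the code assumes the HBase 1.0+ client API (Connection, Table, Get) is on the classpath.

```scala
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.{Connection, Get, Result, Table}
import org.apache.hadoop.hbase.util.Bytes

// Hypothetical helpers (not from the original article): each call borrows a
// short-lived Table from a shared Connection and closes it when done.
object SharedConnectionExample {

  def withTable[T](connection: Connection, tableName: String)(work: Table => T): T = {
    // Obtaining a Table is cheap; the heavy state lives in the shared Connection.
    val table = connection.getTable(TableName.valueOf(tableName))
    try work(table)
    finally table.close() // closes only this Table, never the shared Connection
  }

  // Point lookup by row key. Each call uses its own Table instance, so this
  // can be called from several threads that share the same Connection.
  def readRow(connection: Connection, tableName: String, rowKey: String): Result =
    withTable(connection, tableName)(_.get(new Get(Bytes.toBytes(rowKey))))
}
```

In an application, the Connection would be created once at startup with ConnectionFactory.createConnection, handed to the worker threads, and closed once at shutdown.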

Connection Pool – Simple Implementation

We can pre-create a pool of connections to serve incoming GET/PUT requests. When a request arrives, we randomly pick a connection from the pool and use it to perform the corresponding action.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.slf4j.LoggerFactory

object HBaseConfigurationUtil {

  lazy val logger = LoggerFactory.getLogger("HBaseConfigurationUtil")

  // Builds a client configuration pointing at the given ZooKeeper quorum and parent znode.
  def configurationForPointLookup(hbaseQuorum: String, zNodeParentOpt: String): Configuration = {
    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.quorum", hbaseQuorum)
    conf.set("zookeeper.znode.parent", zNodeParentOpt)
    conf
  }
}

// Creating the connection pool.
// These vals belong inside the class or object that serves requests;
// hBaseQuorum and zNodeParentOpt are assumed to be defined there.
private val connectionPoolSize = scala.util.Properties.envOrElse("hbaseConnectionPoolSize", "10").toInt

private val connectionPool = for (connectionIndex <- 0 until connectionPoolSize)
  yield ConnectionFactory.createConnection(HBaseConfigurationUtil.configurationForPointLookup(hBaseQuorum, zNodeParentOpt))

// Selecting a random connection from the pool
val randomGenerator = scala.util.Random
val connectionFromPool = connectionPool(randomGenerator.nextInt(connectionPoolSize))

You can now use the connectionFromPool object to process incoming GET/PUT requests to HBase.
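The pool code above never closes its connections. One way to package the same random-selection idea together with cleanup is sketched below. The class name RandomPool and its members are hypothetical and not part of the HBase API; the class is written generically so that it works for any AutoCloseable resource, which includes an HBase Connection.

```scala
import java.util.concurrent.ThreadLocalRandom

// Hypothetical generic pool (illustrative names, not from the article).
// With HBase, A would be org.apache.hadoop.hbase.client.Connection and
// `create` would wrap ConnectionFactory.createConnection(conf).
final class RandomPool[A <: AutoCloseable](size: Int)(create: () => A) extends AutoCloseable {
  require(size > 0, "pool size must be positive")

  // Eagerly create every pooled resource, as the pool in the article does.
  private val resources: Vector[A] = Vector.fill(size)(create())

  // Return a randomly chosen member; ThreadLocalRandom avoids contention
  // when many request threads pick at the same time.
  def pick(): A = resources(ThreadLocalRandom.current().nextInt(size))

  // Close every pooled resource, for example from a shutdown hook.
  override def close(): Unit = resources.foreach(_.close())
}
```

Under these assumptions, the pool from the article could be built as new RandomPool[Connection](connectionPoolSize)(() => ConnectionFactory.createConnection(conf)), with pick() taking the place of the manual random index and close() called once when the application stops.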
