Load testing, fault tolerance and load balancing

There are multiple methodologies that you can use to load test HBase.

HBase PerformanceEvaluation (PE)

The best thing about this tool is that it is a built-in class shipped with the stock Apache HBase distribution, so it is readily available as soon as you install HBase and requires no additional tool configuration. It has many different options and commands. For brevity, I am listing the most commonly used ones.

To quickly populate data (load data via writes), use the following command:

hbase pe --table="testtable3" --rows=10000 sequentialWrite 10

The above command creates "testtable3" with the default number of columns (1) and uses 10 client threads. Some other helpful options/commands in PE and their default values:

Options:

  • --nomapred Run multiple clients using threads (rather than use MapReduce)
  • --rows Rows each client runs. Default: one million

Commands:

  • randomRead Run random read test
  • randomSeekScan Run random seek and scan 100 test
  • randomWrite Run random write test
  • scan Run scan test (read every row)
  • scanRange10 Run random seek scan with both start and stop row (max 10 rows)
  • scanRange100 Run random seek scan with both start and stop row (max 100 rows)
  • sequentialWrite Run sequential write test

In the above command, "pe" is just a shorthand for the class org.apache.hadoop.hbase.PerformanceEvaluation. On completion, it also reports useful metrics on write performance:

HBase Performance Evaluation
Elapsed time in milliseconds=125040
Row count=100000

RandomRead and RandomWrite tests

One of the reasons why developers/architects choose HBase/MapR-DB is the need to do both random reads and random writes based on a row key. So, once the data ingestion is done, you can run a random-read test with a set of concurrent connections to the database.

For example –

hbase org.apache.hadoop.hbase.PerformanceEvaluation randomRead 20

where 20 is the number of concurrent clients performing random reads of keys and values. Similarly, benchmarking for random updates can be done this way:

hbase org.apache.hadoop.hbase.PerformanceEvaluation randomWrite 20

The above command, by default, runs in MapReduce mode and emulates 20 client threads, with each thread inserting one million rows. The row size is 1024 bytes.

LoadTestTool (ltt)

This is another built-in tool, synonymous with the class org.apache.hadoop.hbase.util.LoadTestTool. It can be used for performing writes, updates, or reads. For example:

hbase org.apache.hadoop.hbase.util.LoadTestTool -write 3:1024 -tn testtable -num_keys 10000000

The above command writes to a table named "testtable" with 10 million rows, 3 columns per key, and an average data size of 1024 bytes per column.

The hbase ltt command runs the LoadTestTool utility, which is used for testing. You must specify either -init_only or at least one of -write, -update, or -read. For general usage instructions, pass the -h option.

The LoadTestTool has received many updates in recent HBase releases, including support for namespaces, support for tags, cell-level ACLs and visibility labels, testing of security-related features, the ability to specify the number of regions per server, tests for multi-get RPC calls, and tests relating to replication.

Interface LoadBalancer

The LoadBalancer makes decisions about the placement and movement of regions across RegionServers. Cluster-wide load balancing occurs only when there are no regions in transition, and runs at a fixed interval via balanceCluster(Map). On cluster startup, bulk assignment can be used to determine locations for all regions in a cluster. This class produces plans for the AssignmentManager to execute (a toy sketch of plan generation follows the method list below).

Field

  • static ServerName BOGUS_SERVER_NAME
  • static String SYSTEM_TABLES_ON_MASTER – Master carries system tables.
  • static String TABLES_ON_MASTER – Master can carry regions as of hbase-2.0.0.

Method

  • List<RegionPlan> balanceCluster(Map<ServerName,List<RegionInfo>> clusterState) – Perform the major balance operation
  • List<RegionPlan> balanceCluster(TableName tableName, Map<ServerName,List<RegionInfo>> clusterState) – Perform the major balance operation
  • void initialize() – Initialize the load balancer.
  • static boolean isMasterCanHostUserRegions(org.apache.hadoop.conf.Configuration conf)
  • static boolean isSystemTablesOnlyOnMaster(org.apache.hadoop.conf.Configuration conf)
  • static boolean isTablesOnMaster(org.apache.hadoop.conf.Configuration conf)
  • void onConfigurationChange(org.apache.hadoop.conf.Configuration conf) – This method would be called by the ConfigurationManager object when the Configuration object is reloaded from disk.
  • void postMasterStartupInitialize() – Perform any initialization that the balancer needs after the Master has started up.
  • ServerName randomAssignment(RegionInfo regionInfo, List<ServerName> servers) – Get a random region server from the list
  • void regionOffline(RegionInfo regionInfo) – Marks the region as offline at balancer.
  • void regionOnline(RegionInfo regionInfo, ServerName sn) – Marks the region as online at balancer.
  • Map<ServerName,List<RegionInfo>> retainAssignment(Map<RegionInfo,ServerName> regions, List<ServerName> servers) – Assign regions to the previously hosting region server
  • Map<ServerName,List<RegionInfo>> roundRobinAssignment(List<RegionInfo> regions, List<ServerName> servers) – Perform a Round Robin assignment of regions.
  • void setClusterLoad(Map<TableName,Map<ServerName,List<RegionInfo>>> ClusterLoad) – Pass RegionStates and allow balancer to set the current cluster load.
  • void setClusterMetrics(ClusterMetrics st) – Set the current cluster status.
  • void setMasterServices(MasterServices masterServices) – Set the master service.
  • void updateBalancerStatus(boolean status)
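
To make the balanceCluster contract concrete, below is a naive, purely illustrative sketch of plan generation: it repeatedly moves a region from the most loaded server to the least loaded one until their region counts are within one of each other. The actual default balancer in stock HBase is the far more sophisticated StochasticLoadBalancer, which also weighs locality, load, and rack placement; the class and method names NaiveBalanceSketch and naiveBalance are hypothetical, while RegionPlan, ServerName, and RegionInfo are real HBase types:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.RegionInfo;
import org.apache.hadoop.hbase.master.RegionPlan;

public class NaiveBalanceSketch {
    // Produces a list of RegionPlans, the same output type that
    // balanceCluster hands to the AssignmentManager for execution.
    public static List<RegionPlan> naiveBalance(Map<ServerName, List<RegionInfo>> clusterState) {
        List<RegionPlan> plans = new ArrayList<>();
        if (clusterState.size() < 2) {
            return plans; // nothing to balance on a single server
        }
        while (true) {
            // Find the most and least loaded servers by region count.
            ServerName most = null, least = null;
            for (Map.Entry<ServerName, List<RegionInfo>> e : clusterState.entrySet()) {
                if (most == null || e.getValue().size() > clusterState.get(most).size()) {
                    most = e.getKey();
                }
                if (least == null || e.getValue().size() < clusterState.get(least).size()) {
                    least = e.getKey();
                }
            }
            if (clusterState.get(most).size() - clusterState.get(least).size() <= 1) {
                break; // already balanced to within one region
            }
            // Plan a move of one region from the most to the least loaded server.
            RegionInfo region = clusterState.get(most).remove(0);
            clusterState.get(least).add(region);
            plans.add(new RegionPlan(region, most, least)); // (region, source, destination)
        }
        return plans;
    }
}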

Writing to HBase

  • For batch loading, use the bulk-load tool if possible. For more about this technique, go here: http://hbase.apache.org/book/arch.bulk.load.html
  • Use pre-splitting of regions when bulk loading into an empty HBase table.
  • Consider disabling the WAL or using deferred WAL flushes.
  • Make sure autoFlush is set to false on your HTable instance when performing a lot of Puts (see the buffered-write sketch after this list).
  • Watch for and avoid the RegionServer hot-spotting problem. The sign is that one RegionServer is sweating while others are resting, i.e., an uneven load distribution while writing (hot-spotting is also mentioned in the Row-Key Design section).
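
On the modern HBase client API, the setAutoFlush advice above maps to the BufferedMutator, which batches Puts client-side and flushes them in bulk. A minimal sketch, assuming an existing table "testtable" with a column family "cf" (both names are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             // BufferedMutator batches mutations client-side, the same effect
             // setAutoFlush(false) had on the old HTable API.
             BufferedMutator mutator = conn.getBufferedMutator(TableName.valueOf("testtable"))) {
            for (int i = 0; i < 100000; i++) {
                Put put = new Put(Bytes.toBytes(String.format("row-%08d", i)));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
                mutator.mutate(put); // buffered locally, not sent immediately
            }
            mutator.flush(); // push any remaining buffered mutations to the cluster
        }
    }
}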

Reading from HBase

  • Use bigger-than-default (1) scan caching when reading a lot of records, e.g., when using the HBase table as a source for a MapReduce job. But beware that overly aggressive caching may cause timeouts (e.g., the UnknownScannerException) if processing of records is heavy/slow.
  • Narrow down Scan selection by defining the column families and columns you need to fetch.
  • Close ResultScanners in code.
  • Set cacheBlocks to false on a Scan used as a source for a MapReduce job (all four tips are illustrated in the sketch after this list).
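
A minimal sketch putting all four read tips together, again assuming the illustrative "testtable"/"cf" names:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("testtable"))) {
            Scan scan = new Scan();
            scan.setCaching(500);                // fetch 500 rows per RPC instead of the default 1
            scan.addFamily(Bytes.toBytes("cf")); // fetch only the column family we need
            scan.setCacheBlocks(false);          // don't pollute the block cache on a full scan
            // try-with-resources guarantees the ResultScanner is closed
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    System.out.println(Bytes.toString(result.getRow())); // process each row
                }
            }
        }
    }
}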

HBase Fault Tolerance

High Availability (HA) benefits HBase applications that require low-latency queries and can tolerate minimal (near-zero-second) staleness for read operations. Examples include queries on remote sensor data, distributed messaging, object stores, and user profile management.

High Availability for HBase features the following functionality:

  • Data is safely protected in HDFS
  • Failed nodes are automatically recovered
  • No single point of failure
  • All HBase API and region operations are supported, including scans, region split/merge, and META table support (the META table stores information about regions)

However, HBase administrators should carefully consider the following costs associated with using High Availability features:

  • Double or triple MemStore usage
  • Increased BlockCache usage
  • Increased network traffic for log replication
  • Extra backup RPCs for secondary region replicas

HBase is a distributed key-value store designed for fast table scans and read operations at petabyte scale. Before configuring HA for HBase, you should understand the concepts in the following table.

HBase Concept – Description

  • Region – A group of contiguous rows in an HBase table. Tables start with one region; additional regions are added dynamically as the table grows. Regions can be spread across multiple hosts to balance workloads and recover quickly from failure. There are two types of regions: primary and secondary. A secondary region is a copy of a primary region, replicated on a different RegionServer.
  • RegionServer – A RegionServer serves data requests for one or more regions. A single region is serviced by only one RegionServer, but a RegionServer may serve multiple regions. When region replication is enabled, a RegionServer can serve regions in primary and secondary mode concurrently.
  • Column family – A group of semantically related columns that are stored together.
  • MemStore – In-memory storage for a RegionServer. RegionServers write files to HDFS after the MemStore reaches a configurable maximum size, specified with the hbase.hregion.memstore.flush.size property in the hbase-site.xml configuration file.
  • Write Ahead Log (WAL) – A log file that records all changes to data until the data is successfully written to disk (that is, until the MemStore is flushed). This protects against data loss in the event of a failure before the MemStore contents are written to disk.
  • Compaction – When operations stored in the MemStore are flushed to disk, HBase consolidates and merges many smaller files into fewer large files. This consolidation is called compaction, and it is usually very fast. However, if many RegionServers hit the MemStore flush threshold at the same time, HBase performance may degrade from the large number of simultaneous major compactions. Administrators can avoid this by manually splitting tables over time.

HBase High Availability

HBase, architecturally, has had a strong consistency guarantee from the start. All reads and writes are routed through a single RegionServer, which guarantees that all writes happen in order, and all reads access the most recently committed data.

However, because of this "single homing" of reads to a single location, if the server becomes unavailable, the regions of the table hosted on that RegionServer become unavailable for some time until they are recovered. There are three phases in the region recovery process: detection, assignment, and recovery. Of these, the detection phase is usually the longest, currently on the order of 20 to 30 seconds depending on the ZooKeeper session timeout setting (if the RegionServer became unavailable but its ZooKeeper session is still alive). After that, data is recovered from the Write Ahead Log and the region is assigned to a different server. Until the recovery is complete, clients are not able to read data from that region.

For some use cases the data may be read-only, or reading some amount of stale data is acceptable. With timeline-consistent highly available reads, HBase can be used for these kinds of latency-sensitive use cases, where the application can expect a time bound on read completion.

To achieve high availability for reads, HBase provides a feature called "region replication". In this model, each region of a table can have multiple replicas, opened on different RegionServers. By default, region replication is set to 1, so only a single region replica is deployed and there is no change from the original model. If region replication is set to 2 or more, the master assigns the replicas of the table's regions. The load balancer ensures that region replicas are not co-hosted on the same RegionServer and, if possible, not in the same rack.
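
As a sketch of how this is enabled: region replication is a per-table property (the HBase shell exposes it as the REGION_REPLICATION attribute), and it can also be set through the Java Admin API when creating a table. The table and column family names below are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class ReplicatedTableExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            admin.createTable(
                TableDescriptorBuilder.newBuilder(TableName.valueOf("testtable"))
                    .setRegionReplication(2) // one primary (replica ID 0) plus one secondary
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of(Bytes.toBytes("cf")))
                    .build());
        }
    }
}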

Each of the replicas for a single region has a unique replica ID, starting with 0. The region replica with replica ID 0 is called the "primary region"; the others are called "secondary region replicas", or "secondaries". Only the primary region can accept writes from the client, and the primary always contains the latest changes. Since all writes must go through the primary region, writes are not highly available (meaning they might be blocked for some time if the region becomes unavailable).

Timeline and Strong Data Consistency

HBase guarantees timeline consistency for all data served from RegionServers in secondary mode, meaning all HBase clients see the same data in the same order, but that data may be slightly stale. Only the primary RegionServer is guaranteed to have the latest data. Timeline consistency simplifies the programming logic for complex HBase queries and provides lower latency than quorum-based consistency.

In contrast, strong data consistency means that the latest data is always served. However, strong data consistency can greatly increase latency in case of a RegionServer failure, because only the primary RegionServer is guaranteed to have the latest data. The HBase API allows application developers to specify which data consistency is required for a query.
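
A minimal sketch of choosing the consistency level per query from the Java client, again using the illustrative "testtable" name:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Consistency;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TimelineReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("testtable"))) {
            Get get = new Get(Bytes.toBytes("row-00000001"));
            // The default is Consistency.STRONG: only the primary replica may answer.
            // TIMELINE allows a secondary replica to answer, trading staleness for availability.
            get.setConsistency(Consistency.TIMELINE);
            Result result = table.get(get);
            // isStale() is true when the response came from a secondary replica.
            System.out.println("stale = " + result.isStale());
        }
    }
}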
