Scans Technique

The scan command displays the data stored in an HBase table. Its syntax in the HBase shell is as follows:

scan '<table name>'

Using Java API

The complete program to scan a table using the Java API is as follows. It reads the name and city columns of the personal column family from every row of the emp table.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanTable {

    public static void main(String[] args) throws IOException {

        // Instantiating the Configuration class
        Configuration config = HBaseConfiguration.create();

        // Instantiating the Scan class
        Scan scan = new Scan();

        // Scanning the required columns
        scan.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"));
        scan.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("city"));

        // Opening the connection, the table and the scanner.
        // The HTable constructor was removed in HBase 2.0, so the table is
        // obtained from a Connection; try-with-resources closes all three.
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("emp"));
             ResultScanner scanner = table.getScanner(scan)) {

            // Reading values from the scan result
            for (Result result = scanner.next(); result != null; result = scanner.next()) {
                System.out.println("Found row : " + result);
            }
        }
    }
}

Class Scan

Attributes

@InterfaceAudience.Public
public class Scan extends Query

The Scan class is used to perform scan operations. All operations are identical to Get with the exception of instantiation. Rather than specifying a single row, an optional startRow and stopRow may be defined. If rows are not specified, the scanner will iterate over all rows.
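The start-row and stop-row behaviour described above can be pictured with a small model. The sketch below is plain Java, not HBase client code: it uses a sorted map as a stand-in for a table, assumes ASCII row keys (so String order matches byte order), and assumes the HBase defaults of an inclusive start row and an exclusive stop row. The names RowRangeSketch, scanKeys and sampleTable are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of a row-range scan over a sorted table; not HBase client code.
final class RowRangeSketch {

    // Returns the row keys visited for the given bounds.
    // A null bound means "unbounded", like a Scan with no start or stop row.
    static List<String> scanKeys(NavigableMap<String, String> rows,
                                 String startRow, String stopRow) {
        NavigableMap<String, String> view = rows;
        if (startRow != null) {
            view = view.tailMap(startRow, true);  // start row is inclusive
        }
        if (stopRow != null) {
            view = view.headMap(stopRow, false);  // stop row is exclusive
        }
        return new ArrayList<>(view.keySet());
    }

    // Hypothetical sample data: four rows keyed row1..row4.
    static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> rows = new TreeMap<>();
        for (int i = 1; i <= 4; i++) {
            rows.put("row" + i, "value" + i);
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = sampleTable();
        System.out.println(scanKeys(table, "row2", "row4")); // [row2, row3]
        System.out.println(scanKeys(table, null, null));     // [row1, row2, row3, row4]
    }
}
```

With the real client, the same bounds would be expressed through withStartRow and withStopRow on a Scan instance, using byte[] row keys.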

To get all columns from all rows of a Table, create an instance with no constraints; use the Scan() constructor. To constrain the scan to specific column families, call addFamily for each family to retrieve on your Scan instance.

To get specific columns, call addColumn for each column to retrieve. To only retrieve columns within a specific range of version timestamps, call setTimeRange. To only retrieve columns with a specific timestamp, call setTimestamp. To limit the number of versions of each column to be returned, call readVersions. To limit the maximum number of values returned for each call to next(), call setBatch. To add a filter, call setFilter.
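To make the interaction between a time range and a version limit concrete, here is a minimal plain-Java model for a single column. It is not HBase client code. It assumes that the time range is half-open, [minStamp, maxStamp), that versions are returned newest first, and that the version limit is applied to the versions that fall inside the range. VersionSelectionSketch and select are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model of version selection for one column; not HBase client code.
final class VersionSelectionSketch {

    // Keeps timestamps in [minStamp, maxStamp), newest first, up to maxVersions.
    static List<Long> select(List<Long> timestamps, long minStamp,
                             long maxStamp, int maxVersions) {
        List<Long> sorted = new ArrayList<>(timestamps);
        sorted.sort(Collections.reverseOrder());  // newest version first
        List<Long> kept = new ArrayList<>();
        for (long ts : sorted) {
            if (kept.size() == maxVersions) {
                break;                            // version limit reached
            }
            if (ts >= minStamp && ts < maxStamp) {
                kept.add(ts);                     // inside the half-open range
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> versions = Arrays.asList(100L, 200L, 300L, 400L);
        System.out.println(select(versions, 200L, 400L, 1)); // [300]
        System.out.println(select(versions, 200L, 400L, 5)); // [300, 200]
    }
}
```

Note that the version stamped 400 is excluded in both calls because the upper bound of the range is exclusive.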

The small scan option is deprecated as of 2.0.0. The Scan object now has a setLimit(int) method that tells the RegionServer (RS) how many rows are wanted. When the number of rows returned reaches the limit, the RS closes the RegionScanner automatically. The new implementation also fetches data when the scanner is opened, which means a scan operation can finish in a single RPC call. A setReadType(ReadType) method has also been introduced; use it to tell the RS to use pread explicitly.

To explicitly disable server-side block caching for this scan, call setCacheBlocks(false). Note that using a Scan alters the instance: internally, attributes are updated as the Scan runs and, if enabled, metrics accumulate in the Scan instance. Keep this in mind when you clone a Scan instance or reuse one you created earlier; it is safer to create a new Scan instance for each use.
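The advice to create one instance per use can be illustrated with a toy example. The sketch below does not use the HBase API: ToyScan is a hypothetical stand-in for a mutable request object whose state accumulates as it runs, and run is a hypothetical method that simulates executing it. It shows why a shared instance carries state from one use to the next while a factory does not.

```java
import java.util.function.Supplier;

// Toy illustration of "one instance per use"; not HBase client code.
final class FreshScanSketch {

    // Hypothetical stand-in for a mutable request object such as Scan.
    static final class ToyScan {
        final String family;
        long rowsSeen;  // state that accumulates while the scan runs

        ToyScan(String family) {
            this.family = family;
        }
    }

    // Simulates running a scan: it mutates the instance it is given
    // and returns the accumulated row count.
    static long run(ToyScan scan, int rowsInTable) {
        scan.rowsSeen += rowsInTable;
        return scan.rowsSeen;
    }

    public static void main(String[] args) {
        // Reusing one instance: the second run sees leftover state.
        ToyScan shared = new ToyScan("personal");
        System.out.println(run(shared, 3)); // 3
        System.out.println(run(shared, 3)); // 6

        // A factory hands out a fresh instance for every use.
        Supplier<ToyScan> factory = () -> new ToyScan("personal");
        System.out.println(run(factory.get(), 3)); // 3
        System.out.println(run(factory.get(), 3)); // 3
    }
}
```

With the real client, a comparable factory could be a Supplier that builds and configures a new Scan on every call.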

Field

Modifier and Type	Field and Description
static boolean	DEFAULT_HBASE_CLIENT_SCANNER_ASYNC_PREFETCH. Default value of HBASE_CLIENT_SCANNER_ASYNC_PREFETCH.
static String	HBASE_CLIENT_SCANNER_ASYNC_PREFETCH. Parameter name for client scanner sync/async prefetch toggle.
static String	SCAN_ATTRIBUTES_TABLE_NAME

Constructor

Scan() – Create a Scan operation across all rows.

Scan(Get get) – Builds a scan object with the same specs as get.

Scan(Scan scan) – Creates a new instance of this class while copying all values.

Method

Each method is listed below with its modifier and return type, its signature and, where available, a description.

Scan addColumn(byte[] family, byte[] qualifier). Get the column from the specified family with the specified qualifier.

Scan addFamily(byte[] family). Get all columns from the specified family.

static Scan createScanFromCursor(Cursor cursor). Create a new Scan with a cursor.

boolean getAllowPartialResults()

int getBatch()

boolean getCacheBlocks(). Get whether blocks should be cached for this Scan.

int getCaching()

byte[][] getFamilies()

Map<byte[],NavigableSet<byte[]>> getFamilyMap(). Get the familyMap.

Filter getFilter()

Map<String,Object> getFingerprint(). Compile the table and column family (i.e.

int getLimit()

long getMaxResultSize()

int getMaxResultsPerColumnFamily()

int getMaxVersions()

Scan.ReadType getReadType()

int getRowOffsetPerColumnFamily(). Method for retrieving the scan’s offset per row per column family (#kvs to be skipped)

byte[] getStartRow()

byte[] getStopRow()

TimeRange getTimeRange()

boolean hasFamilies()

boolean hasFilter()

boolean includeStartRow()

boolean includeStopRow()

Boolean isAsyncPrefetch()

boolean isGetScan()

boolean isNeedCursorResult()

boolean isRaw()

boolean isReversed(). Get whether this scan is a reversed one.

boolean isScanMetricsEnabled()

int numFamilies()

Scan readAllVersions(). Get all available versions.

Scan readVersions(int versions). Get up to the specified number of versions of each column.

Scan setACL(Map<String,org.apache.hadoop.hbase.security.access.Permission> perms)

Scan setACL(String user, org.apache.hadoop.hbase.security.access.Permission perms)

Scan setAllowPartialResults(boolean allowPartialResults). Set whether the caller wants to see partial results when the server returns fewer cells than expected.

Scan setAsyncPrefetch(boolean asyncPrefetch)

Scan setAttribute(String name, byte[] value). Sets an attribute.

Scan setAuthorizations(org.apache.hadoop.hbase.security.visibility.Authorizations authorizations). Sets the authorizations to be used by this Query

Scan setBatch(int batch). Set the maximum number of cells to return for each call to next().

Scan setCacheBlocks(boolean cacheBlocks). Set whether blocks should be cached for this Scan.

Scan setCaching(int caching). Set the number of rows for caching that will be passed to scanners.

Scan setColumnFamilyTimeRange(byte[] cf, long minStamp, long maxStamp). Get versions of columns only within the specified timestamp range, [minStamp, maxStamp), on a per-CF basis.

Scan setConsistency(Consistency consistency). Sets the consistency level for this operation

Scan setFamilyMap(Map<byte[],NavigableSet<byte[]>> familyMap). Set the familyMap.

Scan setFilter(Filter filter). Apply the specified server-side filter when performing the Query.

Scan setId(String id). This method allows you to set an identifier on an operation.

Scan setIsolationLevel(IsolationLevel level). Set the isolation level for this query.

Scan setLimit(int limit). Set the limit of rows for this scan.

Scan setLoadColumnFamiliesOnDemand(boolean value). Set the value indicating whether loading CFs on demand should be allowed (cluster default is false).

Scan setMaxResultSize(long maxResultSize). Set the maximum result size.

Scan setMaxResultsPerColumnFamily(int limit). Set the maximum number of values to return per row per Column Family

Scan setNeedCursorResult(boolean needCursorResult). When the server is slow, the table contains a lot of deleted data, or a sparse filter is used, the server responds with heartbeats to prevent timeouts.

Scan setOneRowLimit(). Call this when you only want to get one row.

Scan setPriority(int priority)

Scan setRaw(boolean raw). Enable/disable “raw” mode for this scan.

Scan setReadType(Scan.ReadType readType). Set the read type for this scan.

Scan setReplicaId(int Id). Specify region replica id where Query will fetch data from.

Scan setReversed(boolean reversed). Set whether this scan is a reversed one

Scan setRowOffsetPerColumnFamily(int offset). Set offset for the row per Column Family.

Scan setRowPrefixFilter(byte[] rowPrefix). Set a filter (using stopRow and startRow) so the result set only contains rows where the rowKey starts with the specified prefix.

Scan setScanMetricsEnabled(boolean enabled). Enable collection of ScanMetrics.

Scan setTimeRange(long minStamp, long maxStamp). Get versions of columns only within the specified timestamp range, [minStamp, maxStamp).

Scan setTimestamp(long timestamp). Get versions of columns with the specified timestamp.

Map<String,Object> toMap(int maxCols). Compile the details beyond the scope of getFingerprint (row, columns, timestamps, etc.) into a Map along with the fingerprinted information.

Scan withStartRow(byte[] startRow). Set the start row of the scan.

Scan withStartRow(byte[] startRow, boolean inclusive). Set the start row of the scan.

Scan withStopRow(byte[] stopRow). Set the stop row of the scan.

Scan withStopRow(byte[] stopRow, boolean inclusive). Set the stop row of the scan.
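The description of setRowPrefixFilter mentions that a prefix is turned into a start row and a stop row. One way to derive such a stop row is sketched below in plain Java. It is an illustration, not the HBase source: the start row is assumed to be the prefix itself, and the exclusive stop row is assumed to be the smallest key that sorts after every key beginning with the prefix. An empty array is used here to mean "no upper bound", on the assumption that an empty stop row scans to the end of the table. PrefixStopRowSketch and stopRowForPrefix are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of deriving an exclusive stop row from a row-key prefix;
// an illustration only, not the HBase implementation.
final class PrefixStopRowSketch {

    static byte[] stopRowForPrefix(byte[] prefix) {
        // Skip trailing 0xFF bytes: they cannot be incremented.
        int last = prefix.length - 1;
        while (last >= 0 && prefix[last] == (byte) 0xFF) {
            last--;
        }
        if (last < 0) {
            // Empty prefix or all 0xFF: nothing sorts after it, so no bound.
            return new byte[0];
        }
        // Truncate after the last incrementable byte and add one to it.
        byte[] stop = Arrays.copyOf(prefix, last + 1);
        stop[last]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] stop = stopRowForPrefix("abc".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(stop, StandardCharsets.US_ASCII)); // abd
    }
}
```

The resulting pair could then be passed to withStartRow(prefix) and withStopRow(stop), which scans exactly the keys that begin with the prefix.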