Scans Technique

The scan command displays the data stored in an HBase table (HTable). Its syntax is as follows:

scan '<table name>'

Using Java API

The following complete program uses the Java API to scan every row of the emp table, retrieving the name and city columns of the personal column family.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanTable {

    public static void main(String[] args) throws IOException {

        // Instantiating Configuration class
        Configuration config = HBaseConfiguration.create();

        // Instantiating HTable class
        // (this constructor belongs to the pre-2.0 client API; HBase 2.x obtains a Table from a Connection)
        HTable table = new HTable(config, "emp");

        // Instantiating the Scan class
        Scan scan = new Scan();

        // Scanning the required columns
        scan.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"));
        scan.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("city"));

        // Getting the scan result
        ResultScanner scanner = table.getScanner(scan);

        // Reading values from scan result
        for (Result result = scanner.next(); result != null; result = scanner.next()) {
            System.out.println("Found row : " + result);
        }

        // Closing the scanner and the table
        scanner.close();
        table.close();
    }
}

Class Scan

Attributes

@InterfaceAudience.Public
public class Scan extends Query

It is used to perform Scan operations. All operations are identical to Get with the exception of instantiation. Rather than specifying a single row, an optional startRow and stopRow may be defined. If rows are not specified, the Scanner will iterate over all rows.
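The effect of a start row and a stop row can be pictured with a small plain-Java model. This is only a sketch and does not use the HBase client library: the table is stood in for by a sorted map, the class and method names are hypothetical, and it assumes the default bounds in which the start row is inclusive and the stop row is exclusive.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Plain-Java model of a row-range scan; it does not use the HBase client library. */
final class RowRangeSketch {

    /**
     * Returns the rows whose key k satisfies startRow <= k < stopRow.
     * A null bound means "unbounded", like a scan with no start or stop row.
     */
    static NavigableMap<String, String> scan(NavigableMap<String, String> table,
                                             String startRow, String stopRow) {
        NavigableMap<String, String> view = table;
        if (startRow != null) {
            view = view.tailMap(startRow, true);  // start row is inclusive
        }
        if (stopRow != null) {
            view = view.headMap(stopRow, false);  // stop row is exclusive
        }
        return view;
    }

    public static void main(String[] args) {
        // Row keys are kept in sorted order, as they are in an HBase table.
        NavigableMap<String, String> emp = new TreeMap<>();
        emp.put("row1", "raju");
        emp.put("row2", "ravi");
        emp.put("row3", "rajesh");
        emp.put("row4", "kiran");

        System.out.println(scan(emp, "row2", "row4").keySet()); // [row2, row3]
        System.out.println(scan(emp, null, null).keySet());     // [row1, row2, row3, row4]
    }
}
```

With no bounds the model walks every row, which corresponds to the no-argument Scan() case described above.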

To get all columns from all rows of a Table, create an instance with no constraints; use the Scan() constructor. To constrain the scan to specific column families, call addFamily for each family to retrieve on your Scan instance.

To get specific columns, call addColumn for each column to retrieve. To retrieve only columns within a specific range of version timestamps, call setTimeRange. To retrieve only columns with a specific timestamp, call setTimestamp. To limit the number of versions of each column to be returned, call readVersions (this replaces setMaxVersions, which is deprecated since 2.0.0). To limit the maximum number of values returned for each call to next(), call setBatch. To add a filter, call setFilter.
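How a timestamp range and a version limit combine for a single column can be sketched in plain Java. The model below is not HBase client code: the class and method names are hypothetical, and it assumes that the time range [minStamp, maxStamp) is applied first and the version limit then keeps the newest of the remaining versions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of version selection for one column; not HBase client code. */
final class VersionSelectionSketch {

    /**
     * Picks the values whose timestamp lies in [minStamp, maxStamp),
     * newest first, keeping at most maxVersions of them.
     */
    static List<String> select(Map<Long, String> versions,
                               long minStamp, long maxStamp, int maxVersions) {
        // Sort timestamps in descending order so the newest version comes first.
        TreeMap<Long, String> newestFirst = new TreeMap<>(Comparator.reverseOrder());
        newestFirst.putAll(versions);

        List<String> selected = new ArrayList<>();
        for (Map.Entry<Long, String> e : newestFirst.entrySet()) {
            long ts = e.getKey();
            boolean inRange = ts >= minStamp && ts < maxStamp; // maxStamp is exclusive
            if (inRange && selected.size() < maxVersions) {
                selected.add(e.getValue());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Three versions of one hypothetical "city" cell, keyed by timestamp.
        Map<Long, String> city = new TreeMap<>();
        city.put(100L, "hyderabad");
        city.put(200L, "chennai");
        city.put(300L, "delhi");

        System.out.println(select(city, 100L, 300L, 1)); // [chennai]
        System.out.println(select(city, 100L, 300L, 5)); // [chennai, hyderabad]
    }
}
```

The version stamped 300 is never returned in this example because the upper bound of the range is exclusive.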

The small scan option is deprecated as of 2.0.0. Instead, the Scan object provides a setLimit(int) method, which tells the RegionServer (RS) how many rows are wanted. Once the number of returned rows reaches the limit, the RS closes the RegionScanner automatically. The new implementation also fetches data when the scanner is opened, so a scan operation can finish in a single RPC call. A setReadType(ReadType) method has been introduced as well; use it to tell the RS to use pread explicitly.
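The automatic close on reaching a row limit can be illustrated with a toy scanner in plain Java. This is a sketch only: it models the idea on the client side with hypothetical names and does not reflect how the RegionServer is actually implemented.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Plain-Java model of a row-limited scanner; not HBase client code. */
final class LimitedScannerSketch implements Iterator<String> {
    private final Iterator<String> rows;
    private int remaining;
    private boolean closed;

    LimitedScannerSketch(Iterable<String> sortedRows, int limit) {
        this.rows = sortedRows.iterator();
        this.remaining = limit;
        closeIfDone();
    }

    /** Closes the scanner once the limit is reached or the rows run out. */
    private void closeIfDone() {
        if (remaining <= 0 || !rows.hasNext()) {
            closed = true;
        }
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public boolean hasNext() {
        return !closed;
    }

    @Override
    public String next() {
        if (closed) {
            throw new NoSuchElementException();
        }
        String row = rows.next();
        remaining--;
        closeIfDone(); // no explicit close() call is needed by the caller
        return row;
    }

    public static void main(String[] args) {
        LimitedScannerSketch scanner =
                new LimitedScannerSketch(Arrays.asList("row1", "row2", "row3"), 2);
        while (scanner.hasNext()) {
            System.out.println(scanner.next()); // prints row1, then row2
        }
        System.out.println(scanner.isClosed()); // true
    }
}
```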

To explicitly disable server-side block caching for this scan, call setCacheBlocks(false). Note that using a Scan alters it: internally, attributes are updated as the Scan runs and, if enabled, metrics accumulate in the Scan instance. Keep this in mind when you clone or reuse a Scan instance; it is safer to create a new Scan instance for each use.

Field

Modifier and Type with field and description are listed below:

static boolean     DEFAULT_HBASE_CLIENT_SCANNER_ASYNC_PREFETCH. Default value of HBASE_CLIENT_SCANNER_ASYNC_PREFETCH.

static String     HBASE_CLIENT_SCANNER_ASYNC_PREFETCH. Parameter name for client scanner sync/async prefetch toggle.

static String     SCAN_ATTRIBUTES_TABLE_NAME

Constructor

Scan() – Create a Scan operation across all rows.

Scan(Get get) – Builds a scan object with the same specs as get.

Scan(Scan scan) – Creates a new instance of this class while copying all values.

Method

Each method is listed below with its modifier and return type, followed by its description:

Scan     addColumn(byte[] family, byte[] qualifier). Get the column from the specified family with the specified qualifier.

Scan     addFamily(byte[] family). Get all columns from the specified family.

static Scan        createScanFromCursor(Cursor cursor). Create a new Scan with a cursor.

boolean            getAllowPartialResults()

int        getBatch()

boolean            getCacheBlocks(). Get whether blocks should be cached for this Scan.

int        getCaching()

byte[][]             getFamilies()

Map<byte[],NavigableSet<byte[]>>     getFamilyMap(). Getting the familyMap

Filter    getFilter()

Map<String,Object>    getFingerprint(). Compile the table and column family (i.e. schema) information.

int        getLimit()

long     getMaxResultSize()

int        getMaxResultsPerColumnFamily()

int        getMaxVersions()

Scan.ReadType           getReadType()

int        getRowOffsetPerColumnFamily(). Method for retrieving the scan’s offset per row per column family (#kvs to be skipped)

byte[]   getStartRow()

byte[]   getStopRow()

TimeRange      getTimeRange()

boolean            hasFamilies()

boolean            hasFilter()

boolean            includeStartRow()

boolean            includeStopRow()

Boolean           isAsyncPrefetch()

boolean            isGetScan()

boolean            isNeedCursorResult()

boolean            isRaw()

boolean            isReversed(). Get whether this scan is a reversed one.

boolean            isScanMetricsEnabled()

int        numFamilies()

Scan     readAllVersions(). Get all available versions.

Scan     readVersions(int versions). Get up to the specified number of versions of each column.

Scan     setACL(Map<String,org.apache.hadoop.hbase.security.access.Permission> perms)

Scan     setACL(String user, org.apache.hadoop.hbase.security.access.Permission perms)

Scan     setAllowPartialResults(boolean allowPartialResults). Setting whether the caller wants to see the partial results when server returns less-than-expected cells.

Scan     setAsyncPrefetch(boolean asyncPrefetch)

Scan     setAttribute(String name, byte[] value). Sets an attribute.

Scan     setAuthorizations(org.apache.hadoop.hbase.security.visibility.Authorizations authorizations). Sets the authorizations to be used by this Query

Scan     setBatch(int batch). Set the maximum number of cells to return for each call to next().

Scan     setCacheBlocks(boolean cacheBlocks). Set whether blocks should be cached for this Scan.

Scan     setCaching(int caching). Set the number of rows for caching that will be passed to scanners.

Scan     setColumnFamilyTimeRange(byte[] cf, long minStamp, long maxStamp). Get versions of columns only within the specified timestamp range, [minStamp, maxStamp), on a per-CF basis.

Scan     setConsistency(Consistency consistency). Sets the consistency level for this operation

Scan     setFamilyMap(Map<byte[],NavigableSet<byte[]>> familyMap). Setting the familyMap

Scan     setFilter(Filter filter). Apply the specified server-side filter when performing the Query.

Scan     setId(String id). This method allows you to set an identifier on an operation.

Scan     setIsolationLevel(IsolationLevel level). Set the isolation level for this query.

Scan     setLimit(int limit). Set the limit of rows for this scan.

Scan     setLoadColumnFamiliesOnDemand(boolean value). Set the value indicating whether loading CFs on demand should be allowed (cluster default is false).

Scan     setMaxResultSize(long maxResultSize). Set the maximum result size.

Scan     setMaxResultsPerColumnFamily(int limit). Set the maximum number of values to return per row per Column Family

Scan     setNeedCursorResult(boolean needCursorResult). When the server is slow, the scanned table contains a lot of deleted data, or a sparse filter is used, the server responds with heartbeats to prevent a timeout.

Scan     setOneRowLimit(). Call this when you only want to get one row.

Scan     setPriority(int priority)

Scan     setRaw(boolean raw). Enable/disable “raw” mode for this scan.

Scan     setReadType(Scan.ReadType readType). Set the read type for this scan.

Scan     setReplicaId(int Id). Specify region replica id where Query will fetch data from.

Scan     setReversed(boolean reversed). Set whether this scan is a reversed one

Scan     setRowOffsetPerColumnFamily(int offset). Set offset for the row per Column Family.

Scan     setRowPrefixFilter(byte[] rowPrefix). Set a filter (using stopRow and startRow) so the result set only contains rows where the rowKey starts with the specified prefix.

Scan     setScanMetricsEnabled(boolean enabled). Enable collection of ScanMetrics.

Scan     setTimeRange(long minStamp, long maxStamp). Get versions of columns only within the specified timestamp range, [minStamp, maxStamp).

Scan     setTimestamp(long timestamp). Get versions of columns with the specified timestamp.

Map<String,Object>    toMap(int maxCols). Compile the details beyond the scope of getFingerprint (row, columns, timestamps, etc.) into a Map along with the fingerprinted information.

Scan     withStartRow(byte[] startRow). Set the start row of the scan.

Scan     withStartRow(byte[] startRow, boolean inclusive). Set the start row of the scan.

Scan     withStopRow(byte[] stopRow). Set the stop row of the scan.

Scan     withStopRow(byte[] stopRow, boolean inclusive). Set the stop row of the scan.
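The setRowPrefixFilter entry above says the prefix is turned into a start row and a stop row. One way to derive such a stop row is sketched below in plain Java. This is an illustration of the idea rather than HBase's own code: the class and method names are hypothetical, the prefix itself serves as the inclusive start row, and an empty array is assumed to mean "no upper bound".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Plain-Java sketch of turning a row prefix into an exclusive stop row; not HBase's own code. */
final class RowPrefixSketch {

    /**
     * Returns the smallest row key that sorts after every key starting with prefix,
     * or an empty array when no such key exists (meaning: scan to the end of the table).
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0]; // prefix is empty or all 0xFF: no upper bound
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // increment the last remaining byte
        return stop;
    }

    public static void main(String[] args) {
        byte[] prefix = "row1".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(prefix);
        // Rows "row1", "row10" and "row1abc" all sort before the stop row "row2".
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // row2
    }
}
```

Scanning from the prefix (inclusive) to this stop row (exclusive) covers exactly the keys that begin with the prefix, assuming keys are compared byte by byte as unsigned values.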

