Classes

Package org.apache.hadoop.hbase.mapreduce – Provides HBase MapReduce Input/OutputFormats, a table indexing MapReduce job, and utility methods.

Interface

VisibilityExpressionResolver    – Interface to convert visibility expressions into Tags for storing along with Cells in HFiles.

Class

  • CellCounter – A job with a map and reduce phase to count cells in a table.
  • CellCreator – Facade to create Cells for HFileOutputFormat.
  • CellSerialization
  • CellSortReducer – Emits sorted Cells.
  • CopyTable – Tool used to copy a table to another one which can be on a different setup.
  • Export – Export an HBase table.
  • GroupingTableMapper – Extract grouping columns from input record.
  • HFileOutputFormat2 – Writes HFiles (see the bulk-load sketch after this list).
  • HRegionPartitioner<KEY,VALUE> – This is used to partition the output keys into groups of keys.
  • IdentityTableMapper – Pass the given key and record as-is to the reduce phase.
  • IdentityTableReducer – Convenience class that simply writes all values (which must be Put or Delete instances) passed to it out to the configured HBase table.
  • Import – Import data written by Export.
  • ImportTsv – Tool to import data from a TSV file.
  • LoadIncrementalHFiles – Deprecated as of release 2.0.0; will be removed in HBase 3.0.0.
  • LoadQueueItem – Deprecated as of release 2.0.0; will be removed in HBase 3.0.0.
  • MultiTableHFileOutputFormat – Creates a three-level directory tree: the table name is the parent directory, each column family name is a child directory, and all HFiles for that family are stored under the family directory (e.g. tableName1/columnFamilyName1, tableName1/columnFamilyName2, tableName2/columnFamilyName1).
  • MultiTableInputFormat – Convert HBase tabular data from multiple scanners into a format that is consumable by Map/Reduce.
  • MultiTableInputFormatBase – A base for MultiTableInputFormats.
  • MultiTableOutputFormat – Hadoop output format that writes to one or more HBase tables.
  • MultiTableSnapshotInputFormat – MultiTableSnapshotInputFormat generalizes TableSnapshotInputFormat allowing a MapReduce job to run over one or more table snapshots, with one or more scans configured for each.
  • MutationSerialization
  • PutCombiner<K> – Combine Puts.
  • PutSortReducer – Emits sorted Puts.
  • ResultSerialization
  • RowCounter – A job with just a map phase to count rows.
  • SimpleTotalOrderPartitioner<VALUE> – A partitioner that takes start and end keys and uses BigDecimal arithmetic to determine which reducer a key belongs to.
  • TableInputFormat – Convert HBase tabular data into a format that is consumable by Map/Reduce.
  • TableInputFormatBase – A base for TableInputFormats.
  • TableMapper<KEYOUT,VALUEOUT> – Extends the base Mapper class to add the required input key and value classes.
  • TableMapReduceUtil – Utility for TableMapper and TableReducer
  • TableOutputCommitter – Small committer class that does not do anything.
  • TableOutputFormat<KEY> – Convert Map/Reduce output and write it to an HBase table.
  • TableRecordReader – Iterates over HBase table data, returning (ImmutableBytesWritable, Result) pairs.
  • TableRecordReaderImpl – Iterates over HBase table data, returning (ImmutableBytesWritable, Result) pairs.
  • TableReducer<KEYIN,VALUEIN,KEYOUT> – Extends the basic Reducer class to add the required key and value input/output classes.
  • TableSnapshotInputFormat – TableSnapshotInputFormat allows a MapReduce job to run over a table snapshot.
  • TableSplit – A table split corresponds to a key range (low, high) and an optional scanner.
  • TextSortReducer – Emits Sorted KeyValues.
  • TsvImporterMapper – Write table content out to files in hdfs.
  • TsvImporterTextMapper – Write table content out to map output files.
  • WALInputFormat – Simple InputFormat for WAL files.
  • WALPlayer – A tool to replay WAL files as a M/R job.
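
As a brief illustration of how several of these classes fit together for bulk loading (HFileOutputFormat2 plus the sort reducer and partitioner it configures), here is a hedged sketch; the table name "mytable", the output path, and the MyBulkLoadMapper class are assumptions for illustration, not part of the package:

Configuration conf = HBaseConfiguration.create();
Job job = Job.getInstance(conf, "BulkLoadExample");
job.setJarByClass(MyBulkLoadMapper.class);   // hypothetical mapper that emits Puts

try (Connection conn = ConnectionFactory.createConnection(conf);
     Table table = conn.getTable(TableName.valueOf("mytable"));
     RegionLocator locator = conn.getRegionLocator(TableName.valueOf("mytable"))) {
  // Sets the appropriate sort reducer (e.g. PutSortReducer), a total-order
  // partitioner matching the table's current region boundaries, and
  // HFileOutputFormat2 as the job's output format.
  HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
}
FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles"));   // hypothetical output directory

The HFiles produced by such a job can then be loaded into the table with a bulk-load tool such as LoadIncrementalHFiles (listed above as deprecated as of 2.0.0).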

Job Configuration

The following is an example of using HBase as a MapReduce source in a read-only manner:

Configuration config = HBaseConfiguration.create();
config.set(
    "mapred.map.tasks.speculative.execution",  // speculative execution will
    "false");                                  // decrease performance or damage the data

Job job = new Job(config, "ExampleRead");
job.setJarByClass(MyReadJob.class);   // class that contains the mapper

Scan scan = new Scan();
scan.setCaching(500);          // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);    // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
    tableName,        // input HBase table name
    scan,             // Scan instance to control CF and attribute selection
    MyMapper.class,   // mapper
    null,             // mapper output key
    null,             // mapper output value
    job);
job.setOutputFormatClass(NullOutputFormat.class);   // because we aren't emitting anything from the mapper

boolean b = job.waitForCompletion(true);
if (!b) {
  throw new IOException("error with job!");
}

The mapper instance would extend TableMapper, too, like this:

public static class MyMapper extends TableMapper<Text, Text> {

  public void map(ImmutableBytesWritable row, Result value, Context context)
      throws InterruptedException, IOException {
    // process data for the row from the Result instance.
  }
}

Number of Map Tasks

When TableInputFormat is used (the default set by TableMapReduceUtil.initTableMapperJob(…)) to read an HBase table as input to a MapReduce job, its splitter creates one map task per region of the table. Thus, if there are 100 regions in the table, the job will have 100 map tasks, regardless of how many column families are selected in the Scan. To implement different behavior (a custom splitter), see the getSplits() method of TableInputFormatBase; you can either override it in a custom-splitter class or use it as an example, as sketched below.
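
A minimal sketch of such an override, assuming a hypothetical class name MyCustomTableInputFormat (this is not a stock HBase class; the body simply delegates to the default per-region splitting and marks where custom logic would go):

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

public class MyCustomTableInputFormat extends TableInputFormat {

  @Override
  public List<InputSplit> getSplits(JobContext context) throws IOException {
    // Default behaviour: one InputSplit (and therefore one map task) per region.
    List<InputSplit> perRegionSplits = super.getSplits(context);

    // Custom splitting logic would go here, e.g. merging splits for small
    // regions or breaking a hot region's key range into several splits.
    return perRegionSplits;
  }
}

The custom class can then be supplied to the job, for example via job.setInputFormatClass(MyCustomTableInputFormat.class) or via the initTableMapperJob(…) overload that accepts an input format class.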

Writing to HBase

Job Configuration

The following is an example of using HBase both as a source and as a sink with MapReduce:

// Configuring the read side (Configuration, Job, Scan, initTableMapperJob)
// is the same as in the read-only example above.
Configuration config = …;
Job job = …;
Scan scan = …;
TableMapReduceUtil.initTableMapperJob(…);

TableMapReduceUtil.initTableReducerJob(
    targetTable,            // output table
    MyTableReducer.class,   // reducer class
    job);
job.setNumReduceTasks(1);   // at least one, adjust as required

boolean b = job.waitForCompletion(true);

And the reducer instance would extend TableReducer, as shown here:

public static class MyTableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {

  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    Put put = …;   // data to be written
    context.write(null, put);
  }
}
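
For completeness, here is a hedged sketch of a matching mapper that would emit the Text/IntWritable pairs this reducer consumes. The column family "cf" and qualifier "attr1" are assumptions for illustration only and do not appear in the example above:

public static class MyMapper extends TableMapper<Text, IntWritable> {

  // Hypothetical column family and qualifier, used only for illustration.
  private static final byte[] CF = Bytes.toBytes("cf");
  private static final byte[] ATTR1 = Bytes.toBytes("attr1");

  private final IntWritable ONE = new IntWritable(1);
  private final Text text = new Text();

  public void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    // Emit the cell value as the map output key with a count of one;
    // the reducer can then aggregate per key and write a Put.
    String val = Bytes.toString(value.getValue(CF, ATTR1));
    text.set(val);
    context.write(text, ONE);
  }
}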
