Single, range and multiget slice

SLICE

A slice is the result of a query on the database. When querying a database, I want to be able to ask something like:

Give me the password of the user with the login name of loginname.

In Cassandra, answering a simple question like that is less simple than you might expect.
My way forward is to define a secondary index on the column being queried.

A secondary index tells Cassandra to be prepared to answer simple questions like the one above.
Note that this feature was built in later, so I expect that this is not the normal way of querying the database, but for now I can create ColumnFamilies with indexed Columns.
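To make the idea concrete, here is a toy, in-memory sketch of what an index on the login name buys you. The class IndexedColumnFamily and its methods are hypothetical names chosen for illustration. It keeps a map from column value to row keys, which is the essence of a KEYS index, but it says nothing about how Cassandra actually stores or distributes its indexes.

```java
import java.util.*;

// Hypothetical, in-memory model of a KEYS-style secondary index. It only shows
// the idea (value -> row keys); it is not how Cassandra stores its indexes.
class IndexedColumnFamily {
    private final String indexedColumn;
    // Row key -> (column name -> value)
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    // Indexed column value -> keys of the rows holding that value
    private final Map<String, Set<String>> index = new HashMap<>();

    IndexedColumnFamily(String indexedColumn) {
        this.indexedColumn = indexedColumn;
    }

    // Stores (or replaces) a row and keeps the index in step with it
    void put(String rowKey, Map<String, String> columns) {
        Map<String, String> previous = rows.put(rowKey, new HashMap<>(columns));
        if (previous != null && previous.containsKey(indexedColumn)) {
            index.get(previous.get(indexedColumn)).remove(rowKey);
        }
        String value = columns.get(indexedColumn);
        if (value != null) {
            index.computeIfAbsent(value, v -> new TreeSet<>()).add(rowKey);
        }
    }

    // Equality lookup answered from the index instead of scanning every row
    Set<String> rowKeysWhereEquals(String value) {
        return index.getOrDefault(value, Collections.emptySet());
    }

    Map<String, String> row(String rowKey) {
        return rows.get(rowKey);
    }
}
```

With this model, "give me the password of the user with login name alice" becomes: for each key in rowKeysWhereEquals("alice"), read row(key).get("password").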

Create an indexed Column

BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition();
columnFamilyDefinition.setKeyspaceName(_keyspaceName);
columnFamilyDefinition.setName(_columnFamilyName);
columnFamilyDefinition.setKeyValidationClass(ComparatorType.UTF8TYPE.getClassName());

// Create a column that can be queried through a secondary index
BasicColumnDefinition bcd = new BasicColumnDefinition();
bcd.setName(StringSerializer.get().toByteBuffer(FIRSTNAME_COLUMN_NAME));
bcd.setIndexName(FIRSTNAME_COLUMN_NAME);
bcd.setValidationClass("org.apache.cassandra.db.marshal.UTF8Type");
bcd.setIndexType(ColumnIndexType.KEYS);
columnFamilyDefinition.addColumnDefinition(bcd);
// Persist or execute update on cassandra
_cluster.addColumnFamily(columnFamilyDefinition);

Slice an indexed Column

// Create the query object
IndexedSlicesQuery<String, String, byte[]> indexedSlicesQuery = HFactory.createIndexedSlicesQuery(_keyspace,
stringSerializer,stringSerializer, byteArraySerializer);

// Query column family with name
indexedSlicesQuery.setColumnFamily(_columnFamilyName);
// Set the columns you want back
indexedSlicesQuery.setColumnNames(FIRSTNAME_COLUMN_NAME, MIDDLENAME_COLUMN_NAME, LASTNAME_COLUMN_NAME);
// At least one equality expression is required. Find "firstname10" in the column named FIRSTNAME_COLUMN_NAME
indexedSlicesQuery.addEqualsExpression(FIRSTNAME_COLUMN_NAME, stringSerializer.toBytes("firstname10"));
// Do the query
QueryResult<OrderedRows<String, String, byte[]>> result = indexedSlicesQuery.execute();

boolean found = false;
// result.get() returns the matching rows, which can be iterated over
Iterator<Row<String, String, byte[]>> queryResultIterator = result.get().iterator();

while (queryResultIterator.hasNext()){
// fetch the result and translate the content of the column
Row<String, String, byte[]> queryRow= queryResultIterator.next();
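String firstName = stringSerializer.fromBytes(
queryRow.getColumnSlice().getColumnByName(FIRSTNAME_COLUMN_NAME).getValue());
// JUnit expects (message, expected, actual)
Assert.assertEquals("Search string should match query string", "firstname10", firstName);
found = true;
}
// Fail if the index query returned no rows at all
Assert.assertTrue("Expected at least one row with firstname10", found);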

Range Slice

In Cassandra, rows are hash partitioned by default. If you want data sorted by some attribute, Cassandra's column name sorting is usually exploited: the attribute value becomes the column name. If you look at the Cassandra slice range API, you will find that you can specify only the range start, the range end and an upper limit on the number of columns fetched.

However, in many applications the need is to paginate through the data, i.e., each call should fetch a predetermined number of items.

The only option is to select a range such that the number of items expected to be returned is far greater than the max limit. In this post I will discuss an adaptive range reader that I implemented recently in agiato, hosted on GitHub. It borrows ideas from feedback control systems to adaptively change the column range so that a predetermined number of items is returned.
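agiato's implementation is not reproduced here. The following is a minimal sketch of how a feedback loop could adapt the range, under these assumptions: column names are long values (for example timestamps), a NavigableMap stands in for one wide row, and every read behaves like a slice with a start, an exclusive finish (a simplification, since Cassandra's finish is inclusive) and a count cap. AdaptiveRangeReader, its parameters and the proportional update rule are all hypothetical choices for illustration.

```java
import java.util.*;

// Hypothetical sketch of an adaptive range reader; it is NOT agiato's code.
// A NavigableMap<Long, String> stands in for one wide row whose
// column names (e.g. timestamps) are kept sorted.
class AdaptiveRangeReader {
    // Cap on the range width so that start + width cannot overflow in this sketch
    private static final long MAX_WIDTH = 1L << 40;

    private final NavigableMap<Long, String> row;
    private final int pageSize;   // predetermined number of items per call
    private long width;           // current guess for the column range width
    private long nextStart;       // first column name not yet returned

    AdaptiveRangeReader(NavigableMap<Long, String> row, int pageSize, long initialWidth, long start) {
        this.row = row;
        this.pageSize = pageSize;
        this.width = Math.max(1, Math.min(MAX_WIDTH, initialWidth));
        this.nextStart = start;
    }

    // Models one slice read: at most `count` columns with start <= name < finish, in order
    private List<Map.Entry<Long, String>> slice(long start, long finish, int count) {
        List<Map.Entry<Long, String>> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : row.subMap(start, true, finish, false).entrySet()) {
            if (out.size() == count) break;
            out.add(e);
        }
        return out;
    }

    // Returns the next pageSize items (fewer only at the end of the row)
    List<String> nextPage() {
        List<String> page = new ArrayList<>();
        while (page.size() < pageSize && !row.isEmpty() && nextStart <= row.lastKey()) {
            int wanted = pageSize - page.size();
            long start = nextStart;
            long finish = start + width;
            List<Map.Entry<Long, String>> got = slice(start, finish, wanted);
            for (Map.Entry<Long, String> e : got) page.add(e.getValue());

            if (got.size() < wanted) {
                // Range too narrow: skip past it and widen by the ratio wanted / got
                nextStart = finish;
                width = got.isEmpty() ? width * 2 : width * wanted / got.size();
            } else {
                // Count limit hit: resume after the last column and size the next
                // range from the observed density (items per unit of column name)
                long last = got.get(got.size() - 1).getKey();
                nextStart = last + 1;
                width = (last + 1 - start) * pageSize / got.size();
            }
            width = Math.max(1, Math.min(MAX_WIDTH, width)); // clamped feedback output
        }
        return page;
    }
}
```

The controller is deliberately simple: after a short read it widens the range, and after a read that hit the count limit it resizes the range from the observed density, so for roughly uniform data later reads tend to fill a page in a single call.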

Listed below is the SliceRange constructor as defined. It is used in the Cassandra API call get_slice(). As you can tell, there is no way to set a range that guarantees the expected number of items is returned.

SliceRange(byte[] start, byte[] finish, boolean reversed, int count)

The only thing you can do is set start low enough and finish high enough, and hope that the number of columns in that range is greater than count, so that count items are returned by the query. But there is no guarantee, and nobody likes hope-based logic.
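The semantics of SliceRange can be modeled in a few lines to show why count is only an upper bound. The sketch below is an in-memory stand-in, not the Thrift client. SliceRangeModel is a hypothetical name, column names are Strings, an empty start or finish means unbounded, both ends are inclusive, and in a reversed slice start is the high end of the range.

```java
import java.util.*;

// In-memory stand-in for get_slice with SliceRange(start, finish, reversed, count).
// Hypothetical model for illustration, not the Thrift client.
class SliceRangeModel {
    static List<String> slice(NavigableMap<String, String> row,
                              String start, String finish, boolean reversed, int count) {
        // A reversed slice walks the columns from high to low,
        // so start is the high end and finish the low end
        NavigableMap<String, String> view = reversed ? row.descendingMap() : row;
        // Empty start/finish mean "unbounded"; both ends are inclusive.
        // A finish that sorts before start (in traversal order) is rejected by
        // Cassandra; in this model headMap throws IllegalArgumentException instead.
        if (!start.isEmpty()) view = view.tailMap(start, true);
        if (!finish.isEmpty()) view = view.headMap(finish, true);
        List<String> names = new ArrayList<>();
        for (String name : view.keySet()) {
            if (names.size() == count) break; // count caps the result; it never pads it
            names.add(name);
        }
        return names;
    }
}
```

On a row with columns a to e, slice(row, "b", "d", false, 10) returns only b, c and d: nothing in the call can make the range hold ten columns.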

Multiget Slice

With get_slice, a user can fetch a set of columns for a single specified row key. The multiget_slice call, on the other hand, fetches a subset of columns for a set of row keys, based on a column parent and a predicate. By providing more than one row key, you fetch the values of the named columns for each key. So a multiget slice returns more than one named column for more than one row.
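The difference between the two calls can be shown with a small in-memory model. MultigetModel and its methods are hypothetical, and the model ignores partitioning and consistency; it only shows that multiget_slice applies one column predicate (here a list of column names) to several row keys. In Hector, the corresponding query object is created with HFactory.createMultigetSliceQuery.

```java
import java.util.*;

// Hypothetical, in-memory illustration of get_slice vs multiget_slice semantics
// (not the Thrift or Hector client): the same column predicate is applied to
// one row key or to many.
class MultigetModel {
    // Column family: row key -> (column name -> value), columns kept sorted
    private final Map<String, SortedMap<String, String>> rows = new HashMap<>();

    void insert(String key, String column, String value) {
        rows.computeIfAbsent(key, k -> new TreeMap<>()).put(column, value);
    }

    // get_slice: the named columns of a single row (missing columns are skipped)
    SortedMap<String, String> getSlice(String key, Collection<String> columnNames) {
        SortedMap<String, String> result = new TreeMap<>();
        SortedMap<String, String> row = rows.getOrDefault(key, Collections.emptySortedMap());
        for (String name : columnNames) {
            if (row.containsKey(name)) result.put(name, row.get(name));
        }
        return result;
    }

    // multiget_slice: the same predicate applied to every requested key;
    // in this model a key with no matching columns maps to an empty result
    Map<String, SortedMap<String, String>> multigetSlice(Collection<String> keys,
                                                         Collection<String> columnNames) {
        Map<String, SortedMap<String, String>> result = new LinkedHashMap<>();
        for (String key : keys) result.put(key, getSlice(key, columnNames));
        return result;
    }
}
```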
