Host proximity by snitch

Certify and Increase Opportunity.
Be
Govt. Certified Apache Cassandra Professional

Host proximity by snitch

Cassandra uses two items, snitches and strategies, to determine which nodes will receive copies of data. Snitches define proximity of nodes within the ring. Strategies use the information snitches provide them about node proximity along with an implemented algorithm to collect nodes that will receive writes.

There are two types snitches – SimpleSnitch & SimpleStrategy

SimpleSnitch literally has no locality information about nodes, it just returns a list of all the nodes in a ring. SimpleStrategy will attempt to start writing data to the first node whose token is larger than the tokens data. If there are no nodes whose token is larger than the data’s token, it will start at the node with the smallest token.

Lets say we have four nodes in our Cassandra ring with a token range of 0-100 and our intial tokens are assigned as follows: d->0, a->25, b->50, c->75. If we try to place data that has a token of 19, SimpleStrategy will ask for the list of nodes from SimpleSnitch, then it will write to the first node whose token is larger, which in this case is node a. If we had a replication factor of 2 (two copies of data should be written), SimpleStrategy will simply continue gathering the next highest token value node. So our second copy would go to node b.

This is one of the most common types of distribution methods that people implement with Cassandra: even token distribution between nodes so that each owns 25% of the data.

Smarter Snitches and Strategies
Cassandra has another Snitch called PropertyFileSnitch which maintains much more information about nodes within the ring. PropertyFileSnitch maintains a mapping of node, datacenter, and rack so that we can determine, for any node, what data center it is in, and what rack within that datacenter it is in. This information is statically defined in cassandra-topology.properties.

There is also a Strategy that is made to use the information from a PropertyFileSnitch called NetworkTopologyStrategy (NTS). The NTS algorithm is implemented as follows:

  • Get Datacenters from strategy options: {DC0:1,DC1:1}
  • For each data center entry
  1. Get replication factor
  2. Get a list of all endpoints for this datacenter from the snitch
  3. Create a ringIterator from the datacenter endpoints list and Collect endpoints to write to – only select an endpoint from the list for any given rack once (distribute across racks)
  4. If replication factor has not been met, continue to collect endpoints from the list, allowing racks that already contain an endpoint in the write list
  • If our replication factor is not equal to our list of endpoints, throw an error because there are not enough nodes in the data center to meet the replication factor

There is a lot of important stuff going on here (see the presentation slides for more in depth coverage of what is going on internally), but to keep it brief, the key difference is that instead of iterating over an entire set of nodes in the ring, NTS creates an iterator for EACH datacenter and places writes discretely for each. The result is that NTS basically breaks each datacenter into it’s own logical ring when it places writes.

Share this post
[social_warfare]
Introduction of Apache Cassandra
Creating a cluster and nodes

Get industry recognized certification – Contact us

keyboard_arrow_up