Compaction

Certify and Increase Opportunity.
Be
Govt. Certified Apache Cassandra Professional

Compaction in Apache Cassandra

Compaction

Cassandra’s log-structured storage engine enables Cassandra’s amazing performance and features like application-transparent compression by turning all updates into data files called sstables that are written sequentially to disk. No update-in-place is done (because that would require doing random i/o); instead, a new version of the columns being inserted or updated is written in the next sstable.

Thus, over time, many versions of a row may exist in different sstables. Each of these versions may have different sets of columns. If the sstables were allowed to accumulate, reading a row could require many seeks in different files to return a result.

To prevent read speed from deteriorating, compaction runs in the background, merging sstables together. (This can also be done performantly, without random i/o, because rows are sorted by primary key within in each sstable.)
Tiered Compaction

Cassandra’s size-tiered compaction stragety is very similar to the one described in Google’s Bigtable paper: when enough similar-sized sstables are present (four by default), Cassandra will merge them.

There are three problems with size-tiered compaction in update-heavy workloads:

Performance can be inconsistent because there are no guarantees as to how many sstables a row may be spread across: in the worst case, we could have columns from a given row in each sstable.
A substantial amount of space can be wasted since there is no guarantee as to how quickly obsolete columns will be merged out of existance; this is particularly noticeable when there is a high ratio of deletes.
Space can also be a problem as sstables grow larger from repeated compactions, since an obsolete sstable cannot be removed until the merged sstable is completely written.  In the worst case of a single set of large sstable with no obsolete rows to remove, Cassandra would need 100% as much free space as is used by the sstables being compacted, into which to write the merged one.

Leveled Compaction
Cassandra 1.0 introduces the Leveled Compaction Strategy, based on LevelDB from the Chromium team at Google.

Leveled compaction creates sstables of a fixed, relatively small size (5MB by default in Cassandra’s implementation), that are grouped into “levels.” Within each level, sstables are guaranteed to be non-overlapping. Each level is ten times as large as the previous.

This solves the above problems with tiered compaction:

Leveled compaction guarantees that 90% of all reads will be satisfied from a single sstable (assuming nearly-uniform row size). Worst case is bounded at the total number of levels — e.g., 7 for 10TB of data.
At most 10% of space will be wasted by obsolete rows.
Only enough space for 10x the sstable size needs to be reserved for temporary use by compaction.

Leveled compaction can be enabled by creating (or updating) a ColumnFamily with the compaction_strategy option set to LeveledCompactionStrategy. When updating an existing column family, reads and writes can continue as usual while leveling of existing sstables is performed in the background.
Considerations

Because leveled compaction makes the above guarantees, it does roughly twice as much i/o as size-tiered compaction. For primarily insert-oriented workloads, this extra i/o does not pay off in terms of the above benefits, since there are few obsolete row versions involved.

Leveled compaction ignores the concurrent_compactors setting. Concurrent compaction is designed to avoid tiered compaction’s problem of a backlog of small compaction sets becoming blocked temporarily while the compaction system is busy with a large set. Leveled compaction does not have this problem, since all compaction sets are roughly the same size. Leveled compaction does honor the multithreaded_compaction setting, which allows using one thread per sstable to speed up compaction. However, most compaction tuning will still involve using compaction_throughput_mb_per_sec (default: 16) to throttle compaction back.

Durability with sstables, memtables, commit logs and hinted handoff
Bloom filters and tombstones

Get industry recognized certification – Contact us

keyboard_arrow_up
Open chat
Need help?
Hello 👋
Can we help you?