Apache Cassandra Interview Questions


Cassandra is a NoSQL database management system designed to handle high volumes of structured data. If you are preparing for a role that involves Apache Cassandra, you will find these interview questions helpful.

Q.1 Explain the types of Data models.

There are three types of Data Model:

Conceptual Data Model
Logical Data Model
Physical Data Model


Q.2 What is the role of durable writes?

The durable_writes option tells Cassandra whether to use the commit log for updates to the current keyspace. The option is not mandatory, and its default value is true.


Q.3 Define replication factor.

Cassandra stores copies of each row, known as replicas, based on the row key. The replication factor is the number of nodes that hold a copy (replica) of each row of data.


Q.4 Define replication Strategy.

The replica placement strategy defines how replicas are placed in the ring. Cassandra ships with different strategies for determining which nodes get copies of which keys. These include:

Simple Strategy
Network Topology Strategy


Q.5 What is a Simple Strategy?

Simple Strategy is meant for simple single-datacenter clusters. It places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise around the ring, without considering rack or datacenter location.
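The clockwise placement described above can be sketched in a few lines of Python. This is only an illustrative model, not Cassandra's implementation: the ring layout, the token values, the node names, and the function name simple_strategy_replicas are all assumptions made for the example.

```python
from bisect import bisect_left

def simple_strategy_replicas(ring, key_token, replication_factor):
    """Return the nodes that would hold replicas for a key's token.

    ring: list of (token, node_name) pairs sorted by token (hypothetical layout).
    """
    tokens = [token for token, _ in ring]
    # First replica: the first node whose token is >= the key's token,
    # wrapping around to the start of the ring if there is none.
    start = bisect_left(tokens, key_token) % len(ring)
    count = min(replication_factor, len(ring))
    # Remaining replicas: the next nodes clockwise, ignoring rack and datacenter.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

ring = [(0, "node-a"), (100, "node-b"), (200, "node-c"), (300, "node-d")]
print(simple_strategy_replicas(ring, 150, 3))  # ['node-c', 'node-d', 'node-a']
```

A token past the last node (for example 350) wraps around, so its first replica lands on node-a.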


Q.6 Define Network Topology Strategy.

Network Topology Strategy is used when a cluster is deployed across multiple datacenters. It lets you decide how many replicas are placed in each datacenter, so that reads can be satisfied locally without incurring cross-datacenter latency while failure scenarios are still handled.


Q.7 What do you understand about a Row in Cassandra? Name its elements.

A row is a collection of sorted columns and is the smallest unit that stores related data in Cassandra. Any component of a row can store data or metadata. The elements of a row are:

Row Key
Column Keys
Column Values


Q.8 What is data replication?

Data replication is the operation in which data from one node is copied to other nodes in the cluster. It provides redundancy and fault tolerance in the database. The replication factor decides the number of copies, and the replication strategy decides which nodes the data is copied to.


Q.9 What is a commit log?

The commit log is a mechanism for recovering data if the database crashes. Every write operation that is carried out is saved in the commit log.


Q.10 Define tunable consistency in Cassandra.

Tunable consistency is one of the features that makes Cassandra a popular choice among developers, analysts, and big data architects. Consistency here means that data rows are up to date and synchronized across all their replicas. Cassandra’s tunable consistency lets users choose the consistency level best suited to their use case. It supports two kinds of consistency: eventual consistency and strong consistency. For strong consistency, the following condition must hold:

R + W > N

where
N = number of replicas
W = number of nodes that need to agree for a successful write
R = number of nodes that need to agree for a successful read
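The R + W > N condition is easy to check mechanically. The sketch below is a minimal illustration; the function names are made up for this example, and the quorum helper assumes the usual majority rule of floor(N / 2) + 1.

```python
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """True when the read and write replica sets must overlap, i.e. R + W > N."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must be between 1 and N")
    return r + w > n

def quorum(n: int) -> int:
    """Majority of N replicas (assumed rule: floor(N / 2) + 1)."""
    return n // 2 + 1

n = 3
print(is_strongly_consistent(n, w=quorum(n), r=quorum(n)))  # True: 2 + 2 > 3
print(is_strongly_consistent(n, w=1, r=1))                  # False: 1 + 1 <= 3
```

With N = 3, writing and reading at a majority gives strong consistency, while writing and reading a single replica gives only eventual consistency.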


Q.11 What is the process of Cassandra’s write function?

Cassandra performs a write in two steps: first it appends the write to a commit log on disk, and then it applies it to an in-memory structure known as a memtable. Once both steps succeed, the write is complete. When the memtable is later flushed, its contents are written to disk as SSTables (sorted string tables). This design gives Cassandra fast write performance.
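The write path above can be modelled with a toy class. This is a rough sketch under simplifying assumptions: the class name TinyWritePath, the memtable_limit parameter, and the use of plain Python lists and dicts in place of real files are all hypothetical.

```python
class TinyWritePath:
    """Toy model: commit log append, memtable update, flush to an SSTable."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # stands in for the append-only log on disk
        self.memtable = {}     # in-memory structure keyed by row key
        self.sstables = []     # each flush produces one sorted, immutable table
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # step 1: append to the commit log
        self.memtable[key] = value            # step 2: update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out sorted by key, then start a fresh one.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replaying

store = TinyWritePath(memtable_limit=2)
store.write("b", 1)
store.write("a", 2)    # the second key fills the memtable and triggers a flush
store.write("c", 3)
print(store.sstables)  # [(('a', 2), ('b', 1))]
print(store.memtable)  # {'c': 3}
```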


Q.12 What is memtable?

A memtable is an in-memory write-back cache that holds content in key and column format. Its data is sorted by key, and each column family has its own memtable, from which column data is retrieved by key. A memtable stores writes until it is full and then flushes them out.


Q.13 Define Bloom Filter.

A Bloom filter is associated with an SSTable. It is an off-heap data structure (held in native memory rather than on the Java heap) used to check whether the SSTable may contain the requested data before any disk I/O is performed.
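To make the idea concrete, here is a minimal Bloom filter sketch. It is not how Cassandra implements its filters: the class, the sizes, and the use of SHA-256 as the hash function are assumptions chosen for readability. The key property it shows is that a negative answer is definite, while a positive answer only means "possibly present".

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, key: str):
        # Derive several bit positions per key by salting the hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means the key was never added; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))

bloom = BloomFilter()
bloom.add("user:1")
print(bloom.might_contain("user:1"))  # True
```

A read would consult the filter first and skip the SSTable entirely whenever might_contain returns False.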


Q.14 What is CAP Theorem?

The CAP theorem plays a major role in planning how a distributed system scales as additional resources are added. It states that a distributed system such as Cassandra can provide only two of three characteristics: consistency, availability, and partition tolerance. Because partition tolerance is needed in practice, the two options available are AP and CP.


Q.15 Differentiate between a node, a cluster, and a data center in Cassandra.

A node is a single machine running Cassandra. A cluster is a collection of nodes that work together to hold the data. A data center is a group of nodes within a cluster, which is useful when serving customers in different geographical areas; the nodes of one cluster can be grouped into several data centers.


Q.16 What is compaction in Cassandra?

Compaction is a maintenance process in which Cassandra reorganizes SSTables to optimize the data structures on disk. It is needed because memtable flushes keep creating new SSTables. There are two types of compaction in Cassandra:

1. Minor compaction: This begins automatically when a new SSTable is created. Cassandra condenses similarly sized SSTables into one.
2. Major compaction: This is triggered manually using nodetool. It compacts all SSTables of a column family into one.
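The core of compaction, merging several SSTables into one while keeping only the newest version of each key, can be sketched as follows. This is a simplified model: representing an SSTable as a dict of key to (timestamp, value) and the function name compact are assumptions for the example, and real compaction strategies involve considerably more.

```python
def compact(sstables):
    """Merge SSTables into one, keeping the newest version of each key.

    Each SSTable is modelled as a dict: key -> (timestamp, value).
    """
    merged = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            # Keep the entry with the highest timestamp for each key.
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    # Return keys in sorted order, as a sorted string table would store them.
    return {key: merged[key] for key in sorted(merged)}

older = {"a": (1, "x"), "b": (1, "y")}
newer = {"c": (2, "w"), "a": (2, "z")}
print(compact([older, newer]))  # {'a': (2, 'z'), 'b': (1, 'y'), 'c': (2, 'w')}
```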


Q.17 Define Super Column in Cassandra.

A Cassandra super column is a special element that groups similar collections of data. Super columns are key-value pairs whose values are columns. A super column is a sorted array of columns, and the hierarchy is: keyspace > column family > super column > column (data structure in JSON). Super column entries contain no independent values of their own; they are used only to collect other columns.


Q.18 Explain what Cassandra is.

Cassandra is an open-source data storage system, originally developed at Facebook for inbox search, that is designed for storing and managing large amounts of data across commodity servers. It can serve both as a real-time data store for online applications and as a read-intensive database for business intelligence systems.


Q.19 What is Cassandra used for, and why use it?

Cassandra was designed to handle big data workloads across multiple nodes without any single point of failure. The main reasons for using Cassandra are:

It is fault tolerant, with tunable consistency
It scales from gigabytes to petabytes
It is a column-oriented database
No single point of failure
No need for a separate caching layer
Flexible schema design
It has easy data distribution, flexible data storage, and fast writes
It provides atomicity, isolation, and durability at the row level, with tunable consistency, rather than full ACID transactions
Multi-data center and cloud capable
Data compression


Q.20 Explain what is composite type in Cassandra?

Cassandra built-in composite types come in two forms:

Static composite type: Data types for each part of a composite column are predefined per column family. All the column names/keys within a column family must be of that composite type.
Dynamic composite type: This type allows mixing column names with different composite types in a column family or even in one row.


Q.21 How does Cassandra store data?

All data is stored as bytes.
When you specify validators, Cassandra ensures those bytes are encoded as required.
A collation then orders the columns based on the ordering specific to the encoding.
Composites are just byte arrays with a particular encoding: for each component, Cassandra stores a two-byte length, followed by the byte-encoded component, followed by a termination byte.
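The composite layout described in the last point (two-byte length, component bytes, terminator) can be sketched with Python's struct module. The function names are invented for this example, and big-endian byte order and a terminator value of 0 are assumptions rather than details stated above.

```python
import struct

def encode_composite(components, end_of_component=0):
    """Encode each component as: 2-byte length, the bytes, one terminator byte."""
    out = bytearray()
    for part in components:
        out += struct.pack(">H", len(part))  # assumed big-endian unsigned short
        out += part
        out.append(end_of_component)
    return bytes(out)

def decode_composite(data):
    """Reverse of encode_composite: walk the buffer component by component."""
    parts, offset = [], 0
    while offset < len(data):
        (length,) = struct.unpack_from(">H", data, offset)
        offset += 2
        parts.append(data[offset:offset + length])
        offset += length + 1  # skip the component and its terminator byte
    return parts

encoded = encode_composite([b"us", b"2024"])
print(encoded.hex())              # 000275730000043230323400
print(decode_composite(encoded))  # [b'us', b'2024']
```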


Q.22 What are the main components of the Cassandra data model?

The main components of the Cassandra data model are:
Cluster
Keyspace
Column
Column family


Q.23 Explain what a column family is in Cassandra.

A column family is a collection of rows in Cassandra.


Q.24 Explain what a cluster is in Cassandra.

A cluster is a container for keyspaces. A Cassandra database is distributed over several machines that function together. The cluster is the outermost container; it arranges the nodes in a ring and assigns data to them. Nodes hold replicas, which take over if a node fails while handling data.


Q.25 List out the other components of Cassandra?

The other components of Cassandra are:

Node
Data Center
Cluster
Commit log
Mem-table
SSTable
Bloom Filter


Q.26 Explain what a keyspace is in Cassandra.

In Cassandra, a keyspace is a namespace that determines how data is replicated across nodes. A cluster typically contains one keyspace per application.


Q.27 Give the syntax to create a keyspace in Cassandra.

The syntax for creating a keyspace in Cassandra is:
CREATE KEYSPACE <identifier> WITH <properties>
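As a rough illustration of what a filled-in statement looks like, the Python helper below assembles a CREATE KEYSPACE string from a replication map. The helper name create_keyspace_cql, the keyspace name demo, and the datacenter names dc1 and dc2 are hypothetical, and the helper does no escaping or validation, so it is a sketch rather than something to use with untrusted input.

```python
def create_keyspace_cql(name, replication, durable_writes=True):
    """Build a CREATE KEYSPACE statement from a replication map (no escaping)."""
    options = ", ".join(
        f"'{key}': '{value}'" if isinstance(value, str) else f"'{key}': {value}"
        for key, value in replication.items()
    )
    flag = "true" if durable_writes else "false"
    return (
        f"CREATE KEYSPACE {name} "
        f"WITH replication = {{{options}}} "
        f"AND durable_writes = {flag};"
    )

simple = create_keyspace_cql(
    "demo", {"class": "SimpleStrategy", "replication_factor": 3}
)
print(simple)
# CREATE KEYSPACE demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3} AND durable_writes = true;

multi_dc = create_keyspace_cql(
    "demo", {"class": "NetworkTopologyStrategy", "dc1": 3, "dc2": 2}
)
print(multi_dc)
```

The second call shows the per-datacenter replica counts used with Network Topology Strategy.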


Q.28 Mention the values that are stored in the Cassandra Column?

In Cassandra Column, basically there are three values:

Column Name
Value
Time Stamp


Q.29 When can you use ALTER KEYSPACE?

ALTER KEYSPACE can be used to change keyspace properties such as the number of replicas and durable_writes.


Q.30 Explain what Cassandra cqlsh is.

cqlsh is a command-line shell that lets users communicate with the Cassandra database using the Cassandra Query Language (CQL). Using cqlsh, one can:

Define a schema
Insert data
Execute a query


Q.31 Explain how Cassandra writes changed data to the commit log.

Cassandra appends changed data to the commit log.
The commit log acts as a crash-recovery log for data.
A write operation is never considered successful until the changed data has been appended to the commit log.
Data is not lost once the commit log has been flushed to a file.


Q.32 Explain how Cassandra deletes data.

SSTables are immutable, so a row cannot be removed from an SSTable directly. When a row needs to be deleted, Cassandra instead assigns the column value a special marker called a tombstone.
When the data is read, a value marked with a tombstone is treated as deleted.
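A small sketch can show why a tombstone is needed when existing tables cannot be modified. The model below is an assumption made for illustration: SSTables are plain dicts in a list ordered oldest to newest, and TOMBSTONE, delete, and read are hypothetical names.

```python
TOMBSTONE = object()  # hypothetical marker standing in for a tombstone

def delete(sstables, key):
    """Record a delete by adding a new table holding a tombstone.

    Existing tables are left untouched, mirroring their immutability.
    """
    return sstables + [{key: TOMBSTONE}]

def read(sstables, key):
    """Check the newest table first; a tombstone means the row reads as deleted."""
    for table in reversed(sstables):
        if key in table:
            value = table[key]
            return None if value is TOMBSTONE else value
    return None

tables = [{"user:1": "alice", "user:2": "bob"}]
after_delete = delete(tables, "user:1")
print(read(tables, "user:1"))        # alice
print(read(after_delete, "user:1"))  # None
print(read(after_delete, "user:2"))  # bob
```

The original table still contains the row; only the newer tombstone makes it read as deleted.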


Q.33 State the usage of the "void close()" method.

In Cassandra, the void close() method is used to close the current session instance.


Q.34 Which command is used to start the cqlsh prompt?

The cqlsh command starts the cqlsh prompt.


Q.35 What is the "cqlsh --version" command used for?

The "cqlsh --version" command reports the version of cqlsh in use.


Q.36 Does Cassandra work on Windows?

Yes. Cassandra is compatible with Windows and works well on it; both Linux-compatible and Windows-compatible versions have been available.


Q.37 What is Kundera in Cassandra?

Kundera is an object-relational mapping (ORM) implementation for Cassandra, written using Java annotations.


Q.38 What do you understand by Thrift in Cassandra?

Thrift is the name of the RPC client used to communicate with the Cassandra server.


Q.39 What is Hector in Cassandra?

Hector was one of the early Cassandra clients. It is an open-source project written in Java and released under the MIT license.


Q.40 State some of the features of Apache Cassandra.

Some of the features of Apache Cassandra are:
1. High Scalability
2. High Fault Tolerance
3. Flexible Data Storage
4. Easy Data Distribution
5. Tunable Consistency
6. Efficient Writes
7. Cassandra Query Language


Q.41 How would you define NoSQL Database?

A NoSQL database is a non-relational database, also known as a "Not only SQL" database. It provides a mechanism to store and retrieve different types of data, including images, sounds, and more.


Q.42 What are the primary features of any NoSQL database?

Some of the primary features of any NoSQL database are -
1. Schema Agnostic
2. AutoSharding and Elasticity
3. Highly Distributable
4. Easily Scalable
5. Integrated Caching


Q.43 Which query language is used in Cassandra Database?

Cassandra Query Language (CQL) is used with the Cassandra database. CQL is the interface through which a user accesses the database; it acts as the communication medium, and all operations are carried out through it.


Q.44 What is the primary objective of creating Cassandra?

The primary objective of creating Cassandra was to handle large amounts of data, while also ensuring fault tolerance and swift data transfer.


Q.45 What do you understand by Document Store DB?

In a document store, a data record is a JSON/XML representation of key-value pairs, and every record can have a different set of fields. Document DBs are similar to key-value stores; the difference is that each key is associated with a document.


Q.46 What is the purpose of CQLSH?

cqlsh is a command-line shell that lets users communicate with the Cassandra database using CQL. The purpose of using cqlsh is to:
1. Define a schema
2. Insert data
3. Execute a query


Q.47 What is the YAML file in Cassandra?

The cassandra.yaml file is the main configuration file for Cassandra. After changing properties in cassandra.yaml, the node must be restarted for the changes to take effect.


Q.48 Define Key-Value Store DB.

In this, all of the data inside the database consists of an indexed key and a value. A key may correspond to one or multiple values (hash table). Moreover, it provides great performance and can be very easily scaled as per business needs.


Q.49 Define Column Store DB.

In a column store DB, data is stored in cells that are grouped into columns rather than rows. Columns are logically grouped into column families. One row may hold one or multiple data records, which are indexed by a partition key.


Q.50 What do you understand about Graph DB?

Graph DB can be referred to as the type of NoSQL database in which a flexible graphical representation is used. The key motive is to store relationships between nodes.

