Apache Spark Cassandra Introduction

date 28th April, 2019 |
by Prwatech


Apache Spark Cassandra Introduction: welcome to the world of Apache Spark Cassandra tutorials. In these tutorials, you can explore an introduction to Apache Spark Cassandra, the features of Cassandra, the features of Spark Cassandra, and the architecture and advantages of the Cassandra database. Learn more advanced tutorials on Apache Spark Cassandra for beginners from India's leading Apache Spark training institute, which provides an advanced Apache Spark course for tech enthusiasts who want to explore the technology from scratch to an advanced level like a pro.

We at Prwatech, the pioneers of Apache Spark training, offer an advanced certification course and this Apache Spark Cassandra introduction to those who are keen to explore the technology in a world-class training environment.

Introduction to Apache Spark Cassandra

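Apache Cassandra is an open-source NoSQL database from the Apache Software Foundation that stores data in the form of key-value pairs. It groups related data into column families (called tables in current versions), in which each row is identified by its key.

It was originally created at Facebook and is now an Apache project released under the Apache License. Its data model is inspired by Google's Bigtable. It is highly robust because of its masterless architecture, in which no node acts as a master. It runs on a cluster, and users can set the number of nodes in the cluster according to their requirements.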

Features of Cassandra

Highly Scalable: capacity can be increased by adding more nodes to the cluster, or by upgrading the hardware configuration, according to user requirements.

Rigid architecture: Cassandra has no single point of failure, so applications keep running even when some nodes fail.

Fault tolerance: Cassandra is fault tolerant because data is replicated across nodes. For example, if a cluster has 4 nodes and each holds a copy of the same data, then when any one node goes down, the remaining 3 nodes continue to serve requests.

Flexible data storage: it supports structured, unstructured or semi-structured data.

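Transaction support: it does not support full ACID (Atomicity, Consistency, Isolation, Durability) transactions the way a relational database does; atomicity and isolation are provided only at the row level, although writes are durable.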
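The fault-tolerance example above depends on how many replicas must answer each request. Cassandra lets a request choose a consistency level, and the minimal Python sketch below computes how many replicas must respond and therefore how many may be down. It assumes a single data center and only the three levels ONE, QUORUM and ALL (the real list is longer); the helper names `required_replicas` and `tolerated_failures` are hypothetical.

```python
def required_replicas(consistency_level: str, replication_factor: int) -> int:
    """Replicas that must acknowledge a request (single data center, sketch only)."""
    level = consistency_level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A majority of the replicas: floor(RF / 2) + 1
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level in this sketch: {consistency_level}")


def tolerated_failures(consistency_level: str, replication_factor: int) -> int:
    """How many replicas can be down while requests at this level still succeed."""
    return replication_factor - required_replicas(consistency_level, replication_factor)


# Four copies of the data, as in the fault-tolerance example.
for level in ("ONE", "QUORUM", "ALL"):
    print(level, tolerated_failures(level, 4))
# ONE 3
# QUORUM 1
# ALL 0
```

With four copies, a QUORUM request needs 3 replicas and so survives one node going down, while a request at ONE survives three.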

Cassandra Components

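Gossip: Gossip is the peer-to-peer protocol through which nodes periodically exchange state information about themselves and about the other nodes they know of, so that every node learns the state of the rest of the cluster.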
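A toy simulation can make the gossip idea concrete. The Python sketch below is an illustration under simplifying assumptions, not Cassandra's implementation: each node keeps only a heartbeat version per known node, contacts one random peer per round (Cassandra gossips every second with up to three other nodes), and both sides keep the highest version they have seen. The names `Node`, `gossip_round` and `run_until_everyone_knows_everyone` are hypothetical.

```python
import random


class Node:
    """A cluster member tracking the newest heartbeat version it has seen per node."""

    def __init__(self, name):
        self.name = name
        self.heartbeat = 0
        self.view = {name: 0}  # node name -> highest heartbeat version known

    def beat(self):
        # Each round a node increments its own heartbeat version.
        self.heartbeat += 1
        self.view[self.name] = self.heartbeat

    def merge(self, other_view):
        # Keep the newest version known for every node.
        for node, version in other_view.items():
            if version > self.view.get(node, -1):
                self.view[node] = version


def gossip_round(nodes, rng):
    for node in nodes:
        node.beat()
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        # Push-pull exchange: afterwards both sides know the union of their views.
        node.merge(peer.view)
        peer.merge(node.view)


def run_until_everyone_knows_everyone(names, seed=42, max_rounds=50):
    """Return (rounds needed, nodes), or (None, nodes) if not converged."""
    rng = random.Random(seed)
    nodes = [Node(name) for name in names]
    for round_no in range(1, max_rounds + 1):
        gossip_round(nodes, rng)
        if all(set(node.view) == set(names) for node in nodes):
            return round_no, nodes
    return None, nodes
```

Calling `run_until_everyone_knows_everyone(["A", "B", "C", "D", "E"])` usually converges within a few rounds; the exact count depends on the random seed.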

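Failure Detection: each node uses the heartbeat information received through gossip to decide locally whether other nodes are up or down. Requests are then routed away from nodes that appear to be down, and the data they missed can be recovered once they come back.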
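Cassandra's failure detector follows the phi accrual approach: instead of a fixed timeout, it computes a suspicion level phi from the history of heartbeat intervals and marks a node down when phi exceeds `phi_convict_threshold` (8 by default in `cassandra.yaml`). The sketch below is a simplified version of that idea, assuming exponentially distributed heartbeat intervals; the class `PhiAccrualDetector` and its methods are hypothetical.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Simplified phi accrual failure detector for one remote node (sketch)."""

    def __init__(self, threshold=8.0, window=1000):
        self.threshold = threshold
        self.intervals = deque(maxlen=window)  # recent gaps between heartbeats
        self.last_heartbeat = None

    def heartbeat(self, now):
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        if self.last_heartbeat is None or not self.intervals:
            return 0.0  # not enough history to suspect anything yet
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        # Exponential assumption: P(heartbeat later than elapsed) = exp(-elapsed / mean),
        # so phi = -log10(that probability) = elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))

    def is_down(self, now):
        return self.phi(now) > self.threshold
```

With heartbeats exactly one second apart, phi one second after the last heartbeat is about 0.43, while after 30 seconds of silence it is about 13, above the default threshold of 8.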

Replication: copies of each row are stored on multiple nodes. The replication factor sets how many nodes hold a copy, and the replication strategy decides which nodes those are.
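One way to picture replica placement is a token ring: node names and keys are hashed onto the ring, the first replica goes to the node that owns the key's token range, and the remaining replicas go to the next nodes clockwise, which is how Cassandra's SimpleStrategy places copies. In this Python sketch, MD5 is assumed as the hash (similar in spirit to the legacy RandomPartitioner; the current default is Murmur3Partitioner), each node holds a single token (no virtual nodes), and `token` and `TokenRing` are hypothetical names.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Assumed partitioner for this sketch: the MD5 digest of the key as an integer.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class TokenRing:
    def __init__(self, nodes):
        # Each node owns one position on the ring, ordered by token.
        self.ring = sorted((token(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key, replication_factor):
        if replication_factor > len(self.ring):
            raise ValueError("replication factor exceeds the number of nodes")
        # First replica: first node whose token is >= the key's token, wrapping around.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        # Remaining replicas: the next nodes clockwise, as in SimpleStrategy.
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(replication_factor)]
```

For example, `TokenRing(["node1", "node2", "node3", "node4"]).replicas("user:42", 3)` returns three distinct nodes; which three depends on the hash values.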

Commit log: In Cassandra, the commit log is a crash-recovery mechanism. Every write operation is written to the commit log.

Mem-table: A mem-table is a memory-resident data structure. After the commit log, the data is written to the mem-table. Sometimes a single column family has multiple mem-tables.

SSTable (Sorted String Table): an immutable disk file to which the data is flushed from the mem-table when its contents reach a threshold value.
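The commit log, mem-table and SSTable steps can be sketched in a few lines of Python. This is a simplified model, not Cassandra's storage engine: the flush threshold counts keys rather than memory size, SSTables are plain sorted tuples kept in memory, and the class `WritePath` is hypothetical.

```python
class WritePath:
    """Toy model of the write path: commit log -> mem-table -> SSTable."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record, replayed after a crash
        self.memtable = {}     # in-memory: key -> (timestamp, value)
        self.sstables = []     # immutable sorted tables on "disk", oldest first
        self.flush_threshold = flush_threshold  # assumption: counted in keys, not bytes

    def write(self, key, value, timestamp):
        # 1. Record the write in the commit log first, for crash recovery.
        self.commit_log.append((key, value, timestamp))
        # 2. Apply it to the mem-table; the newest timestamp wins for a key.
        current = self.memtable.get(key)
        if current is None or timestamp >= current[0]:
            self.memtable[key] = (timestamp, value)
        # 3. Flush once the mem-table reaches its threshold.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the mem-table out as an immutable table sorted by key.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Everything logged so far is now in an SSTable, so this sketch discards it.
        self.commit_log.clear()
```

With `flush_threshold=2`, the first write stays in the commit log and mem-table, and the second write triggers a flush that produces one sorted SSTable.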

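Bloom filter: a quick, probabilistic data structure for testing whether an element is a member of a set; it can return false positives but never false negatives. It acts as a special kind of cache: Cassandra keeps a Bloom filter for each SSTable and checks it on every read, so that SSTables which cannot contain the requested key are skipped.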
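A Bloom filter can be sketched as a bit array plus k hash functions. The Python version below derives its k bit positions from SHA-256 with different seeds, which is an assumption for illustration (Cassandra uses its own hashing); the class name `BloomFilter` and the default sizes are hypothetical.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))
```

A key that was added is always reported as possibly present, while a key that was never added is usually, but not always, reported as absent; a larger bit array lowers the false-positive rate.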

Advantages of Cassandra database

Cassandra was designed to handle large amounts of data across multiple nodes in a cluster without a single point of failure. It uses a peer-to-peer distributed architecture across the nodes in the cluster.

In Cassandra, each node is independent of the others yet interconnected with them, and every node in the cluster plays the same role.

Every node in the cluster can accept read and write requests, regardless of where the data is actually stored.

If a node fails, requests can be served by the other nodes in the network.

Architecture of Cassandra Database

Working:

The Cassandra architecture involves two basic operations:

Write operation:

Every write on a node is first recorded in the commit log, and the data is then written to the mem-table. When the data in the mem-table reaches its threshold, it is flushed to an SSTable on disk. Cassandra automatically partitions and replicates the data throughout the cluster. Periodically, Cassandra consolidates SSTables through a process called compaction, which merges them and removes unnecessary data such as overwritten values and deleted rows.
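Compaction can be sketched as merging SSTables by key and keeping the value with the newest timestamp. In this simplified Python model, each SSTable is a sorted tuple of `(key, (timestamp, value))` pairs, deletions are marked with a hypothetical `TOMBSTONE` sentinel, and tombstones are dropped immediately; real Cassandra keeps them until `gc_grace_seconds` has passed so that deletes reach all replicas. The function name `compact` is also hypothetical.

```python
TOMBSTONE = object()  # hypothetical marker meaning "this key was deleted"


def compact(sstables):
    """Merge SSTables (oldest first) into one, keeping the newest version of each key."""
    merged = {}
    for table in sstables:
        for key, (timestamp, value) in table:
            # Last write wins: keep the entry with the highest timestamp.
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    # Discard deleted keys (real Cassandra waits for gc_grace_seconds first).
    live = [(key, entry) for key, entry in merged.items() if entry[1] is not TOMBSTONE]
    return tuple(sorted(live, key=lambda item: item[0]))
```

Compacting an older table holding `a` and `b` with a newer one that overwrites `a`, deletes `b` and adds `c` leaves a single table containing only the new `a` and `c`.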

Read operation:

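During a read operation, Cassandra gets the value from the mem-table and checks the Bloom filter of each SSTable, so that only the SSTables that may contain the requested data are read from disk. The versions found are then merged, and the most recent write wins.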
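The read path can be sketched the same way: collect the value from the mem-table, consult each SSTable's filter before searching it, and return the version with the newest timestamp. The `read` function and its argument layout are hypothetical; the filters are passed in as callables so that either an exact key set or a Bloom filter's membership test can be plugged in, and the function also reports how many SSTables it actually searched.

```python
def read(key, memtable, sstables, filters):
    """Return (value or None, number of SSTables actually searched).

    memtable: dict key -> (timestamp, value)
    sstables: list of dicts key -> (timestamp, value)
    filters:  one membership test per SSTable, e.g. a Bloom filter's might_contain
    """
    versions = []
    if key in memtable:
        versions.append(memtable[key])
    searched = 0
    for table, might_contain in zip(sstables, filters):
        if not might_contain(key):
            continue  # filter says "definitely absent": skip this disk read
        searched += 1
        if key in table:  # on a false positive the lookup simply finds nothing
            versions.append(table[key])
    if not versions:
        return None, searched
    # Reconcile the versions: the one with the newest timestamp wins.
    return max(versions, key=lambda v: v[0])[1], searched


memtable = {"a": (5, "new")}
sstables = [{"a": (1, "old"), "b": (2, "B")}, {"c": (3, "C")}]
filters = [lambda k, t=table: k in t for table in sstables]  # exact stand-ins for Bloom filters
print(read("a", memtable, sstables, filters))  # ('new', 1)
```

Here the newer mem-table value overrides the older SSTable copy, and the second SSTable is never searched because its filter rules the key out.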