Why the ‘C’ in ACID is different from the ‘C’ in CAP theorem

Nishant Tanwar 🕉️
5 min read · Jan 25, 2024

The ‘C’ in both ACID and the CAP theorem stands for Consistency, but it means quite different things in the two contexts.

Consistency in ACID

Transactions and ACID

To ensure fault tolerance and reliability, relational databases rely on transactions.

A transaction, in simple terms, is a way to group several reads and writes into a single logical unit. This unit is executed as one operation: either the transaction runs to completion (succeeds) and is committed to disk, or it fails and is aborted (rolled back). There are no half measures here.

The image above represents a transaction; the arrow shows the direction of the flow.

ACID stands for Atomicity, Consistency, Isolation and Durability.

I’ll go briefly over the other three and then we will deep dive into consistency, which is what this article is really about.

Atomicity is the guarantee that if a fault occurs partway through a transaction, i.e. one or more writes have already been performed and then an error occurs, all of those writes are rolled back and the transaction is aborted. This makes a transaction an all-or-nothing operation.
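To make the all-or-nothing behaviour concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The accounts table, the names alice and bob, and the fail_midway flag are my own assumptions for illustration; they only simulate a fault between two writes.

```python
import sqlite3

def make_db():
    # Hypothetical schema, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

def transfer(conn, sender, receiver, amount, fail_midway=False):
    """Run both writes in one transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender))
            if fail_midway:
                # Simulated fault after the first write has been performed.
                raise RuntimeError("simulated fault")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver))
        return True
    except RuntimeError:
        return False

conn = make_db()
transfer(conn, "alice", "bob", 30, fail_midway=True)
print(balances(conn))  # both balances are unchanged: the debit was rolled back
```

The debit had already been executed when the fault was raised, yet after the rollback neither balance has changed. That is atomicity doing its job.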

Isolation is the guarantee that concurrently executing transactions (transactions running at the same time) do not interfere with one another. Interference here means, for example, two transactions modifying the same records at the same time. The classic textbook definition of isolation is that the database handles concurrent transactions as if they were running serially, one after the other. This is the strictest isolation level and is called Serializability. I’ll cover isolation more deeply in a separate article.

Durability in a single-node database (the system is not distributed) is the guarantee that once a transaction has committed successfully, the data it wrote will not be lost in case of a software or hardware failure. This means that the data no longer lives only in memory and has been successfully written to disk. So even if the database crashes at this point or there is a hardware fault, the data can still be recovered after the crash.
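As a rough illustration of "written to disk, not just held in memory", the sketch below appends a record to a log file and forces it to stable storage before returning. This is not how any particular database implements durability; the file name and record format are assumptions of mine, and real engines add much more on top (write-ahead logs, checksums, recovery).

```python
import os
import tempfile

def durable_append(path, record):
    """Append one record and return only after it has been pushed to disk."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")
        f.flush()             # move data from Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to flush its cache to the device

def read_records(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

# Hypothetical log file in a temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
durable_append(log_path, "debit alice 30")
durable_append(log_path, "credit bob 30")
print(read_records(log_path))
```

Without the flush and fsync calls, a crash right after write() could lose the record even though the call had returned.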

Consistency is the only property in ACID that is the responsibility of the application rather than the database system. Consistency here means that the rules specific to your application always hold for the data, and that is not something a database can guarantee for you. For example, in a banking system, consistency means that in a money transfer whatever amount has been debited from the sender is also credited to the receiver, so that the system as a whole stays consistent. The database cannot know this rule; it is the responsibility of the programmer to make sure both the debit write and the credit write are present, so that the system never lands in an inconsistent state where money has been deducted from one user but not credited to the receiving user.
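Here is a small sketch of that banking example, again with Python's sqlite3 and a made-up accounts table. The function names and the forget_credit flag are hypothetical. The point is that the database happily commits a transaction that is missing the credit write; only the application's own check catches the broken rule.

```python
import sqlite3

def make_db():
    # Hypothetical schema, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn

def total_money(conn):
    return conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]

def buggy_transfer(conn, sender, amount):
    """Application bug: the credit write is missing. The database still commits."""
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, sender))

def checked_transfer(conn, sender, receiver, amount, forget_credit=False):
    """Same transfer, but the application verifies its own invariant."""
    before = total_money(conn)
    try:
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender))
            if not forget_credit:
                conn.execute(
                    "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                    (amount, receiver))
            # Application rule: a transfer must not change the total money.
            if total_money(conn) != before:
                raise ValueError("invariant violated, aborting")
        return True
    except ValueError:
        return False

conn = make_db()
buggy_transfer(conn, "alice", 30)
print(total_money(conn))  # 120: money vanished, yet the commit succeeded
```

Atomicity, isolation and durability all held in the buggy case. What broke was a rule that only the application knows about.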

Consistency in Brewer’s CAP Theorem

CAP stands for Consistency, Availability and Partition tolerance.

Consistency

It means that all database clients will read the same value for the same query, even given concurrent updates. In other words, every read returns the most recent write.

In more detail, consistency here applies in the context of a distributed system. To guarantee availability, the general recommendation in a distributed system is to keep multiple replicas of the database, so whenever a user writes data, it has to be replicated to all of them. The consistency that CAP talks about is called Linearizability, also known as atomic consistency. Now, due to network issues or otherwise, there could be delays in replicating the write. Let’s say that during the replication process a client asks for the written data from a stale machine which does not yet have the latest write. In that case the consistency guarantee is said to be violated. For systems that allow this, we use the term Eventual Consistency. Eventual consistency states that given enough time all the replica machines will have the latest data, but before that a read may return stale data.

The basic idea behind Linearizability is to make the system (the distributed, multi-replica database) appear as if there were only one copy of the data and all operations on it were atomic. Maintaining the illusion of a single copy of the data means guaranteeing that the value read is the most recent one and does not come from a stale replica. This is very difficult to achieve in reality, and therefore most distributed systems tend to go with Eventual Consistency.
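The stale-read scenario can be sketched with a toy in-memory model. Everything here (the Replica and ReplicatedStore classes, the pending queue that stands in for network delay) is a simplification I made up for illustration, not how a real database replicates data.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}

class ReplicatedStore:
    """Toy store: the first replica takes writes, the others catch up later."""

    def __init__(self, replica_names):
        self.replicas = [Replica(n) for n in replica_names]
        self.pending = []  # (replica, key, value) updates not yet delivered

    def write(self, key, value):
        leader, *followers = self.replicas
        leader.data[key] = value
        for follower in followers:
            # Replication is delayed: queue the update instead of applying it.
            self.pending.append((follower, key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)

    def replicate(self):
        """Deliver all queued updates, as if the network had caught up."""
        for replica, key, value in self.pending:
            replica.data[key] = value
        self.pending.clear()

store = ReplicatedStore(["r0", "r1", "r2"])
store.write("balance", 70)
print(store.read("balance", 0))  # 70   (the replica that took the write)
print(store.read("balance", 1))  # None (stale replica, not linearizable)
store.replicate()
print(store.read("balance", 1))  # 70   (eventually consistent)
```

A linearizable system would never let the second read happen the way it does here; it would have to wait for replication or route the read to an up-to-date replica.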

Availability

It means that all database clients will be able to read and write data.

Partition Tolerance

It means that if a network fault cuts off communication between database machines hosted in different datacenters, the system continues to process reads and writes.

The theorem states that at any given time the system can only satisfy two out of the three. This is a little misleading, because in a distributed system network partitions will happen sooner or later and are not something we have a choice over. Faults will happen whether we like it or not. The choice is really between an AP system and a CP system. When a partition occurs, we can have either a Linearizable (strongly consistent) and Partition-tolerant system (CP system) or an Available and Partition-tolerant system (AP system). As explained above, in a multi-replica distributed system the replication across all nodes can take time. If we want strong consistency, we have to compromise on availability, because replicas that cannot confirm they have the latest data must stop serving requests, which leaves fewer machines to process read and write requests.
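The CP versus AP choice can be sketched with a toy two-replica model. The Node class, the mode strings and the partitioned flag are assumptions of mine for illustration, and I assume the leader stays reachable on its side of the partition. During the partition, a CP replica refuses to answer, while an AP replica answers with whatever it has.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than risk a stale reply."""

class Node:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (hypothetical setting)
        self.value = None
        self.partitioned = False  # True when cut off from the other replica

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm the latest value during a partition")
        return self.value

def read_during_partition(mode):
    leader, follower = Node(mode), Node(mode)
    leader.value = follower.value = "v1"  # both replicas start in sync
    follower.partitioned = True           # network fault isolates the follower
    leader.value = "v2"                   # new write never reaches the follower
    try:
        return follower.read()
    except Unavailable:
        return "unavailable"

print(read_during_partition("AP"))  # v1 (available, but stale)
print(read_during_partition("CP"))  # unavailable (consistent, but not available)
```

Neither answer is wrong; they are two different trade-offs for the same fault.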

This is the end of the article.

References:

Designing Data-Intensive Applications by Martin Kleppmann

Cassandra: The Definitive Guide by Jeff Carpenter
