Demystifying the CAP theorem – A Beginner's Guide

Demystifying the CAP theorem – A Beginner's Guide

Navigating Consistency, Availability, and Partition Tolerance in Distributed Systems

Introduction

Distributed systems are the backbone of the infrastructure of modern applications. These intricate networks of computers scattered across the globe power most of the services we use daily.

You may ask, "So what exactly is the CAP theorem and what does it have to do with distributed systems?". This article will answer your questions. You'll learn about distributed systems, the CAP theorem, and its core principles. Also, you'll understand the real-world applications and limitations of the CAP theorem.

Prerequisite

This article is suitable for you if you have basic knowledge of computers. Even if you don't, you can still follow along as this article is void of intricate technical details

What is a distributed system?

A distributed system is a collection of separate computers connected over a network. These computers work together to function as a unified entity. They interact and share data so they can maintain a shared state.

Users communicate with different computers in the system and assume it to be a single unit. This is because distributed systems store data across various nodes using various computers. This happens concurrently in the system.

What is the CAP theorem?

The three letters "CAP" stand for Consistency, Availability, and Partition tolerance. They are the core principles of the CAP theorem. These three principles represent the properties of an ideal distributed system.

The CAP theorem is fundamental in computer science. It shines light on the delicate balance between consistency, availability, and partition tolerance in distributed systems.

It describes the behavior of distributed systems during a partition in the network. The CAP theorem helps engineers and architects make informed decisions and trade-offs when building distributed systems.

A brief history

Eric Brewer, also known as the father of the CAP theorem published the CAP theorem in 1999. He presented the CAP theorem as a conjecture at an academic conference in 2000. In 2002, two MIT professors Seth Gilbert and Nancy Lynch published a formal proof of the CAP theorem.

The CAP theorem was often explained with the oversimplified "2 out 3" notion. This notion suggests that you can only have any two from consistency, availability, and partition tolerance in a distributed system. This notion proved to be not only incorrect but misleading. Eric Brewer had to clarify that actually, the trade-off is between availability and consistency. This is because distributed systems are required to be partition tolerant since partitions are almost inevitable.

The main concepts of the CAP theorem

The main concepts of the CAP theorem are consistency, availability, and partition tolerance. You may ask "...but what do they even mean?". Let's see

  1. Consistency: consistency in the CAP theorem means the components stay in sync. It ensures that when there is an update in the system, every client using the system sees the same data. Clients get to see the same data even if they are using different parts of the system. A consistent system will synchronize updates to every node before any external operation. This way every client using the system gets the most up-to-date information.

  2. Availability: means the system is up and running so you can use it anytime. This type of system always responds to requests from clients even if some nodes are faulty. It might not respond with the most up-to-date information but it's always operational. Unlike a consistent system, it prioritizes external operations from clients to data synchronization.

  3. Partition tolerance: means the system thrives when there is partition in the network. It focuses on how a distributed system continues to function in case of any setback in the network. Setbacks include dropped partitions and slow or unavailable network connections among the nodes. Without partition tolerance, a network breach in the system could bring it down.

A distributed system should be able to provide both consistency and availability. However, partitions are inevitable and you have to choose between consistency and availability. This is the main point of the CAP theorem.

CAP theorem in real life

There are various real-life cases where distributed systems make trade-offs based on the CAP theorem. Let’s see some examples below

Large e-commerce brands have shopping cart systems that prioritize availability over consistency. Let's take Amazon for example. Sometimes, you can add an item to your cart on Amazon's website but won't see it on your mobile app. This happens because your data is yet to be synchronized to all servers in Amazon's system. This temporary inconsistency takes some time and corrects itself. Users are usually comfortable with this inconsistency so long as they can use the system.

Another example will be Financial institutions. These institutions build their foundations on the trust of their customers. So, they need systems that extend transparency and consistency.

A stock trading platform, for example, needs strong consistency in its systems. Users need to see their actual balance immediately after they buy or sell a stock. Imagine, you buy a stock and your it reflects in your balance some minutes later — ridiculous right? Yeah, that's why they prioritize consistency over availability

In reality, most distributed systems aim for a balance between availability and consistency. It's not always an "all or nothing" case. Engineers or architects can configure systems to lean more towards availability or consistency. They do so depending on the specific needs of the application. This way they can achieve a balance that meets the requirements of the application.

Limitations of the CAP theorem

Although the CAP theorem helps one to understand distributed systems, it has limitations. There are several valid criticisms and constraints of the CAP theorem. Let's look at some of them:

  1. Oversimplification: the CAP theorem oversimplifies the complexities of distributed systems. It assumes that major decisions revolve around only consistency, availability, and partition tolerance. It sacrifices several important technical details and trade-offs of distributed systems for simplicity. The extreme simplicity of the CAP theorem often leads to incorrect conclusions.

  2. Exaggerated cases: the CAP theorem focuses on extreme cases. Extreme cases like complete network partitions are quite rare in modern distributed systems. In reality, many partitions are partial partitions and varying degrees of network instability.

  3. Latency: can have significant effects on distributed systems. Yet, the CAP theorem ignores latency completely. It doesn't address how varying levels of availability and consistency can cause latency.

  4. Load: heavy loads can have significant effects on distributed systems. The CAP theorem completely ignores such effects. The reality is that even sophisticated systems suffer when they process heavy loads.

  5. Byzantine faults: the CAP theorem doesn’t address how to mitigate Byzantine faults. It fails to address cases where malicious nodes in a distributed system may attack the system. This is an important consideration in decentralized distributed systems like blockchains.

  6. Theoretical model: several modern techniques mitigate trade-offs described by the CAP theorem. Techniques such as replication, sharding, and load balancing mitigate the trade-offs effectively. These techniques introduce nuanced behaviors into distributed systems. These new behaviors go beyond the CAP theorem and relegate it to a mere theoretical model.

  7. Lack of formal proof: there is no formal mathematical proof for the CAP theorem. This often leads to several nuanced interpretations of the theorem. This is often a source of confusion among engineers, architects, and scientists.

  8. Evolution of technology: technology has evolved since the proposition of the CAP theorem. Currently, we use several tools in building very sophisticated and resilient distributed systems. Tools like raft consensus algorithms and advanced caching techniques challenge the CAP theorem. These tools have become so much better that they render the CAP theorem almost obsolete.

Conclusion

In summary, you have learned about distributed systems and the CAP theorem. You now understand the type of trade-offs made when designing distributed systems. The applications of the CAP theorem in real-life systems aren't alien to you anymore. Also, you have become exposed to some of the limitations and constraints of the CAP theorem. This is only the tip of the iceberg about distributed systems, there is a lot more out there in the wild.