Consensus is one of the most fundamental problems in distributed systems. These papers cover both theoretical foundations and practical implementations used in production systems.
Overview
Distributed consensus allows multiple nodes to agree on a single value or sequence of values, even in the presence of failures. This section covers the most influential papers on consensus algorithms and fault tolerance.Byzantine Fault Tolerance
Byzantine failures represent the most challenging type of failure in distributed systems, where nodes can behave arbitrarily or even maliciously.The Byzantine Generals Problem
Lamport’s classic formulation of the Byzantine agreement problem
Practical Byzantine Fault Tolerance
A practical algorithm for Byzantine fault tolerance in asynchronous systems
The Byzantine Generals Problem
The Byzantine Generals Problem This paper presents the Byzantine Generals Problem, a metaphor for reaching consensus in distributed systems where components may fail in arbitrary ways. The problem illustrates the challenges of achieving agreement when some participants may be unreliable or malicious.Practical Byzantine Fault Tolerance (PBFT)
Practical Byzantine Fault Tolerance PBFT provides a practical algorithm for state machine replication that tolerates Byzantine faults. It works in asynchronous environments and has been influential in both academic research and practical systems, particularly in blockchain technologies.Impossibility Results
Impossibility of Distributed Consensus with One Faulty Process Also known as the FLP Impossibility Result, this paper proves that no deterministic consensus algorithm can guarantee termination in an asynchronous system with even one faulty process. Understanding this impossibility is crucial for appreciating the compromises made by practical consensus algorithms.Paxos Algorithm
Paxos is one of the most well-known consensus algorithms, though it has a reputation for being difficult to understand.The Part-Time Parliament
Lamport’s original Paxos paper (challenging read, may require multiple passes)
Paxos Made Simple
A more concise and readable version by Lamport himself
The Part-Time Parliament
The Part-Time Parliament The paper presents Paxos through an allegory about a fictional legislative process on the Greek island of Paxos. While creative, this presentation makes the algorithm challenging to extract and understand.Paxos Made Simple
Paxos Made Simple Lamport’s own attempt to make Paxos more accessible. This paper is shorter and more straightforward than the original, presenting the algorithm more directly without the allegory.Key Paxos Concepts
Key Paxos Concepts
Paxos solves consensus through three roles:
- Proposers: Propose values
- Acceptors: Vote on proposed values
- Learners: Learn the chosen value
- Prepare phase: Proposer selects a proposal number and sends prepare requests
- Accept phase: If enough acceptors respond, proposer sends accept requests
- Learn phase: Once a value is accepted by a majority, learners are informed
Paxos in Practice
Chubby Lock Service
The Chubby Lock Service for Loosely Coupled Distributed Systems Google’s lock service for loosely coupled distributed systems. Chubby is essentially “Paxos as a Service” - it provides a reliable foundation for building other distributed systems. It has been the primary inspiration for other service discovery and coordination tools like ZooKeeper, etcd, and Consul.Why Chubby Matters
Why Chubby Matters
Chubby demonstrates how to build practical services on top of Paxos:
- Provides distributed locking with advisory locks
- Serves as a name service for other systems
- Offers a simple file-system-like interface
- Shows how to scale Paxos to production workloads
- Influenced the design of ZooKeeper, etcd, and Consul
Paxos Made Live
Paxos Made Live - An Engineering Perspective This paper documents Google’s experience implementing systems on top of Paxos. It demonstrates the gap between theoretical algorithms and production systems, covering practical issues encountered when implementing Paxos in real-world scenarios.This is essential reading for anyone planning to implement consensus algorithms. It covers challenges like:
- Disk corruption and recovery
- Master leases and group membership
- Snapshots and log compaction
- Testing and debugging distributed consensus
Raft Consensus Algorithm
Raft Consensus Algorithm | Interactive Visualization Raft is an alternative to Paxos designed explicitly for understandability. It achieves consensus through a more straightforward approach that’s easier to reason about and implement.The Secret Lives of Data: Raft
An excellent interactive visualization showing how Raft works step-by-step
Why Raft Was Created
Why Raft Was Created
Raft was designed as a more understandable alternative to Paxos:
- Understandability: Explicitly designed to be easier to understand than Paxos
- Leader-based: Uses strong leadership to simplify the protocol
- Decomposed: Separates leader election, log replication, and safety
- Practical: Designed with implementation in mind from the start
Key Raft Features
Raft decomposes consensus into three relatively independent subproblems:- Leader Election: How to choose a new leader when the current one fails
- Log Replication: How the leader replicates log entries to followers
- Safety: Ensuring consistency even during failures
CRDTs: Conflict-Free Replicated Data Types
Conflict-free Replicated Data Types | Talk by Martin Kleppmann CRDTs present an approach for achieving Strong Eventual Consistency without requiring coordination. They enable replicas to be updated independently and concurrently, with all replicas eventually converging to the same state.CRDT Applications
CRDT Applications
CRDTs have been applied in many production systems:
- Riak: Distributed database using CRDTs for data types
- Redis: In-memory data store with CRDT support
- Akka: Actor framework with distributed data using CRDTs
- Collaborative editing: Google Docs, Figma, and other real-time collaboration tools
For an excellent introduction to CRDTs, watch Martin Kleppmann’s talk which explains the concepts clearly with practical examples.
CRDT Advantages
- No coordination needed: Updates can happen independently on any replica
- Eventual consistency: All replicas eventually converge to the same state
- Partition tolerance: Works naturally with network partitions
- Low latency: Local updates without waiting for coordination
Alternative Approaches
Speculative Algorithms
Speculative Algorithms for Global State Synchronization Azos.Sky.Server.Locking uses probability-based QoS (Quality of Service)/Trust measures to ensure probability-based consensus. This approach avoids distributed state machine and phase synchronization, making it very simple to understand and implement.How Speculative Consensus Works
How Speculative Consensus Works
Unlike traditional consensus algorithms, speculative approaches:
- Use probability and trust metrics instead of strict coordination
- Avoid complex phase-based protocols like Paxos or Raft
- Trade strict guarantees for simplicity and performance
- Work well when approximate consensus is acceptable
Further Reading
Turing Lecture: The Computer Science of Concurrency
Leslie Lamport’s Turing Lecture on the early years of concurrency research
There is No Now
Understanding the problems with simultaneity in distributed systems