Consensus Algorithms

Consensus is one of the most fundamental problems in distributed systems. These papers cover both theoretical foundations and practical implementations used in production systems.

Overview

Distributed consensus allows multiple nodes to agree on a single value or sequence of values, even in the presence of failures. This section covers the most influential papers on consensus algorithms and fault tolerance.

Byzantine Fault Tolerance

Byzantine failures represent the most challenging type of failure in distributed systems, where nodes can behave arbitrarily or even maliciously.

The Byzantine Generals Problem

Lamport’s classic formulation of the Byzantine agreement problem

Practical Byzantine Fault Tolerance

A practical algorithm for Byzantine fault tolerance in asynchronous systems

The Byzantine Generals Problem

The Byzantine Generals Problem This paper presents the Byzantine Generals Problem, a metaphor for reaching consensus in distributed systems where components may fail in arbitrary ways. The problem illustrates the challenges of achieving agreement when some participants may be unreliable or malicious.

Practical Byzantine Fault Tolerance (PBFT)

Practical Byzantine Fault Tolerance PBFT provides a practical algorithm for state machine replication that tolerates Byzantine faults. It works in asynchronous environments and has been influential in both academic research and practical systems, particularly in blockchain technologies.

Impossibility Results

Impossibility of Distributed Consensus with One Faulty Process Also known as the FLP Impossibility Result, this paper proves that no deterministic consensus algorithm can guarantee termination in an asynchronous system with even one faulty process. Understanding this impossibility is crucial for appreciating the compromises made by practical consensus algorithms.

Paxos Algorithm

Paxos is one of the most well-known consensus algorithms, though it has a reputation for being difficult to understand.

The Part-Time Parliament

Lamport’s original Paxos paper (challenging read, may require multiple passes)

Paxos Made Simple

A more concise and readable version by Lamport himself

The Part-Time Parliament

The Part-Time Parliament

This is Lamport’s original Paxos paper. It’s famously difficult to understand and may require multiple readings. Consider starting with “Paxos Made Simple” instead.

The paper presents Paxos through an allegory about a fictional legislative process on the Greek island of Paxos. While creative, this presentation makes the algorithm challenging to extract and understand.

Paxos Made Simple

Paxos Made Simple Lamport’s own attempt to make Paxos more accessible. This paper is shorter and more straightforward than the original, presenting the algorithm more directly without the allegory.

Key Paxos Concepts

Paxos solves consensus through three roles:

Proposers: Propose values
Acceptors: Vote on proposed values
Learners: Learn the chosen value

The algorithm proceeds in phases:

Prepare phase: Proposer selects a proposal number and sends prepare requests
Accept phase: If enough acceptors respond, proposer sends accept requests
Learn phase: Once a value is accepted by a majority, learners are informed

Paxos in Practice

Chubby Lock Service

The Chubby Lock Service for Loosely Coupled Distributed Systems Google’s lock service for loosely coupled distributed systems. Chubby is essentially “Paxos as a Service” - it provides a reliable foundation for building other distributed systems. It has been the primary inspiration for other service discovery and coordination tools like ZooKeeper, etcd, and Consul.

Why Chubby Matters

Chubby demonstrates how to build practical services on top of Paxos:

Provides distributed locking with advisory locks
Serves as a name service for other systems
Offers a simple file-system-like interface
Shows how to scale Paxos to production workloads
Influenced the design of ZooKeeper, etcd, and Consul

Paxos Made Live

Paxos Made Live - An Engineering Perspective This paper documents Google’s experience implementing systems on top of Paxos. It demonstrates the gap between theoretical algorithms and production systems, covering practical issues encountered when implementing Paxos in real-world scenarios.

This is essential reading for anyone planning to implement consensus algorithms. It covers challenges like:

Disk corruption and recovery
Master leases and group membership
Snapshots and log compaction
Testing and debugging distributed consensus

Raft Consensus Algorithm

Raft Consensus Algorithm | Interactive Visualization Raft is an alternative to Paxos designed explicitly for understandability. It achieves consensus through a more straightforward approach that’s easier to reason about and implement.

The Secret Lives of Data: Raft

An excellent interactive visualization showing how Raft works step-by-step

Why Raft Was Created

Raft was designed as a more understandable alternative to Paxos:

Understandability: Explicitly designed to be easier to understand than Paxos
Leader-based: Uses strong leadership to simplify the protocol
Decomposed: Separates leader election, log replication, and safety
Practical: Designed with implementation in mind from the start

Raft is used in many modern systems including etcd, Consul, and CockroachDB.

Key Raft Features

Raft decomposes consensus into three relatively independent subproblems:

Leader Election: How to choose a new leader when the current one fails
Log Replication: How the leader replicates log entries to followers
Safety: Ensuring consistency even during failures

CRDTs: Conflict-Free Replicated Data Types

Conflict-free Replicated Data Types | Talk by Martin Kleppmann CRDTs present an approach for achieving Strong Eventual Consistency without requiring coordination. They enable replicas to be updated independently and concurrently, with all replicas eventually converging to the same state.

CRDT Applications

CRDTs have been applied in many production systems:

Riak: Distributed database using CRDTs for data types
Redis: In-memory data store with CRDT support
Akka: Actor framework with distributed data using CRDTs
Collaborative editing: Google Docs, Figma, and other real-time collaboration tools

For an excellent introduction to CRDTs, watch Martin Kleppmann’s talk which explains the concepts clearly with practical examples.

CRDT Advantages

No coordination needed: Updates can happen independently on any replica
Eventual consistency: All replicas eventually converge to the same state
Partition tolerance: Works naturally with network partitions
Low latency: Local updates without waiting for coordination

Alternative Approaches

Speculative Algorithms

Speculative Algorithms for Global State Synchronization Azos.Sky.Server.Locking uses probability-based QoS (Quality of Service)/Trust measures to ensure probability-based consensus. This approach avoids distributed state machine and phase synchronization, making it very simple to understand and implement.

How Speculative Consensus Works

Unlike traditional consensus algorithms, speculative approaches:

Use probability and trust metrics instead of strict coordination
Avoid complex phase-based protocols like Paxos or Raft
Trade strict guarantees for simplicity and performance
Work well when approximate consensus is acceptable

Turing Lecture: The Computer Science of Concurrency

Leslie Lamport’s Turing Lecture on the early years of concurrency research

There is No Now

Understanding the problems with simultaneity in distributed systems

Overview

Learning Resources

Core Concepts

System Types

Operations

Community

Overview

Byzantine Fault Tolerance

The Byzantine Generals Problem

Practical Byzantine Fault Tolerance

The Byzantine Generals Problem

Practical Byzantine Fault Tolerance (PBFT)

Impossibility Results

Paxos Algorithm

The Part-Time Parliament

Paxos Made Simple

The Part-Time Parliament

Paxos Made Simple

Paxos in Practice

Chubby Lock Service

Paxos Made Live

Raft Consensus Algorithm

The Secret Lives of Data: Raft

Key Raft Features

CRDTs: Conflict-Free Replicated Data Types

CRDT Advantages

Alternative Approaches

Speculative Algorithms

Further Reading

Turing Lecture: The Computer Science of Concurrency

There is No Now

Build docs developers (and LLMs) love

Overview

Learning Resources

Core Concepts

System Types

Operations

Community

​Overview

​Byzantine Fault Tolerance

The Byzantine Generals Problem

Practical Byzantine Fault Tolerance

​The Byzantine Generals Problem

​Practical Byzantine Fault Tolerance (PBFT)

​Impossibility Results

​Paxos Algorithm

The Part-Time Parliament

Paxos Made Simple

​The Part-Time Parliament

​Paxos Made Simple

​Paxos in Practice

​Chubby Lock Service

​Paxos Made Live

​Raft Consensus Algorithm

The Secret Lives of Data: Raft

​Key Raft Features

​CRDTs: Conflict-Free Replicated Data Types

​CRDT Advantages

​Alternative Approaches

​Speculative Algorithms

​Further Reading

Turing Lecture: The Computer Science of Concurrency

There is No Now

Build docs developers (and LLMs) love

Overview

Byzantine Fault Tolerance

The Byzantine Generals Problem

Practical Byzantine Fault Tolerance (PBFT)

Impossibility Results

Paxos Algorithm

The Part-Time Parliament

Paxos Made Simple

Paxos in Practice

Chubby Lock Service

Paxos Made Live

Raft Consensus Algorithm

Key Raft Features

CRDTs: Conflict-Free Replicated Data Types

CRDT Advantages

Alternative Approaches

Speculative Algorithms

Further Reading