Consensus Fundamentals
- consensus
- Getting a group of distributed nodes to agree on a single value despite failures and message delays.
- agreement property
- All non-faulty processes decide on the same value.
- validity property
- The decided value must have been proposed by some process.
- termination property
- All non-faulty processes eventually decide.
- safety property
- Nothing bad happens – no two processes decide different values.
- liveness property
- Something good eventually happens – processes eventually decide.
- FLP Impossibility Result
- Theorem by Fischer, Lynch, and Paterson proving that no deterministic consensus algorithm can guarantee termination while preserving safety in a purely asynchronous system in which even one process may crash.
- asynchronous system
- A system model with no bound on message delivery time; a crashed process is indistinguishable from a slow one.
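The three properties above can be read as predicates over one finished run. The following is a minimal sketch, not taken from any particular system: `check_consensus` and its arguments are hypothetical names, and it assumes the run is summarized as plain dictionaries.

```python
def check_consensus(proposed, decisions, non_faulty):
    """Return which of the three properties hold for one finished run.

    proposed:   dict process_id -> value that process proposed
    decisions:  dict process_id -> value it decided (absent if undecided)
    non_faulty: set of process_ids that never crashed
    """
    decided_by_correct = {decisions[p] for p in non_faulty if p in decisions}
    return {
        # Agreement: non-faulty processes decided at most one distinct value.
        "agreement": len(decided_by_correct) <= 1,
        # Validity: every decided value was proposed by some process.
        "validity": all(v in proposed.values() for v in decisions.values()),
        # Termination: every non-faulty process has decided.
        "termination": all(p in decisions for p in non_faulty),
    }


proposed = {"a": 1, "b": 2, "c": 2}
# Process "c" crashed before deciding; "a" and "b" both decided 2.
result = check_consensus(proposed, {"a": 2, "b": 2}, non_faulty={"a", "b"})
```

Agreement and validity are safety properties (a violation is visible in a finite run), while termination is a liveness property, which is why it can only be checked here for a run that has already ended.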
Quorums and Failure Modes
- quorum
- A subset of servers large enough that any two such subsets overlap, typically a majority of the cluster; required for both elections and commits.
- quorum intersection
- The property that any two majorities share at least one member, preventing two different values from being decided simultaneously.
- split-brain
- A partition causes multiple servers to simultaneously believe they are the sole leader, potentially accepting conflicting writes.
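The arithmetic behind majority quorums can be sketched in a few lines. This is an illustrative brute-force check, assuming a small fixed cluster; `majority` and `all_majorities_intersect` are hypothetical helper names.

```python
from itertools import combinations


def majority(n):
    """Smallest number of servers that forms a majority of n."""
    return n // 2 + 1


def all_majorities_intersect(servers):
    """Brute-force check that every pair of minimal majorities overlaps."""
    quorums = [set(q) for q in combinations(servers, majority(len(servers)))]
    return all(a & b for a, b in combinations(quorums, 2))


servers = ["s1", "s2", "s3", "s4", "s5"]
quorum_size = majority(len(servers))       # 3 of 5
tolerated = len(servers) - quorum_size     # crashes that still leave a majority
intersects = all_majorities_intersect(servers)

# A partition into 2 and 3 servers: only one side can reach a quorum,
# which is what rules out two leaders committing conflicting writes.
side_a, side_b = servers[:2], servers[2:]
can_commit = [len(side) >= quorum_size for side in (side_a, side_b)]
```

The overlap follows from counting: two sets of size `n // 2 + 1` together hold more than `n` members, so they must share at least one.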
State Machine Replication
- state machine
- A deterministic system that always produces the same output given the same starting state and input sequence.
- state machine replication
- Running identical state machines on multiple servers, applying the same commands in the same order to achieve fault tolerance.
- replicated log
- A totally ordered log of commands; consensus ensures all servers agree on each log slot.
- commit
- A log entry is committed once it is stored on a majority of servers, at which point it is safe to apply to the state machine.
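A small sketch of how these terms fit together, assuming a key-value store as the state machine and a Python list as the replicated log. `KVStateMachine` and its methods are hypothetical names chosen for illustration.

```python
class KVStateMachine:
    """A deterministic key-value state machine: same log in, same state out."""

    def __init__(self):
        self.data = {}
        self.last_applied = 0  # index of the last log entry applied (1-based)

    def apply(self, command):
        op, key, value = command
        if op == "set":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op}")

    def apply_committed(self, log, commit_index):
        """Apply, in order, every entry up to commit_index not yet applied."""
        while self.last_applied < commit_index:
            self.apply(log[self.last_applied])  # log[0] holds entry 1
            self.last_applied += 1


log = [("set", "x", 1), ("set", "y", 2), ("delete", "x", None), ("set", "x", 3)]

replica_a, replica_b = KVStateMachine(), KVStateMachine()
replica_a.apply_committed(log, commit_index=3)
replica_b.apply_committed(log, commit_index=2)  # lagging replica
replica_b.apply_committed(log, commit_index=3)  # catches up later
```

Both replicas end in the same state even though they applied entries at different times, and entry 4 stays unapplied because it is not yet committed.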
Paxos
- Paxos
- A consensus algorithm using a two-phase prepare/accept protocol with majority quorums.
- proposer
- Initiates a consensus proposal and drives the two-phase protocol.
- acceptor
- Votes on proposals; a value is decided once a majority accept it.
- learner
- Learns the decided value by collecting accepted messages from acceptors.
- proposal number
- A unique, monotonically increasing integer identifying a Paxos proposal; acceptors ignore proposals below their promised number.
- Multi-Paxos
- Extension of Paxos that decides a sequence of values, one per log slot. A stable leader runs Phase 1 once, then runs only Phase 2 for each subsequent slot for as long as it remains leader.
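The roles above can be sketched as a single-decree, in-process simulation. This is a simplified illustration under strong assumptions: messages are direct method calls, nothing crashes or is lost, and learners are omitted. `Acceptor` and `propose` are hypothetical names.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0          # highest proposal number promised
        self.accepted_n = 0        # number of the proposal last accepted
        self.accepted_value = None

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_value
        return False, None, None

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_value = n, value
            return True
        return False


def propose(acceptors, n, value):
    """Run both phases once; return the decided value, or None on failure."""
    quorum = len(acceptors) // 2 + 1

    promises = [p for p in (a.prepare(n) for a in acceptors) if p[0]]
    if len(promises) < quorum:
        return None
    # If any acceptor already accepted a value, adopt the one with the
    # highest proposal number instead of our own.
    prior_n, prior_value = max(
        ((an, av) for _, an, av in promises), key=lambda t: t[0]
    )
    if prior_n > 0:
        value = prior_value

    accepted = sum(a.accept(n, value) for a in acceptors)
    return value if accepted >= quorum else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, n=1, value="blue")
second = propose(acceptors, n=2, value="green")  # must re-decide "blue"
stale = propose(acceptors, n=1, value="red")     # number too low: rejected
```

The second proposer asks for "green" but ends up re-proposing "blue", because Phase 1 reveals a value that a majority has already accepted. This is the step that keeps a decided value from changing.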
Raft
- Raft
- A consensus algorithm providing the same safety guarantees as Paxos, designed for understandability.
- term
- A Raft epoch beginning with an election; serves as a logical clock.
- leader
- Handles all client requests and replicates log entries to followers; at most one per term.
- follower
- Passive server that responds to the leader and candidates; all servers start as followers.
- candidate
- A follower that timed out waiting for a heartbeat and is attempting to win an election.
- election timeout
- A randomized timer; if it expires before a heartbeat arrives, the follower starts an election.
- RequestVote RPC
- Message a candidate sends to request votes during an election.
- AppendEntries RPC
- Message a leader sends to replicate log entries and deliver heartbeats.
- heartbeat
- An empty AppendEntries RPC sent periodically to prevent followers from timing out.
- Log Matching Property
- If two logs share an entry with the same index and term, they are identical through that index.
- Leader Completeness Property
- A committed entry will appear in the logs of all future leaders.
- election restriction
- A server only votes for a candidate whose log is at least as up-to-date as its own.
- joint consensus
- Mechanism for safely changing cluster membership by requiring a majority from both old and new configurations.
- snapshot
- A point-in-time capture of state machine state, used for log compaction and catching up lagging followers.
- InstallSnapshot RPC
- Used to send a snapshot directly to a follower that has fallen too far behind for normal log replication.
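The election restriction and the one-vote-per-term rule can be sketched as a RequestVote handler. This is a minimal illustration, not a complete Raft server: `Server`, `handle_request_vote`, and `election_timeout` are hypothetical names, persistence and networking are omitted, and the 150 to 300 ms range is an assumed example value.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Server:
    current_term: int = 0
    voted_for: Optional[str] = None
    log: list = field(default_factory=list)  # list of (term, command)

    def last_log(self):
        """Return (term, index) of the last entry; (0, 0) for an empty log."""
        return (self.log[-1][0], len(self.log)) if self.log else (0, 0)

    def handle_request_vote(self, term, candidate_id, last_log_term, last_log_index):
        """Return (current_term, vote_granted) for one RequestVote RPC."""
        if term < self.current_term:
            return self.current_term, False   # stale candidate
        if term > self.current_term:
            self.current_term = term          # newer term: reset vote
            self.voted_for = None
        # Election restriction: compare last terms first, then log lengths.
        up_to_date = (last_log_term, last_log_index) >= self.last_log()
        if up_to_date and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return self.current_term, True
        return self.current_term, False


def election_timeout(low_ms=150, high_ms=300):
    """Randomized timeout so that followers rarely time out together."""
    return random.uniform(low_ms, high_ms)


follower = Server(current_term=2, log=[(1, "a"), (2, "b")])
# Candidate with a shorter log in the same last term is refused.
short = follower.handle_request_vote(3, "c1", last_log_term=2, last_log_index=1)
# Candidate whose log is at least as up-to-date gets the vote.
equal = follower.handle_request_vote(3, "c2", last_log_term=2, last_log_index=2)
# A second candidate in the same term is refused: one vote per term.
again = follower.handle_request_vote(3, "c3", last_log_term=3, last_log_index=5)
```

Because a candidate needs votes from a majority, and every committed entry is stored on a majority, at least one voter holds each committed entry and will refuse a candidate that lacks it. That is how the election restriction supports the Leader Completeness Property.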