
© 2026 TheSimplifiedTech. All rights reserved.

Interactive Explainer

Consensus Algorithms: Building Reliable Distributed Systems

How do 5 nodes agree on anything when networks are unreliable?

🎯Key Takeaways
Consensus algorithms (Raft, Paxos) enable distributed systems to agree despite failures and partitions.
CAP: P is not optional. The choice is CP (reject requests during partitions) vs AP (serve stale data).
Raft: leader election + log replication + safety. Quorum = N/2 + 1 nodes must agree.
Always run consensus clusters with odd numbers (3, 5, 7). A 5-node cluster tolerates 2 failures.
Fencing tokens prevent zombie locks — include a monotonically increasing token with all shared resource writes.
Match consistency model to use case: financial data → linearizability; shopping carts → eventual consistency.

The Fundamental Problem: Agreement in the Face of Failure

In 2012, a network partition at GitHub caused two MySQL nodes to each believe it was the primary. Both accepted writes. When the partition healed, their data had diverged — recovering from the split-brain took 6 hours.

The Consensus Problem

Getting multiple nodes to agree on a single value, even when some nodes are slow, some have failed, and the network drops messages. Solving it unlocks leader election, distributed locks, cluster membership, and replicated state machines — the foundation of all distributed databases.

Algorithm | Used By | Key Property | Limitation
----------|---------|--------------|-----------
Paxos | Google Chubby, Cassandra Lightweight Transactions | Proven correct, minimal assumptions | Notoriously hard to implement correctly
Raft | etcd (Kubernetes), CockroachDB, TiKV, Consul | Designed for understandability | Same safety as Paxos, more opinionated
ZAB | Apache ZooKeeper, Kafka controller | Strong ordering guarantees | ZooKeeper-specific, not general consensus

CAP Theorem: The Fundamental Trade-off

CAP theorem: A distributed system can guarantee at most two of three: Consistency, Availability, Partition Tolerance. Network partitions WILL happen — they're not optional. The real choice is CP vs AP.

System | Choice | During Partition | Use Case
-------|--------|------------------|---------
PostgreSQL + sync replication | CP | Rejects writes if replica unreachable | Banking — correctness > availability
DynamoDB / Cassandra | AP | Accepts writes, resolves conflicts later | Shopping carts — availability > strict consistency
etcd / ZooKeeper | CP | Rejects reads/writes without quorum | Configuration — must be correct

PACELC: Beyond CAP

PACELC: if there is a Partition, choose Availability or Consistency; Else (normal operation), choose lower Latency or stronger Consistency. DynamoDB picks low latency plus eventual consistency; Spanner picks consistency everywhere via TrueTime hardware clocks. Most systems are tunable somewhere along this spectrum.

Raft: Consensus for the Rest of Us

Raft Leader Election

1. All nodes start as Followers. If no heartbeat from the Leader arrives within the election timeout (150–300ms), the Follower becomes a Candidate.
2. The Candidate increments its term number, votes for itself, and requests votes from all other nodes.
3. A node grants its vote if: (1) the candidate's term is >= its own term, (2) it hasn't already voted this term, and (3) the candidate's log is at least as up-to-date as its own.
4. A Candidate receiving a majority of votes becomes the Leader and immediately sends heartbeats.
5. If the Leader fails, Followers time out and hold a new election. This takes 150–500ms in production.
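The vote-granting rule in step 3 can be sketched as a pure predicate. This is an illustrative sketch, not code from etcd or any real Raft implementation — all type and function names here are hypothetical:

```typescript
// State each node keeps for voting decisions (simplified)
interface NodeState {
  currentTerm: number;
  votedFor: string | null;   // who we voted for in currentTerm, if anyone
  lastLogTerm: number;       // term of our last log entry
  lastLogIndex: number;      // index of our last log entry
}

// The fields of Raft's RequestVote RPC that matter here
interface VoteRequest {
  term: number;
  candidateId: string;
  lastLogTerm: number;
  lastLogIndex: number;
}

function grantVote(me: NodeState, req: VoteRequest): boolean {
  // (1) Reject candidates from an older term
  if (req.term < me.currentTerm) return false;

  // (2) At most one vote per term (re-granting to the same candidate is fine)
  if (req.term === me.currentTerm && me.votedFor !== null && me.votedFor !== req.candidateId) {
    return false;
  }

  // (3) Candidate's log must be at least as up-to-date as ours:
  // a later last-entry term wins; equal terms compare by log length
  return (
    req.lastLogTerm > me.lastLogTerm ||
    (req.lastLogTerm === me.lastLogTerm && req.lastLogIndex >= me.lastLogIndex)
  );
}
```

Condition (3) is the safety linchpin: it guarantees any elected leader already holds every committed entry.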

Raft Log Replication

1. All writes go to the Leader.
2. The Leader appends the entry to its local log (uncommitted) and sends AppendEntries RPCs to all Followers in parallel.
3. Followers append the entry to their logs and send ACKs.
4. Once a majority (quorum: N/2 + 1) has ACKed, the Leader commits the entry and applies it to its state machine.
5. The Leader informs Followers of the commit in the next heartbeat.
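Step 4's commit decision reduces to a quorum count over replication progress. A minimal sketch — the matchIndex bookkeeping mirrors the Raft paper, but the helper itself is hypothetical:

```typescript
// matchIndex[i] = highest log index known to be replicated on node i
// (the leader counts itself). The highest index replicated on a majority
// of nodes is the new commit index.
function commitIndex(matchIndex: number[]): number {
  const quorum = Math.floor(matchIndex.length / 2) + 1; // N/2 + 1
  const sorted = [...matchIndex].sort((a, b) => b - a); // descending
  // The quorum-th highest value is present on at least `quorum` nodes
  return sorted[quorum - 1];
}
```

In a 5-node cluster where nodes have replicated up to indexes [9, 7, 7, 5, 3], entries through index 7 sit on three nodes — a majority — so the leader can commit up to 7. (Real Raft adds one subtlety: a leader only commits entries from its own term this way.)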

The Quorum Rule

A 5-node cluster tolerates 2 failures (quorum = 3). A 3-node cluster tolerates 1 failure (quorum = 2). ALWAYS run consensus clusters with odd numbers. A 4-node cluster tolerates only 1 failure — same as 3-node but twice the cost. 5 nodes is the production sweet spot.
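The arithmetic behind the odd-number rule, as a quick sketch:

```typescript
// Majority quorum for a cluster of n nodes
function quorum(n: number): number {
  return Math.floor(n / 2) + 1; // N/2 + 1
}

// Nodes that can fail while a quorum still survives
function toleratedFailures(n: number): number {
  return n - quorum(n);
}

// 3 nodes → quorum 2, tolerates 1
// 4 nodes → quorum 3, tolerates 1 (the extra node buys nothing)
// 5 nodes → quorum 3, tolerates 2
```

Going from 3 to 4 nodes raises the quorum without raising fault tolerance — which is exactly why even-sized consensus clusters waste money.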

distributed-lock.ts

// Distributed lock using etcd (Raft-backed).
// A lock granted by etcd is guaranteed held by at most one client cluster-wide.

import { Etcd3 } from 'etcd3';

const etcd = new Etcd3({ hosts: process.env.ETCD_HOSTS?.split(',') });

async function withDistributedLock<T>(
  lockName: string,
  ttlSeconds: number,
  operation: () => Promise<T>
): Promise<T> {
  // TTL-based lease: etcd deletes the lock key if the client crashes or disconnects
  const lock = etcd.lock(lockName).ttl(ttlSeconds);

  try {
    // Block until the lock is acquired (queued behind other holders)
    await lock.acquire();
    console.log(`Acquired lock: ${lockName}`);

    return await operation();
  } finally {
    // Always release — even if operation throws
    await lock.release();
  }
}

// Usage: ensure only one instance processes a batch at a time
await withDistributedLock(
  'batch-processor:nightly-report',
  30, // 30s TTL — auto-released if the process crashes (lease mechanism)
  async () => {
    await runNightlyReport();
  }
);

// Fencing tokens: handle the "slow process" problem.
// Problem: a process holds the lock, a GC pause lets the lock expire,
// another process acquires the lock → BOTH think they hold it.
//
// Solution: etcd returns a monotonically increasing revision on each acquire.
// Include this "fencing token" in writes to the shared resource.
// The resource rejects writes with older tokens — the zombie process is fenced out.
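The fencing idea in the comments above can be sketched from the resource's side: the storage service remembers the highest token it has seen per resource and rejects anything older. An illustrative in-memory sketch — not an etcd API:

```typescript
// Highest fencing token accepted so far, per resource
const highestTokenSeen = new Map<string, number>();

// Returns true if the write was accepted, false if fenced out
function writeWithFencing(resource: string, fencingToken: number, write: () => void): boolean {
  const seen = highestTokenSeen.get(resource) ?? -1;
  if (fencingToken < seen) return false; // older token → zombie holder, reject
  highestTokenSeen.set(resource, fencingToken);
  write();
  return true;
}
```

A zombie that acquired its now-expired lock at revision 33 loses to the new holder at revision 34 the moment the new holder writes first — the resource, not the lock service, enforces mutual exclusion.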

Consistency Models: Beyond CAP

Consistency Models from Strongest to Weakest

  • 🔒Linearizability — Every read returns the most recent write. Operations appear atomic and globally ordered. Required for distributed locks, leader election. Used by etcd, Spanner.
  • 🔑Causal Consistency — Causally related operations appear in order: if A happened before B, everyone sees A before B. DynamoDB consistent reads approximate this.
  • 🗝️Eventual Consistency — Given no new updates, all replicas converge. No timing guarantee. DynamoDB default reads, Cassandra default.
  • 📖Read-Your-Writes — A user always sees their own writes, even if others see stale data. Essential for good UX. Route user's reads to the replica they wrote to.

Match Consistency to Use Case

Financial balances → Linearizability. Shopping cart → Eventual consistency. User profile after update → Read-your-writes. Feature flags → Linearizability (all servers must see same config).
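Read-your-writes is often implemented by pinning a user's reads to the primary for a short window after they write. A hypothetical sketch — the 5-second window stands in for your actual replication lag, and the timestamps are injected to keep it testable:

```typescript
// When each user last wrote (in-memory; a real system might use a session cookie)
const lastWriteAt = new Map<string, number>();
const PIN_WINDOW_MS = 5_000; // assumed upper bound on replication lag

function recordWrite(userId: string, now: number = Date.now()): void {
  lastWriteAt.set(userId, now);
}

// Route reads: primary right after the user's own write, replica otherwise
function readTarget(userId: string, now: number = Date.now()): 'primary' | 'replica' {
  const t = lastWriteAt.get(userId);
  return t !== undefined && now - t < PIN_WINDOW_MS ? 'primary' : 'replica';
}
```

Other users still read from replicas (and may see stale data), but the writer always sees their own update.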

How this might come up in interviews

Consensus and CAP questions are standard at senior/staff levels. Show you can reason about tradeoffs, not just recite definitions.

Common questions:

  • Explain CAP theorem with a concrete example
  • How does Raft leader election work?
  • When would you choose eventual consistency over strong consistency?
  • How do you implement a distributed lock and what are its failure modes?

Strong answers include:

  • Says "P is not optional" when discussing CAP
  • Knows PACELC or discusses beyond simple CAP
  • Mentions fencing tokens unprompted
  • Explains when linearizability is required vs eventual consistency is fine

Red flags:

  • Thinks CAP means you can choose all three
  • "Just use eventual consistency everywhere"
  • Doesn't know what a quorum is
  • Never heard of fencing tokens

Quick check · Consensus Algorithms: Building Reliable Distributed Systems

In a 5-node Raft cluster, how many nodes can fail before the cluster loses consensus?

From the books

Designing Data-Intensive Applications — Martin Kleppmann (2017)

Chapter 9: Consistency and Consensus

The best textbook treatment of distributed consensus. Covers linearizability, CAP, Paxos, Raft, and why distributed transactions are hard — all in one chapter.
