The mental frameworks, vocabulary, and structured approach that separate senior engineers from the rest. Learn to scope, estimate, and design any system under interview pressure.
Lesson outline
At a startup, the fastest path to production wins. At companies with millions of users, a wrong architectural decision can cost months of migration work and cause outages affecting real people. The gap between "code that works" and "system that scales" is architectural thinking — deciding the right abstractions, data flows, and failure boundaries before writing code.
System design is not about memorizing architectures. It is about asking the right questions in the right order: Who are the users? What scale do we need? Where are the failure points? What are we trading off? An engineer who can answer these questions systematically is worth 10x one who cannot.
The Principal Engineer Mindset
Before you open your IDE, spend 20 minutes on a napkin design. Every hour spent on design saves ten hours of debugging later. This is not optional at scale — it is survival.
Step 1 — Clarify requirements (5 min). Never assume. Ask: functional requirements (what the system must do), non-functional requirements (latency, availability, consistency, scale). Example: "Is this read-heavy or write-heavy? What is the acceptable p99 latency? Do we need strong consistency or is eventual okay?"
Step 2 — Estimate scale (3 min). Back-of-envelope: DAU (daily active users), requests/second, storage needed per year. These numbers drive every architectural decision. 1M DAU × 10 requests/day = ~115 RPS. At 1KB per request that is 115 KB/s throughput.
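That arithmetic is easy to script. A quick sketch using the example numbers above (the function and its names are illustrative, not from any library):

```typescript
// Back-of-envelope estimation: DAU and per-user activity -> RPS and throughput.
const SECONDS_PER_DAY = 86_400;

function estimateScale(dau: number, requestsPerUserPerDay: number, bytesPerRequest: number) {
  const rps = (dau * requestsPerUserPerDay) / SECONDS_PER_DAY;
  const throughputKBps = (rps * bytesPerRequest) / 1_024;
  const storageTBPerYear = (rps * bytesPerRequest * SECONDS_PER_DAY * 365) / 1_024 ** 4;
  return { rps, throughputKBps, storageTBPerYear };
}

// 1M DAU x 10 requests/day at 1 KB per request
const est = estimateScale(1_000_000, 10, 1_024);
console.log(est.rps.toFixed(1));            // ≈ 115.7 requests/sec
console.log(est.throughputKBps.toFixed(1)); // ≈ 115.7 KB/s
```

Having the formula in one place makes it trivial to answer the "what if we need 10x scale?" follow-up: change one argument and re-read the numbers.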
Step 3 — High-level design (10 min). Draw the boxes: client → load balancer → app servers → cache → database. Identify the critical path. Do not go deep yet — establish the skeleton.
Step 4 — Deep dive on bottlenecks (15 min). Pick the hardest parts: How do you shard the database? What is the caching strategy? How do you handle hotspots? How does the system behave under failure?
Step 5 — Address cross-cutting concerns (5 min). Monitoring, authentication, rate limiting, data migration, cost.
Most Candidates Fail Here
They jump to Step 3 without clarifying requirements. The interviewer then asks "but what if we need 10x scale?" and the candidate has to redesign everything. Clarify FIRST.
```
# Back-of-Envelope Estimation Template

## Useful Numbers to Memorize
Memorize these — you will need them in every system design interview
- 1 million seconds ≈ 11.5 days
- 1 billion requests/day ≈ 11,600 RPS
- SSD sequential read: ~500 MB/s
- Network within datacenter: ~10 Gbps
- HDD seek time: ~10ms | SSD: ~0.1ms | Memory: ~100ns

## Scale Estimation for a Twitter-like Feed
DAU: 100M users
Avg tweets/user/day: 1 (writes), 50 reads
Writes: 100M / 86,400 ≈ 1,160 writes/sec
Reads: 100M × 50 / 86,400 ≈ 58,000 reads/sec → read-heavy → cache aggressively

Storage (tweets):
- 1 tweet = 280 chars ≈ 560 bytes + metadata ≈ 1 KB
- 1,160 writes/sec × 86,400 × 365 ≈ 36.5 TB/year
(Read:write ratio drives architecture — cache for reads, queue for writes)

## Capacity Planning Rule of Thumb
- Add 3x headroom for spikes
- Plan for 2x next year's projected growth
- Cache can absorb 80%+ of reads if hit rate > 90%
```
Load balancers distribute traffic across servers. Layer 4 (TCP) is faster; Layer 7 (HTTP) is smarter (route by URL, host, headers). Use L7 for microservices.
Caches sit in front of slow resources. L1 = in-process (fastest, not shared). L2 = Redis/Memcached (shared, network hop). CDN = edge cache (geographic). The cache hit rate is the most important metric — target 95%+.
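A cache-aside read path across the two tiers might look like this sketch (plain Maps stand in for the in-process cache and for Redis, and the database call is a stub):

```typescript
// Cache-aside across two tiers. A real L2 would be a Redis client call
// (a ~1ms network hop); a Map keeps this sketch self-contained.
const l1 = new Map<string, string>(); // in-process: fastest, but per-server
const l2 = new Map<string, string>(); // stands in for Redis/Memcached: shared

async function queryDatabase(key: string): Promise<string> {
  return `row-for-${key}`; // stand-in for the slow resource (1-10ms in reality)
}

async function get(key: string): Promise<string> {
  const hot = l1.get(key);
  if (hot !== undefined) return hot;      // L1 hit: sub-microsecond
  const shared = l2.get(key);
  if (shared !== undefined) {             // L2 hit: pay one network hop
    l1.set(key, shared);                  // promote to L1 for next time
    return shared;
  }
  const value = await queryDatabase(key); // full miss: hit the database
  l2.set(key, value);                     // fill both tiers on the way back
  l1.set(key, value);
  return value;
}

get('user:1').then(v => console.log(v)); // "row-for-user:1"
```

Note the promotion step: without it, every server keeps paying the L2 network hop for its own hot keys.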
Databases store durable state. Relational (PostgreSQL, MySQL) for structured, ACID-needed data. NoSQL (Cassandra, DynamoDB) for massive write throughput and flexible schema. Choose based on your access patterns, not popularity.
Message queues (Kafka, SQS, RabbitMQ) decouple producers from consumers, buffer spikes, and enable async processing. Critical for anything that can be processed later: emails, analytics, notifications.
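The decoupling idea in miniature: the request path only enqueues, and a worker drains at its own pace. In this sketch an in-memory array stands in for Kafka/SQS and a counter stands in for the actual email send:

```typescript
// Queue decoupling sketch: producers enqueue cheaply, consumers process later.
type Job = { type: 'email'; to: string };

const queue: Job[] = []; // stands in for Kafka/SQS/RabbitMQ

function handleSignup(email: string): void {
  // The user-facing request path only enqueues (microseconds),
  // instead of talking to an SMTP server inline (hundreds of ms).
  queue.push({ type: 'email', to: email });
}

let sent = 0;
function consumerTick(): void {
  // A worker drains the queue asynchronously; a spike of signups
  // just makes the queue longer, it does not slow down requests.
  const job = queue.shift();
  if (job) sent++; // stand-in for the actual email send
}

handleSignup('a@example.com');
handleSignup('b@example.com');
consumerTick();
consumerTick();
console.log(sent); // 2
```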
Object storage (S3) for blobs: images, videos, backups. Unlimited scale, 99.999999999% durability, cheap. Never store blobs in your relational database.
| Building Block | Latency | Scale | When to Use |
|---|---|---|---|
| In-process cache | < 1µs | Single server | Computed values, hot config |
| Redis/Memcached | < 1ms | Cluster (TBs) | Shared session, hot objects |
| PostgreSQL | 1-10ms | Vertical + read replicas | ACID, complex queries |
| Cassandra | 1-5ms | Petabytes | High write throughput, time-series |
| Kafka | 5-10ms | Millions/sec | Event streaming, audit log |
| S3 | 10-100ms | Unlimited | Blobs, backups, static assets |
CAP theorem states that a distributed system can guarantee at most two of three: Consistency (every read sees the latest write), Availability (every request gets a response), Partition tolerance (system works despite network splits).
In practice, you always need Partition tolerance (networks fail). So the real choice is CP vs AP: Do you want strong consistency (risk unavailability) or high availability (risk stale data)?
CP systems (ZooKeeper, HBase, etcd): When a partition occurs, they refuse requests rather than return stale data. Use for: distributed locks, leader election, financial transactions.
AP systems (Cassandra, DynamoDB, CouchDB): When a partition occurs, they keep serving requests but may return stale data. Use for: social feeds, DNS, shopping carts.
The interviewer trap: "Do you prefer SQL or NoSQL?" Wrong answer: "It depends." Right answer: "It depends on my consistency and availability requirements. Let me explain my access patterns..."
PACELC is more useful than CAP
PACELC extends CAP: even when there is no Partition (normal operation), you still trade off latency vs consistency. DynamoDB defaults to eventual consistency (low latency) but offers strongly consistent reads at 2x cost.
These numbers from Jeff Dean (Google) are essential for system design. They tell you where time actually goes:
| Operation | Latency | Human Scale |
|---|---|---|
| L1 cache reference | 0.5 ns | 1 second |
| L2 cache reference | 7 ns | 14 seconds |
| Main memory reference | 100 ns | 3.5 minutes |
| SSD random read | 150 µs | 3.5 days |
| HDD seek | 10 ms | 7.5 months |
| Network: same datacenter | 0.5 ms | ... |
| Network: US cross-country | 40 ms | ... |
| Network: US to Europe | 150 ms | ... |
Key insight: memory is 1000x faster than SSD. SSD is 100x faster than disk. Network between datacenters adds 40-150ms. Design to minimize network hops and maximize cache hits.
```typescript
// Latency Budget Analysis — a real engineering discipline
// Every user-facing request has a latency budget.
// Allocate it consciously.

interface LatencyBudget {
  target_p99_ms: number;
  allocations: Array<{
    component: string;
    budget_ms: number;
    actual_p99_ms?: number;
  }>;
}

const checkoutLatencyBudget: LatencyBudget = {
  target_p99_ms: 300, // 300ms total budget for checkout API
  allocations: [
    { component: 'Auth middleware', budget_ms: 5, actual_p99_ms: 3 },
    { component: 'Redis cart read', budget_ms: 5, actual_p99_ms: 4 },
    { component: 'Postgres: user data', budget_ms: 20, actual_p99_ms: 18 },
    { component: 'Inventory service', budget_ms: 50, actual_p99_ms: 45 },
    { component: 'Payment gateway', budget_ms: 180, actual_p99_ms: 200 }, // ← OVER BUDGET
    { component: 'Write order to DB', budget_ms: 20, actual_p99_ms: 12 },
    { component: 'Emit Kafka event', budget_ms: 10, actual_p99_ms: 8 },
    { component: 'Serialization/misc', budget_ms: 10, actual_p99_ms: 6 },
  ], // Always trace latency per component — gut feel is wrong
};

// Payment gateway is 200ms actual vs 180ms budget → p99 SLO breach
// Action: add timeout + async fallback, or negotiate SLA with payment vendor
```
Thundering herd: A cached key expires. 10,000 concurrent requests all miss cache simultaneously, hammering the database. Prevention: probabilistic early expiry (recalculate before expiry at random), cache stampede locks (only one request refills), background refresh.
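One of those preventions, the stampede lock (often called single-flight), can be sketched as follows: concurrent misses for the same key all await one shared refill promise, so the database sees exactly one query.

```typescript
// Single-flight guard against thundering herd: the first caller to miss
// starts the refill; every concurrent caller piggybacks on the same promise.
const cache = new Map<string, string>();
const inFlight = new Map<string, Promise<string>>();
let dbCalls = 0; // instrumentation to show the DB is hit once

async function loadFromDb(key: string): Promise<string> {
  dbCalls++;
  return `value-for-${key}`; // stand-in for the expensive query
}

async function getOnce(key: string): Promise<string> {
  const cached = cache.get(key);
  if (cached !== undefined) return cached;

  let pending = inFlight.get(key);
  if (!pending) {
    // First caller takes the "lock" by registering the refill promise.
    pending = loadFromDb(key).then(value => {
      cache.set(key, value);
      inFlight.delete(key);
      return value;
    });
    inFlight.set(key, pending);
  }
  return pending; // everyone else awaits the same refill
}

// 10,000 concurrent misses on the same expired key -> one database call
Promise.all(Array.from({ length: 10_000 }, () => getOnce('hot-key')))
  .then(() => console.log(dbCalls)); // 1
```

A production version would add the TTL and probabilistic early-expiry logic on top; this shows only the locking core.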
Hot partition: 90% of traffic goes to one database shard (e.g. all users with last name A-B are in shard 1). Prevention: consistent hashing with virtual nodes, random suffix on hot keys.
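A minimal sketch of consistent hashing with virtual nodes (the FNV-1a hash and the vnode count are illustrative choices, not a specific library's API):

```typescript
// Consistent hashing with virtual nodes: each physical shard appears on the
// ring many times, so keys spread evenly and removing a shard only remaps
// the keys that lived on its slice of the ring.

function fnv1a(s: string): number { // simple 32-bit string hash (FNV-1a)
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

class HashRing {
  private ring: Array<{ point: number; shard: string }> = [];

  constructor(shards: string[], vnodes = 100) {
    for (const shard of shards)
      for (let v = 0; v < vnodes; v++)
        this.ring.push({ point: fnv1a(`${shard}#${v}`), shard });
    this.ring.sort((a, b) => a.point - b.point);
  }

  lookup(key: string): string {
    // First vnode clockwise of the key's hash owns the key.
    // (A real implementation would binary-search; linear scan keeps it short.)
    const h = fnv1a(key);
    for (const entry of this.ring) if (entry.point >= h) return entry.shard;
    return this.ring[0].shard; // wrap around the ring
  }
}

const ring = new HashRing(['shard-1', 'shard-2', 'shard-3']);
console.log(ring.lookup('user:alice')); // deterministic shard assignment
```

The other fix mentioned above, random suffixes on hot keys, is orthogonal: it splits a single hot key into several ring positions.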
Cascading failures: Service A is slow → B times out waiting → B's thread pool exhausts → B goes down → C goes down. Prevention: circuit breakers, bulkheads (separate thread pools per dependency), timeouts everywhere.
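A minimal circuit-breaker sketch (the threshold and cooldown values are illustrative): after N consecutive failures the circuit opens and callers fail fast instead of tying up threads waiting on a dead dependency.

```typescript
// Circuit breaker sketch: open after `threshold` consecutive failures,
// fail fast during `cooldownMs`, then let one trial request probe again.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 3, private cooldownMs = 5_000) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.failures >= this.threshold &&
        Date.now() - this.openedAt < this.cooldownMs) {
      // Fail fast: do not queue up behind a dependency we know is down.
      throw new Error('circuit open: failing fast');
    }
    try {
      const result = await fn();
      this.failures = 0; // any success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures === this.threshold) this.openedAt = Date.now();
      throw err; // the caller still sees the underlying failure
    }
  }
}

// Usage: wrap each downstream dependency in its own breaker (bulkhead-style).
const inventoryBreaker = new CircuitBreaker(3, 5_000);
```

This is the mechanism that stops "B's thread pool exhausts" in the chain above: once the circuit opens, B returns errors in microseconds rather than blocking on A.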
Head-of-line blocking: One slow request in a queue blocks all subsequent requests. Prevention: request hedging (send to 2 replicas, take first response), async processing, priority queues.
Cascading failures kill companies
The 2021 Facebook outage was a cascading failure: BGP route withdrawal → DNS servers offline → all services trying to reconnect simultaneously → overloaded internal systems. Duration: 6 hours. Cost: ~$60M in lost revenue.
System design interviews test structured thinking under uncertainty. Interviewers want to see your process, not a memorized architecture.
From the books
System Design Interview – An Insider's Guide — Alex Xu (2020)
Chapter 1: Scale From Zero To Millions Of Users
Start with a single server, then add each component only when you have a concrete reason. Premature optimization is the root of all evil — but so is premature simplicity at scale.
Designing Data-Intensive Applications — Martin Kleppmann (2017)
Chapter 1: Reliable, Scalable, and Maintainable Applications
The three concerns of every data system: reliability (correct under adversity), scalability (coping with load), maintainability (other engineers can work on it). Optimize in this order.