System Design Canvas

7 problems · approach steps · capacity math · trade-off decisions

URL Shortener (bit.ly)

midCore Services6 steps · 2 decisions

Throughput

10K writes/sec, 1M reads/sec

Latency

<10ms p99 for redirect

Storage est.

100M URLs × 500 bytes = 50 GB/year

Clarify these before designing

7-char alphanumeric slug ≈ 3.5 trillion possible short URLs

Read:Write ratio ~100:1

Custom slugs required for premium users

Recommended approach

Clarify: analytics needed? Custom domains? Expiry?

ID generation: Base62-encoded auto-increment ID or MD5-first-7-chars

Storage: KV store (Redis + Cassandra) for URL mapping

Redirect: HTTP 301 (cacheable) vs 302 (real-time analytics)

Caching: LRU cache for top 20% most-accessed URLs

Scale: read replicas, CDN for redirect responses

Key components

ID Generator ServiceURL Mapping DB (Cassandra)Cache (Redis)CDNAnalytics Pipeline (Kafka + ClickHouse)

Trade-off decisions

Decision·ID generation strategy

✓Distributed (MD5/hash)

+No single point of failure
+Works across all servers

✗Centralised (auto-increment)

−Possible collisions require retry logic
−Longer computation time

Decision·HTTP redirect type

✓301 Permanent

+Browser caches — reduces server load dramatically
+Lower latency for repeat visits

✗302 Temporary

−Cannot track click analytics accurately (browser handles it)
−Cannot update destination

Capacity Estimation

10K writes/sec × 500 bytes = 5 MB/sec write throughput. 3.15 TB/year new URL data. 1M reads/sec → cache hit rate 80% → 200K actual DB reads/sec.

Interviewer evaluation

Strong signals ✓

Clarifies analytics requirement before choosing 301 vs 302

Explains collision probability and retry strategy for hash-based IDs

Designs separate write path and read path (CQRS pattern)

Considers custom slugs as a separate, lower-throughput write path

Has a cache invalidation strategy for when URLs are deleted

Red flags ✗

Single SQL database with auto-increment ID — bottleneck at scale

No caching strategy for the 100:1 read-heavy workload

Ignores analytics requirements entirely

Follow-up probes

How would you implement click analytics without slowing down the redirect?

How do you handle the thundering herd problem when a viral link first gets shared?

Your Redis cache fails. What is the blast radius and how do you degrade gracefully?