System Design Canvas

URL Shortener (bit.ly)

midCore Services6 steps · 2 decisions
Throughput

10K writes/sec, 1M reads/sec

Latency

<10ms p99 for redirect

Storage est.

100M URLs × 500 bytes = 50 GB/year

Clarify these before designing
1

7-char alphanumeric slug ≈ 3.5 trillion possible short URLs

2

Read:Write ratio ~100:1

3

Custom slugs required for premium users

Recommended approach
1
Clarify: analytics needed? Custom domains? Expiry?
2
ID generation: Base62-encoded auto-increment ID or MD5-first-7-chars
3
Storage: KV store (Redis + Cassandra) for URL mapping
4
Redirect: HTTP 301 (cacheable) vs 302 (real-time analytics)
5
Caching: LRU cache for top 20% most-accessed URLs
6
Scale: read replicas, CDN for redirect responses
Key components
ID Generator ServiceURL Mapping DB (Cassandra)Cache (Redis)CDNAnalytics Pipeline (Kafka + ClickHouse)
Trade-off decisions
Decision·ID generation strategy

Distributed (MD5/hash)

  • +No single point of failure
  • +Works across all servers

Centralised (auto-increment)

  • Possible collisions require retry logic
  • Longer computation time
Decision·HTTP redirect type

301 Permanent

  • +Browser caches — reduces server load dramatically
  • +Lower latency for repeat visits

302 Temporary

  • Cannot track click analytics accurately (browser handles it)
  • Cannot update destination
Capacity Estimation

10K writes/sec × 500 bytes = 5 MB/sec write throughput. 3.15 TB/year new URL data. 1M reads/sec → cache hit rate 80% → 200K actual DB reads/sec.

Interviewer evaluation

Strong signals ✓

Clarifies analytics requirement before choosing 301 vs 302

Explains collision probability and retry strategy for hash-based IDs

Designs separate write path and read path (CQRS pattern)

Considers custom slugs as a separate, lower-throughput write path

Has a cache invalidation strategy for when URLs are deleted

Red flags ✗

Single SQL database with auto-increment ID — bottleneck at scale

No caching strategy for the 100:1 read-heavy workload

Ignores analytics requirements entirely

Follow-up probes

?

How would you implement click analytics without slowing down the redirect?

?

How do you handle the thundering herd problem when a viral link first gets shared?

?

Your Redis cache fails. What is the blast radius and how do you degrade gracefully?