Scaling Databases: Replication, Partitioning, and Sharding
The database is almost always the last thing to scale and the hardest. Here is the honest ladder, vertical, replicas, partitioning, sharding, and the trade-offs nobody warns you about.
You built an app, traffic grew, and you scaled the web tier by adding more stateless servers behind a load balancer. That part was easy. Then everything got slow again, and this time adding web servers does nothing, because they all hammer the **same** database. This article is for engineers staring at a maxed-out database and wondering which lever to pull next, and in what order.
Stateless tiers scale by cloning. The database can't, because it holds state, the one true copy of your data. The moment you have two copies, you have a consistency problem, and most of database scaling is just managing that problem with different trade-offs. Before you touch any of this, make sure the database is actually the bottleneck and not a missing index: read database indexing and query performance first. A single well-placed index routinely buys more than a whole sharding project.
The honest truth: every technique below trades simplicity for capacity. Vertical scaling keeps one tidy box but hits a ceiling. Replicas multiply reads but introduce lag. Sharding multiplies writes but breaks joins, transactions, and your weekends. Scale in the order that buys the most headroom for the least complexity, and stop the moment you have enough.
A mental model: lanes, then stores
One cashier serving a long queueA single database handling every read and write
Open more checkout lanes that all ring up the same shelvesRead replicas, many copies of the same data serving reads
Split into separate stores, each stocking part of the inventorySharding, each node owns a distinct slice of the rows
Each scaling step is a different answer to 'too many customers for one shop'.
The analogy hides the catch. Extra lanes (replicas) only help if customers are mostly looking, not buying, replicas serve reads, not writes. And separate stores (shards) mean a customer who wants items from two stores has to visit both: that is a cross-shard query, and it is exactly as annoying as it sounds.
The picture: replicas, then shards
Top: one primary, many read replicas, a router splitting reads from writes. Bottom: the same app fanning out to independent shards.
1
Write arrives
The app sends an INSERT/UPDATE. The router sends every write to the single primary, there is exactly one authority for the truth.
2
Primary streams its changelog
The primary ships its write-ahead log (WAL) to each replica. Replicas replay it to converge toward the primary's state, usually within milliseconds.
3
Read arrives
The router sends reads to a replica, spreading read load across many machines while the primary stays free to absorb writes.
4
Writes outgrow one box
When even the primary can't keep up with writes, you shard: a shard router hashes a shard key and sends each row to the one node that owns that slice.
Replication vs partitioning vs sharding
These three words get used interchangeably and they are not the same thing. Replication copies the whole dataset to multiple nodes. Partitioning splits one table into pieces that still live in one database. Sharding is partitioning across multiple independent databases. Each scales a different axis.
Technique
Scales
Complexity
Replication
Reads (and HA / failover)
Low–medium: handle replication lag
Partitioning
Storage + per-query scan size
Low: one DB, transparent to most queries
Sharding
Writes + storage (the only thing that does)
High: routing, cross-shard joins, resharding
What each technique actually buys you, and what it costs.
Sharding is the only one that scales writes
Replicas don't help write throughput, every write still goes through the one primary. Partitioning keeps everything on one machine. If your bottleneck is **write** volume, sharding is the only tool on this list that addresses it. That is also why it's last: it's the most expensive answer to the rarest problem.
Read/write split in code
The first real lever after vertical scaling is read replicas. The application has to route reads to a replica and writes (plus any read that must be perfectly fresh) to the primary. Keep this in one place, a data-access layer, never scattered across handlers.
db/router.ts
typescript
import { Pool } from'pg';
const primary = newPool({ host: 'primary.db.internal' });
const replicas = [
newPool({ host: 'replica-a.db.internal' }),
newPool({ host: 'replica-b.db.internal' }),
];
let rr = 0;
functionpickReplica(): Pool {
rr = (rr + 1) % replicas.length; // round-robinreturn replicas[rr];
}
// Writes, and anything that must read its own write, go to the primary.exportfunctionwrite(sql: string, params: unknown[]) {
return primary.query(sql, params);
}
// Reads tolerate slight staleness, so spread them across replicas.exportfunctionread(sql: string, params: unknown[]) {
returnpickReplica().query(sql, params);
}
// Escape hatch: force a read onto the primary when freshness is mandatory.exportfunctionreadFresh(sql: string, params: unknown[]) {
return primary.query(sql, params);
}
Before you add a third replica, add connection pooling. Postgres forks a process per connection, so a few hundred direct connections from your app fleet will exhaust its memory long before CPU is the limit. A pooler like PgBouncer sits in front and multiplexes thousands of client connections onto a small pool of real backend ones.
pgbouncer.ini
bash
[databases]
app = host=primary.db.internal port=5432 dbname=app
[pgbouncer]
listen_port = 6432# transaction mode returns the backend connection after each tx, # the highest multiplexing, at the cost of session-level features.
pool_mode = transaction
max_client_conn = 5000# clients PgBouncer will accept
default_pool_size = 25# real Postgres connections per database
Partitioning one big table
Before reaching for sharding, see whether partitioning solves your problem. If one table is huge but the database machine is fine, declarative partitioning splits the table into child tables by a key (commonly a date range), so queries scan only the relevant partition and old data drops with a single DETACH.
partition.sql
sql
-- One logical table, physically split by month.
CREATE TABLE events (
id bigserial,
user_id bigint NOT NULL,
kind text NOT NULL,
ts timestamptz NOT NULL
) PARTITION BY RANGE (ts);
CREATE TABLE events_2026_05 PARTITION OF events
FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');
CREATE TABLE events_2026_06 PARTITION OF events
FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');
-- A range query touches only the partition(s) it needs (partition pruning).
EXPLAIN SELECT count(*) FROM events
WHERE ts >= '2026-06-01' AND ts < '2026-06-08';
Partitioning is cheap because it's transparent, the application still sees one events table. Sharding is partitioning's expensive cousin: the same idea, but the pieces live on different machines the app must route between explicitly.
Choosing a shard key, and living with it
Sharding starts with one decision that is almost impossible to undo: the shard key. It determines which node owns each row. A good key spreads load evenly and keeps related rows together; a bad one creates hot shards and forces every query to fan out across all of them.
shard/route.ts
typescript
import { createHash } from'crypto';
const SHARD_COUNT = 8;
const shards = Array.from({ length: SHARD_COUNT }, (_, i) => ({
id: i,
host: `shard-${i}.db.internal`,
}));
// Hash the shard key so rows spread evenly, independent of key ordering.exportfunctionshardFor(shardKey: string) {
const digest = createHash('md5').update(shardKey).digest();
const bucket = digest.readUInt32BE(0) % SHARD_COUNT;
return shards[bucket];
}
// A query scoped to one user hits exactly one shard, fast.exportfunctionordersForUser(userId: string) {
const shard = shardFor(userId);
returnquery(shard.host, 'SELECT * FROM orders WHERE user_id = $1', [userId]);
}
Cross-shard joins and resharding will hurt
Pick `user_id` and a single user's data sits on one shard, great. But 'top-selling products this week' now needs every shard to compute a partial answer that you merge in the app: a **scatter-gather** query, as slow as your slowest shard. Cross-shard **transactions** lose ACID guarantees (no single node can commit atomically). And `hash % SHARD_COUNT` means adding a shard remaps almost every key, **resharding** is a migration of nearly your whole dataset under live traffic. Use [consistent hashing](/blog/scalability-principles) or pre-split into more logical shards than machines to soften this. There is no painless reshard.
Replication lag and read-your-writes
Replicas are not instant copies. The primary commits, then streams the change; for a window of milliseconds (or seconds under load) the replica is stale. This is replication lag, and it produces a bug that confuses every team the first time: a user updates their profile, the write lands on the primary, the next page read hits a lagging replica, and their change has vanished.
Read-your-writes consistency
The fix has three common shapes. (1) **Read from the primary** for a short window after a user's write, simple, costs the primary some reads. (2) **Sticky reads**: pin that user to a replica known to be caught up. (3) Track the write's log position (LSN) and only read from a replica that has replayed past it. Whichever you pick, decide it **per query**: most reads happily tolerate staleness, only a few (the user's own just-submitted data) demand freshness. Treating every read as needing freshness throws away the entire benefit of replicas.
CAP and PACELC in practice
Once data lives on multiple nodes, the network can split them. CAP says when a partition happens you choose between consistency (reject the request rather than serve stale data) and availability (serve, possibly stale). You don't get both during a partition, that's not a vendor failing, it's physics.
PACELC: if there's a Partition, trade Availability vs Consistency; Else (normal operation) trade Latency vs Consistency.
PACELC is the more useful framing because partitions are rare but the else clause is always true: every replicated system trades latency for consistency on every request. Synchronous replication (wait for replicas to confirm before commit) gives strong consistency and higher write latency; asynchronous replication gives low latency and a lag window. There is no setting that gives you both, choose per workload, and read SQL vs NoSQL for stores that pick the other corner on purpose.
The scaling ladder, climb in order
Resist the urge to jump to sharding because it sounds impressive. Each rung buys headroom for far less complexity than the next. Climb only as far as your traffic forces you.
Cache the hot reads first. A cache in front of the database deflects the majority of repeated reads and is the cheapest order-of-magnitude you'll ever get. Most 'database scaling' problems are really cache-miss problems.
Scale vertically. A bigger box (more RAM so the working set stays in memory, faster disks) is boring, immediate, and requires zero code change. Buy the ceiling before you engineer around it.
Add read replicas with a read/write split and connection pooling. This handles the common read-heavy case and gives you failover for free.
Partition large tables so queries scan less and old data ages out cheaply, still one machine, still transparent.
Shard only when writes genuinely exceed one primary. It's the most powerful and the most painful rung, and you should arrive here reluctantly, not eagerly.
Also ask whether the data even belongs in your primary relational store. Full-text search belongs in a search engine, ephemeral counters in a cache, and high-write append-only event streams may fit a purpose-built store far better than a sharded SQL cluster. Moving a workload off the database is often cheaper than scaling the database to hold it.
Common mistakes that cost weekends
Sharding before caching or indexing. The hardest tool for the problem an index would have fixed. Always check query performance first.
Choosing a shard key by gut. A monotonic key (like an auto-increment id or created_at) sends all new writes to the newest shard, a permanent hot spot. Hash a high-cardinality key your queries actually filter on.
Assuming replicas are fresh. Routing a read-your-writes flow to a lagging replica makes users' own changes disappear. Decide freshness per query.
Treating cross-shard transactions like local ones. They aren't atomic. Either keep a transaction's rows on one shard via the key, or design for eventual consistency with idempotent retries.
No plan for resharding. Hardcoding % N guarantees a full-dataset migration the day you add capacity. Pre-split into many logical shards or use consistent hashing from day one.
Forgetting connection limits. Adding app servers without a pooler exhausts the primary's connections and takes everything down, independent of how much CPU is left.
Takeaways
The whole article in nine lines
The database is the hard tier because it holds state, two copies means a consistency problem to manage.
Climb the ladder in order: cache, vertical, replicas, partition, shard. Stop when you have enough.
Replicas scale reads, not writes; only sharding scales writes, and it's the most expensive rung.
Always run a read/write split through a pooler (PgBouncer); direct connections exhaust the primary.
Partitioning splits a table on one machine (cheap, transparent); sharding splits across machines (routing, pain).
The shard key is nearly irreversible: hash a high-cardinality key your queries filter on; avoid monotonic keys.
Cross-shard joins are scatter-gather (slow); cross-shard transactions lose atomicity; resharding is a full migration.
Replication lag breaks read-your-writes, decide freshness per query, not globally.
CAP/PACELC is physics: during a partition pick C or A; otherwise you always trade latency for consistency.
Where to go next
Database scaling sits inside the broader discipline of building systems that grow. Pair this with the foundations and the adjacent stores you'll reach for as you climb the ladder.
SQL vs NoSQL, when a different store picks the trade-off you actually need.
Scalability principles, consistent hashing, statelessness, and the laws behind every rung here.
Want to go deeper?
This article covers concepts taught hands-on in the Cloud Engineer and DevOps career paths, with real terminal labs, production scenarios, and structured lessons.