Database Indexing & Query Performance

On this page

The query that worked yesterday
What an index actually is
How the database decides: planner vs scan
When an index helps, and when it hurts
Reading an EXPLAIN plan
The N+1 query problem
Common mistakes that cost hours
Takeaways
Where to go next

The query that worked yesterday

You wrote it months ago and it was perfect: SELECT * FROM orders WHERE customer_id = 42. On your laptop, with 1,000 rows of seed data, it returned in under a millisecond. You shipped it. Everyone was happy.

Then the app grew. The orders table crossed 10 million rows. That same query now takes 8 seconds, the request times out, and your on-call phone is buzzing. Nothing in the code changed. So what happened?

What happened is that the database was always doing the slow thing, it just didn't matter when the table was tiny. Without an index, the database reads every single row to find the ones matching customer_id = 42. At 1,000 rows that is invisible. At 10 million rows it is a disaster. This article is about seeing that work, measuring it, and fixing it.

Who this is for

Backend developers who can write SQL and ship features, but have never opened an EXPLAIN plan or deliberately added an index. If you have ever wondered why a query got slow as your data grew, this is for you. We use Postgres-flavored SQL, but the mental model applies to MySQL, SQLite, and most relational databases.

What an index actually is

An index is a separate, sorted data structure that lets the database jump straight to the rows you want, instead of scanning the whole table to find them.

The key word is sorted. A table on disk is roughly a pile of rows in insertion order. To find customer_id = 42 in an unsorted pile, you have no choice but to look at every row. An index keeps a copy of one (or more) columns kept in sorted order, with a pointer back to the full row. Because it is sorted, the database can binary-search it.

Finding a topic by flipping through every page of a 900-page bookA full table scan: read all 10M rows to find the matching ones

The alphabetical index at the back of the bookA database index: sorted keys with pointers to the data

Looking up 'indexing', seeing 'p. 412', flipping straight thereIndex lookup: binary-search the key, follow the pointer to the row

The publisher had to print the index and reprint it when content changesIndexes cost disk space and slow down writes (every INSERT updates them)

An index trades a little extra storage and write cost for enormous read speed, exactly like a book index.

The B-tree mental model

Almost every index you create is a B-tree (balanced tree). Picture a shallow, bushy tree: the top node splits the key range into a few branches, each branch splits again, and the bottom level holds the sorted keys with pointers to rows. To find a value, the database starts at the top and walks down, and because the tree is balanced and wide, it only takes a handful of hops even for billions of rows.

That is the whole magic: a full scan is O(n), work grows linearly with row count, while a B-tree lookup is roughly O(log n). Going from 1M to 10M rows barely adds a hop. That is why the indexed query stays fast as the table grows, and the unindexed one falls off a cliff.

How the database decides: planner vs scan

You never tell the database *how* to run a query, you describe *what* you want, and a component called the query planner decides the strategy. Its main fork in the road: do a sequential scan (read the whole table) or an index lookup (use a B-tree to jump to the rows). It picks based on statistics about your data.

The same query takes two very different paths depending on whether a useful index exists.

The planner is not dumb, but it can only choose the index path if a useful index exists. Your job is to give it the right index, and then confirm it actually used it.

1
Find the slow query
Use your database's slow-query log, your APM (Datadog, New Relic), or just the request that times out. You cannot fix what you have not measured, start from a real, specific query.
2
Run EXPLAIN ANALYZE
Prefix the query with EXPLAIN ANALYZE to see the plan the database chose and how long each step actually took. Look for the words 'Seq Scan' on a large table, that is your smoking gun.
3
Add the right index
Create an index on the column(s) in your WHERE / JOIN / ORDER BY. Match the column order to how you filter. Add it on the column doing the filtering, not whatever is convenient.
4
Verify it worked
Re-run EXPLAIN ANALYZE. Confirm the plan now says 'Index Scan' instead of 'Seq Scan' and the execution time dropped. If the planner still ignores your index, your index is wrong for the query.

When an index helps, and when it hurts

Indexes are not free and they are not always a win. They speed up reads but slow down writes, and they only help when the query can actually use them. Here is the honest trade-off table.

Situation	Index helps	Why
WHERE on a high-selectivity column (e.g. email, user_id)	Yes	Few rows match, so jumping to them beats scanning
JOIN on a foreign key	Yes	The join probes the other table once per row, an index makes each probe fast
ORDER BY / range queries (created_at > ...)	Yes	A B-tree is already sorted, so the database skips the sort step
Column with only 2-3 distinct values (e.g. is_active)	Rarely	Low selectivity, half the table matches, so a scan is often cheaper
A write-heavy table with few reads	Often hurts	Every INSERT/UPDATE must also update the index, slowing writes
Wrapping the column in a function (LOWER(email))	No	The plain index does not match, the planner falls back to a scan

Indexes are a tool, not a default. Add them where reads are selective and frequent.

Reading an EXPLAIN plan

Let's make it concrete. Here is the slow query, the plan the database gives us, the index that fixes it, and the proof.

slow-query.sql

sql

-- The query: find one customer's orders in a 10M-row table
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 42;

Read the output bottom-up, inside-out, the most indented node runs first. The line that matters here is the top one:

plan-before.txt

sql

Seq Scan on orders  (cost=0.00..195000.00 rows=98 width=64)
                    (actual time=0.21..8124.6 rows=97 loops=1)
  Filter: (customer_id = 42)
  Rows Removed by Filter: 9999903
Planning Time: 0.14 ms
Execution Time: 8126.9 ms

Decode it: Seq Scan means it read the whole table. Rows Removed by Filter: 9999903 means it looked at ~10M rows and threw away all but 97, pure wasted work. Execution Time: 8126 ms is the 8 seconds your users feel. The fix is to give the planner a sorted path to customer_id:

add-index.sql

sql

-- CONCURRENTLY avoids locking the table on a live system
CREATE INDEX CONCURRENTLY idx_orders_customer_id
  ON orders (customer_id);

Now re-run the same EXPLAIN ANALYZE. The plan changes completely:

plan-after.txt

sql

Index Scan using idx_orders_customer_id on orders
  (cost=0.43..8.51 rows=98 width=64)
  (actual time=0.03..0.09 rows=97 loops=1)
  Index Cond: (customer_id = 42)
Planning Time: 0.16 ms
Execution Time: 0.12 ms

Index Scan instead of Seq Scan, no "Rows Removed by Filter", and 0.12 ms instead of 8126 ms, a ~67,000x speedup from one line of SQL. That is the entire loop: measure, index, verify.

Composite indexes: column order matters

When you filter on more than one column, you can build a composite index covering several columns at once. But the order of columns is not arbitrary, it follows the left-prefix rule: an index on (customer_id, status) can serve queries filtering on customer_id alone, or on customer_id AND status, but not on status alone.

composite.sql

sql

-- Good for: WHERE customer_id = ?  AND status = ?
-- Good for: WHERE customer_id = ?
-- NOT used for: WHERE status = ?  (status is not the leftmost column)
CREATE INDEX idx_orders_customer_status
  ON orders (customer_id, status);

Rule of thumb: put the column you filter on with equality (=) first, and the column you filter with a range (>, <, BETWEEN) or sort by last. Think of it like a phone book sorted by last name, then first name, great for "all the Smiths," useless for "everyone named John."

The N+1 query problem

Not every slow page is an indexing problem. The most common performance bug in app code is N+1 queries, and no index will save you from it. It hides inside ORMs (ActiveRecord, Hibernate, Prisma, Django) because the database calls are invisible in your code.

Here is the trap. You fetch a list of authors, then loop over them to show each author's book count:

n-plus-one.sql

sql

-- 1 query to get the authors
SELECT * FROM authors LIMIT 100;

-- then your code loops and fires ONE query PER author...
SELECT count(*) FROM books WHERE author_id = 1;
SELECT count(*) FROM books WHERE author_id = 2;
SELECT count(*) FROM books WHERE author_id = 3;
-- ...97 more times.  That is 1 + 100 = 101 round trips.

One query becomes 101. Each round trip has network latency, so even fast individual queries add up to a sluggish page. The name says it: 1 query to get the list, plus N queries (one per row). The fix is to ask the database for everything in one query, known as eager loading, using a JOIN or an IN clause:

eager-load.sql

sql

-- One query instead of 101
SELECT a.id, a.name, count(b.id) AS book_count
FROM authors a
LEFT JOIN books b ON b.author_id = a.id
GROUP BY a.id, a.name
LIMIT 100;

In ORMs the fix is usually one keyword: includes(:books) in Rails, JOIN FETCH in JPA/Hibernate, select_related / prefetch_related in Django, include in Prisma. Learn how your ORM eager-loads, it is the single highest-leverage performance skill in app code. (And yes, make sure books.author_id is indexed so the JOIN is fast.)

Common mistakes that cost hours

Indexing everything. An index on every column slows every write and bloats storage, often for columns you never filter on. Index for your actual query patterns, not "just in case."
Wrong column order in a composite index. (status, customer_id) will not help a query that filters on customer_id alone. Put the equality-filtered, high-selectivity column first (the left-prefix rule).
Functions on indexed columns. WHERE LOWER(email) = '...' cannot use a plain index on email. Either store the normalized value, or create a matching expression index on LOWER(email).
Ignoring the N+1 problem. A page that fires hundreds of tiny queries feels slow no matter how well each one is indexed. Watch your ORM's query log and eager-load relations.
Trusting EXPLAIN without ANALYZE. Plain EXPLAIN shows the planner's *estimate*; EXPLAIN ANALYZE actually runs the query and shows real timings and row counts. Estimates lie when statistics are stale.
Adding indexes that lock production. A plain CREATE INDEX can lock the table on a big live system. Use CREATE INDEX CONCURRENTLY (Postgres) to build it without blocking writes.

Takeaways

The whole article in seven lines

Without an index, the database scans every row, invisible at 1k rows, fatal at 10M.
An index is a sorted B-tree of one or more columns: O(log n) lookups instead of O(n) scans.
The planner picks Seq Scan vs Index Scan, but it can only choose an index if a useful one exists.
Always measure with `EXPLAIN ANALYZE`; the enemy is a 'Seq Scan' with millions of 'Rows Removed by Filter'.
Composite indexes follow the left-prefix rule: equality columns first, range/sort columns last.
Indexes speed reads but cost writes and storage, index for real query patterns, not everything.
N+1 queries are an app-code bug, not an index bug: fix them by eager-loading with a JOIN.

Where to go next

Indexing is the first lever you reach for when a query is slow, but it sits inside a bigger picture of how data is modeled and how often you have to hit the database at all. These two pair naturally with this article:

Databases 101: Relational Modeling, how tables, keys, and relationships are designed in the first place. Good indexes start with good schema.
Caching Strategies, the other way to make reads fast: don't query the database at all. Use it once indexing is no longer enough.
Walk the full Backend Engineer path to see where databases fit alongside APIs, queues, and reliability.

Pro tip

Next time a query feels slow, resist the urge to guess. Open `EXPLAIN ANALYZE` first, the plan tells you exactly what the database is doing, and the fix is usually one index or one JOIN away.

Want to go deeper?

This article covers concepts taught hands-on in the Cloud Engineer and DevOps career paths, with real terminal labs, production scenarios, and structured lessons.

Explore Career Paths Try the Labs

Keep reading

Cloud

Managed Databases: Relational vs NoSQL in the Cloud

Read

Backend

What Is a Backend Engineer?

Read

Backend

How the Web Works: HTTP Requests

Read