Database Indexes

How indexes speed up database queries and the storage and write-performance tradeoffs they introduce.

IntermediateDatabasesChapter: Database Systems12 min read

The Problem: Full Table Scans

Imagine a users table with 50 million rows. You run SELECT * FROM users WHERE email = 'alice@example.com'. Without an index, the database has no choice: it reads every single row, top to bottom, comparing the email column until it finds a match. This is called a full table scan.

On small tables this is fine. On anything production-scale, it is catastrophic. A query that should take 2ms ends up taking 8 seconds and locks up your entire API.

An index gives the database a shortcut: a pre-sorted lookup structure it can use to jump directly to the right rows, the same way you use the index at the back of a textbook.


How an Index Works

When you create an index on a column, the database builds a separate data structure that stores the indexed column values in sorted order, with each entry pointing back to the physical row location on disk.

users table (unsorted) id email row 1 charlie@… #1 2 alice@… #2 3 bob@… #3 4 diana@… #4 Without index: scan all 4 rows index on email → email index (sorted) email (sorted) → row alice@… #2 bob@… #3 charlie@… #1 diana@… #4 With index: binary search → 1 lookup

Because the index is sorted, the database can use binary search, halving the search space on every step. On 50 million rows that's roughly 26 comparisons instead of 50 million.


Types of Indexes

Most databases support several index structures. Each solves a different problem.

B-Tree Index (the default)

The B-tree is the workhorse. PostgreSQL, MySQL, and SQLite all default to it. It stores values in a balanced tree so that lookups, range queries (BETWEEN, >, <), and sorted results all cost O(log n).

The easiest way to picture it: you are looking up the name "sarah" in a sorted list of 8 names. Instead of reading from the top, you go to the middle first, ask "is sarah before or after morgan?", then halve again. Three steps gets you there.

Looking up "sarah" in a B-Tree Each node asks: go left (smaller) or right (larger)? Start: "morgan" sarah > morgan → go right alice … morgan (not searched) "quinn" sarah > quinn → go right quinn … ryan sarah → #row found in 3 steps

Hash Index

A hash index uses a hash function to map a key directly to a bucket, giving O(1) exact lookups. It is very fast for WHERE email = 'alice@example.com' but completely useless for range queries, since hashing destroys sort order. PostgreSQL supports hash indexes explicitly; in MySQL they are only available on Memory tables.

Composite Index

A composite (multi-column) index covers more than one column. The order matters: a composite index on (user_id, created_at) can answer queries filtering on user_id alone, or user_id AND created_at together, but it cannot efficiently serve queries filtering on created_at alone. This is the left-prefix rule.

Partial Index

A partial index covers only rows matching a WHERE clause. For example, CREATE INDEX ON orders (created_at) WHERE status = 'pending' builds a small index covering only pending orders. Useful when you almost always query a specific filtered subset of the table.

Full-Text Index

Designed for searching inside text fields, a full-text index builds an inverted index: a map from every word to the rows containing it. This is how Postgres tsvector, Elasticsearch, and Meilisearch work under the hood.


The Write Penalty

Here is the cost that surprises many engineers: every index you add makes writes slower.

When you INSERT a row, the database does not just append data to a table file. It must also insert a new entry into every index on that table, potentially rebalancing B-tree nodes in the process. The same applies to UPDATE and DELETE.

Cost of INSERT with indexes INSERT row write to table file Update index #1 (email) Update index #2 (name) Update index #3 (created_at) 3× write amplification more indexes = slower INSERTs

This is called write amplification: one logical write fans out into multiple physical writes. On write-heavy workloads (event pipelines, logging, high-frequency trading), too many indexes can make your database the bottleneck.

The rule: only create an index when you have a measured slow query that needs it. Don't index speculatively.


Cons at a Glance

  • Slower writes: every INSERT, UPDATE, and DELETE must update all indexes on the table
  • Extra disk space: each index is its own data structure stored on disk; large tables with many indexes can double or triple storage requirements
  • Index maintenance: over time indexes can become bloated from deleted rows leaving dead entries. PostgreSQL VACUUM, for example, reclaims this space
  • Wrong index, no benefit: a composite index used in the wrong column order, or an index on a low-cardinality column (like a boolean is_active), does nothing. Always use EXPLAIN ANALYZE to confirm your queries actually hit the index

Further Reading

Prerequisites

Code Examples