Database Indexes
How indexes speed up database queries and the storage and write-performance tradeoffs they introduce.
The Problem: Full Table Scans
Imagine a users table with 50 million rows. You run SELECT * FROM users WHERE email = 'alice@example.com'. Without an index, the database has no choice: it reads every single row, top to bottom, comparing the email column until it finds a match. This is called a full table scan.
On small tables this is fine. On anything production-scale, it is catastrophic. A query that should take 2ms ends up taking 8 seconds and locks up your entire API.
An index gives the database a shortcut: a pre-sorted lookup structure it can use to jump directly to the right rows, the same way you use the index at the back of a textbook.
How an Index Works
When you create an index on a column, the database builds a separate data structure that stores the indexed column values in sorted order, with each entry pointing back to the physical row location on disk.
Because the index is sorted, the database can use binary search, halving the search space on every step. On 50 million rows that's roughly 26 comparisons instead of 50 million.
Types of Indexes
Most databases support several index structures. Each solves a different problem.
B-Tree Index (the default)
The B-tree is the workhorse. PostgreSQL, MySQL, and SQLite all default to it. It stores values in a balanced tree so that lookups, range queries (BETWEEN, >, <), and sorted results all cost O(log n).
The easiest way to picture it: you are looking up the name "sarah" in a sorted list of 8 names. Instead of reading from the top, you go to the middle first, ask "is sarah before or after morgan?", then halve again. Three steps gets you there.
Hash Index
A hash index uses a hash function to map a key directly to a bucket, giving O(1) exact lookups. It is very fast for WHERE email = 'alice@example.com' but completely useless for range queries, since hashing destroys sort order. PostgreSQL supports hash indexes explicitly; in MySQL they are only available on Memory tables.
Composite Index
A composite (multi-column) index covers more than one column. The order matters: a composite index on (user_id, created_at) can answer queries filtering on user_id alone, or user_id AND created_at together, but it cannot efficiently serve queries filtering on created_at alone. This is the left-prefix rule.
Partial Index
A partial index covers only rows matching a WHERE clause. For example, CREATE INDEX ON orders (created_at) WHERE status = 'pending' builds a small index covering only pending orders. Useful when you almost always query a specific filtered subset of the table.
Full-Text Index
Designed for searching inside text fields, a full-text index builds an inverted index: a map from every word to the rows containing it. This is how Postgres tsvector, Elasticsearch, and Meilisearch work under the hood.
The Write Penalty
Here is the cost that surprises many engineers: every index you add makes writes slower.
When you INSERT a row, the database does not just append data to a table file. It must also insert a new entry into every index on that table, potentially rebalancing B-tree nodes in the process. The same applies to UPDATE and DELETE.
This is called write amplification: one logical write fans out into multiple physical writes. On write-heavy workloads (event pipelines, logging, high-frequency trading), too many indexes can make your database the bottleneck.
The rule: only create an index when you have a measured slow query that needs it. Don't index speculatively.
Cons at a Glance
- Slower writes: every
INSERT,UPDATE, andDELETEmust update all indexes on the table - Extra disk space: each index is its own data structure stored on disk; large tables with many indexes can double or triple storage requirements
- Index maintenance: over time indexes can become bloated from deleted rows leaving dead entries. PostgreSQL
VACUUM, for example, reclaims this space - Wrong index, no benefit: a composite index used in the wrong column order, or an index on a low-cardinality column (like a boolean
is_active), does nothing. Always useEXPLAIN ANALYZEto confirm your queries actually hit the index
Further Reading
- Use the index, Luke — the best free book on SQL indexes, written for developers not DBAs
- B-Trees and database indexes explained — PlanetScale's deep-dive on how B-trees back real indexes
- Composite indexes and the left-prefix rule — how multi-column index ordering works
- Partial indexes in PostgreSQL — when and why to index only a subset of rows
- How PostgreSQL uses indexes — a readable explanation of all index types in Postgres
Prerequisites
Code Examples
Continue learning
ACID & Isolation Levels
Deep dive into database transaction guarantees, isolation levels, concurrency anomalies like write skew, and control mechanisms such as MVCC, 2PL, and SSI.
API Gateways
Understand the API Gateway pattern as the central ingress point for microservices, handling routing, auth, rate limiting, and protocol translation.
API Security & OAuth 2.0
Understand API authentication and authorization mechanisms, JWT security, and the OAuth 2.0 framework including Authorization Code Flow with PKCE.