Virtual Memory

Understand how operating systems give every process its own private address space, enabling isolation, safety, and efficient use of physical RAM.

Intermediate | Foundations | Chapter: Foundations | 10 min read

The core illusion

Imagine a hotel where every guest is told: "You have the entire building to yourself." Each guest moves in, arranges furniture, and lives as if they own the place. Behind the scenes, the hotel manager routes each guest to a specific set of rooms, making sure nobody walks into someone else's space. The guests never need to coordinate with each other.

Virtual memory works exactly like this. Every process running on your computer believes it has exclusive access to a vast, contiguous block of memory addresses, starting at 0x0000000000000000. In reality, the operating system and CPU hardware silently map each process's virtual addresses to different locations in physical RAM, maintaining the illusion perfectly.

This design gives you three things: isolation (process A cannot read process B's memory), overcommitment (the total virtual memory in use can exceed physical RAM), and simplicity (every program can be compiled and linked against the same address layout without needing to know what else is running).

Pages: memory divided into fixed chunks

Rather than tracking memory byte-by-byte (which would require enormous bookkeeping), the OS divides both virtual and physical memory into fixed-size chunks called pages. On most modern systems a page is 4 KB (4,096 bytes).

Why fixed sizes? Because uniform chunk sizes make allocation and deallocation trivial: any free physical page can satisfy any virtual page request, eliminating external fragmentation (the problem where free memory exists but in fragments too small to use). The CPU's memory management hardware is also designed around this fixed page size, making translation fast.
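To make the page arithmetic concrete, here is a minimal Python sketch. It assumes the 4 KB page size mentioned above as a default, and `pages_needed` and `wasted_bytes` are hypothetical helper names chosen for illustration, not OS APIs. The unused tail of the last page is usually called internal fragmentation, the price paid for eliminating external fragmentation.

```python
import mmap

PAGE_SIZE = 4096  # assumed 4 KB page, as in the text; real systems may differ


def pages_needed(num_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Return how many whole pages are required to hold num_bytes."""
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    # Round up: a partially used page still occupies a full page.
    return (num_bytes + page_size - 1) // page_size


def wasted_bytes(num_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Return the unused bytes left over in the last page."""
    return pages_needed(num_bytes, page_size) * page_size - num_bytes


if __name__ == "__main__":
    print(pages_needed(10_000))  # 3 pages for a 10,000-byte request
    print(wasted_bytes(10_000))  # 2288 bytes unused in the third page
    # Page size reported by the running system, which may not be 4096.
    print(mmap.PAGESIZE)
```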

Virtual-to-physical address translation

Each process has a page table: a data structure maintained by the OS that maps virtual page numbers to physical page frame numbers. When your code accesses a memory address, the CPU's Memory Management Unit (MMU) automatically:

1. Splits the virtual address into a page number and a byte offset within the page.
2. Looks up the page number in the page table.
3. Combines the resulting physical frame number with the original byte offset.
4. Forwards the resulting physical address to RAM.

Your code never sees any of this. It just uses the virtual address, and the hardware handles the rest in nanoseconds.
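The translation steps above can be sketched in a few lines of Python. This is a simplified model, not how the hardware is built: it assumes 4 KB pages and a flat, single-level page table stored in a dictionary, whereas real CPUs typically use multi-level tables. The `page_table` contents and the `translate` function are hypothetical and exist only for illustration.

```python
PAGE_SIZE = 4096                          # assumed 4 KB pages
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12 offset bits for 4 KB pages

# Hypothetical page table: virtual page number -> physical frame number.
page_table = {0x0: 0x2A, 0x1: 0x07, 0x5: 0x13}


def translate(vaddr: int, table: dict) -> int:
    """Translate a virtual address to a physical address."""
    vpn = vaddr >> OFFSET_BITS         # step 1: page number ...
    offset = vaddr & (PAGE_SIZE - 1)   # ... and byte offset within the page
    if vpn not in table:               # step 2: look up the page number
        raise LookupError(f"page fault: virtual page {vpn:#x} is not mapped")
    frame = table[vpn]
    return (frame << OFFSET_BITS) | offset  # step 3: frame number + offset


if __name__ == "__main__":
    print(hex(translate(0x1234, page_table)))  # 0x7234
    print(hex(translate(0x5FFF, page_table)))  # 0x13fff
```

Note that the offset passes through unchanged. Only the page number is replaced by a frame number.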

The TLB: a hardware shortcut

Walking the page table on every memory access would be painfully slow. Modern CPUs include a Translation Lookaside Buffer (TLB): a small, extremely fast hardware cache that stores the most recently used virtual-to-physical address mappings.

When the CPU needs to translate an address:

It checks the TLB first (typically 1-4 clock cycles).
On a TLB hit: the physical address is available immediately, no page table walk needed.
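On a TLB miss: the MMU walks the page table (many cycles) and loads the result into the TLB, evicting an existing entry if the TLB is full. The replacement policy varies by CPU: least-recently-used, an approximation of it, and random replacement are all used in practice.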

The TLB typically holds 64-1024 entries. Because real programs tend to access the same memory regions repeatedly (good locality of reference), TLB hit rates above 99% are common. On a context switch the old process's mappings are no longer valid, so the TLB has traditionally been flushed, which is part of why context switches have a measurable cost. Many modern CPUs reduce this cost by tagging TLB entries with an address-space identifier (ASID on ARM, PCID on x86-64), so entries from different processes can coexist.
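The effect of locality on hit rate can be explored with a toy simulation. The sketch below models a TLB as a small cache with strict least-recently-used replacement, which is a simplifying assumption: real hardware policies differ between CPUs. The `TLB` class, `run` helper, and page table contents are all hypothetical.

```python
from collections import OrderedDict


class TLB:
    """Toy TLB: a fixed-capacity cache of vpn -> frame with LRU eviction."""

    def __init__(self, capacity: int = 64):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn: int, page_table: dict) -> int:
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)     # mark as most recently used
            return self.entries[vpn]
        self.misses += 1
        frame = page_table[vpn]               # stands in for a page table walk
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[vpn] = frame
        return frame

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def run(tlb: TLB, accesses, page_table: dict) -> float:
    """Feed a sequence of virtual page numbers through the TLB."""
    for vpn in accesses:
        tlb.lookup(vpn, page_table)
    return tlb.hit_rate()


if __name__ == "__main__":
    table = {vpn: vpn + 100 for vpn in range(64)}
    # Working set of 8 pages fits in a 16-entry TLB: only the first pass misses.
    print(run(TLB(16), list(range(8)) * 1000, table))
    # Cycling through 32 pages with 16 entries: every access misses under LRU.
    print(run(TLB(16), list(range(32)) * 1000, table))
```

The second pattern is a worst case for LRU: each page is evicted just before it is needed again, so a working set only slightly larger than the cache can collapse the hit rate.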

Page faults

When a process accesses a virtual address whose page table entry is marked not present in RAM, the MMU triggers a page fault: a CPU exception that hands control to the OS. Page faults come in two flavors:

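Minor page fault: The fault can be resolved without disk I/O. Either the page is already in memory but not yet mapped into this process's page table (for example, a shared library page that another process has already loaded), or the kernel can supply a frame immediately: a zero-filled page for freshly allocated heap memory that hasn't been touched yet, or a private copy after a write to a copy-on-write page following fork(). Resolution takes on the order of microseconds.

Major page fault: The page is not in RAM at all. The OS must read it from disk, either from swap space or from the file backing the mapping, which can take milliseconds. This is the expensive case.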
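The distinction between the two kinds of fault can be captured in a toy model. The sketch below is not how a kernel is structured. It only tracks which pages are mapped and which are resident in RAM, and classifies each access accordingly. `ToyProcess` and its attributes are hypothetical names.

```python
class ToyProcess:
    """Toy model that classifies page accesses as hit, minor, or major."""

    def __init__(self, in_ram):
        self.mapped = set()        # pages with a valid page table entry
        self.in_ram = set(in_ram)  # pages resident in RAM but maybe unmapped
        self.minor_faults = 0
        self.major_faults = 0

    def access(self, vpn: int) -> str:
        if vpn in self.mapped:
            return "hit"           # translation succeeds, no fault
        if vpn in self.in_ram:
            # Page is already in memory: only the page table needs updating.
            self.minor_faults += 1
            kind = "minor"
        else:
            # Page must be brought in from disk first (simulated here).
            self.major_faults += 1
            self.in_ram.add(vpn)
            kind = "major"
        self.mapped.add(vpn)
        return kind


if __name__ == "__main__":
    proc = ToyProcess(in_ram={1, 2})
    print(proc.access(1))  # minor: resident but not yet mapped
    print(proc.access(1))  # hit: now mapped
    print(proc.access(9))  # major: not resident, needs simulated disk I/O
    print(proc.access(9))  # hit
```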

Warning

A process that frequently triggers major page faults is said to be thrashing. On a machine with insufficient RAM, the system can spend more time swapping pages in and out than actually executing code, grinding to a halt.

Swap space

Swap space (a swap partition or swap file on Linux, the page file on Windows, swap files on macOS) is a region of disk the OS uses as an overflow area for RAM. When physical memory is full and a new page is needed, the OS picks a rarely used page, writes it to swap, marks its physical frame as free, and reuses that frame.

Swap makes it possible to run more processes than would otherwise fit in RAM. The cost is stark: RAM access takes ~100 nanoseconds, while disk access takes ~100 microseconds (SSD) to ~10 milliseconds (HDD), which is three to five orders of magnitude slower. When a process's working set exceeds available RAM and it begins hitting swap regularly, the slowdown is immediately noticeable.
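A quick calculation shows how little swapping it takes to hurt. The sketch below uses the latency figures quoted above and a simplified average-cost model in which a fraction `fault_rate` of accesses must go to the backing store. Both the model and the 1-in-1,000 rate are illustrative assumptions, not measurements.

```python
RAM_NS = 100         # ~100 ns per RAM access (figure from the text)
SSD_NS = 100_000     # ~100 microseconds per SSD access
HDD_NS = 10_000_000  # ~10 milliseconds per HDD access


def effective_access_ns(fault_rate: float, backing_ns: float,
                        ram_ns: float = RAM_NS) -> float:
    """Average access time when fault_rate of accesses hit the backing store."""
    return (1 - fault_rate) * ram_ns + fault_rate * backing_ns


if __name__ == "__main__":
    print(SSD_NS // RAM_NS)  # SSD is 1,000 times slower than RAM
    print(HDD_NS // RAM_NS)  # HDD is 100,000 times slower than RAM
    # Assume 1 access in 1,000 goes to swap.
    print(round(effective_access_ns(0.001, SSD_NS), 1))  # about 199.9 ns
    print(round(effective_access_ns(0.001, HDD_NS), 1))  # about 10099.9 ns
```

Under this model, sending even one access in a thousand to SSD-backed swap roughly doubles the average access time. With an HDD the average becomes about 100 times slower.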

Memory isolation and crash safety

Because each process has its own page table, the OS can enforce a hard rule: unless memory is deliberately shared (shared libraries, shared-memory segments, copy-on-write pages), no virtual address in Process A maps to a physical frame belonging to Process B. Process A simply has no address it could use to reach Process B's memory, so it cannot corrupt that memory through normal code execution.

When a process crashes or runs amok and corrupts its own memory, the OS simply tears down that process and its page table and reclaims its frames. Every other process continues unaffected, still mapped to its own physical frames. This is why, in browsers that run each tab in its own process, a crashed tab does not take down the whole browser, and why a crashing microservice does not corrupt its neighbors.
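Crash isolation is easy to observe from user space. The sketch below starts a child Python process that deliberately reads from address 0, which is normally unmapped, and checks that the parent carries on afterwards. The helper name `run_crashing_child` is hypothetical, and the exact exit status is platform dependent (on Linux the child is typically killed by SIGSEGV).

```python
import subprocess
import sys

# Code run in the child: dereference address 0 through ctypes.
CRASH_SNIPPET = "import ctypes; ctypes.string_at(0)"


def run_crashing_child() -> int:
    """Run a child process that performs an invalid memory access."""
    child = subprocess.run(
        [sys.executable, "-c", CRASH_SNIPPET],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # Non-zero status: on POSIX a negative value means killed by a signal.
    return child.returncode


if __name__ == "__main__":
    status = run_crashing_child()
    print("child exit status:", status)
    print("parent is still running with its memory intact")
```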

Practical implications for engineers

mmap() can be fast for large files. The traditional way to read a file is the read() syscall, which copies data from the kernel's page cache into a buffer in your process, so the data ends up in memory twice: once in the page cache and once in your buffer. mmap() instead maps the file's page-cache pages directly into your process's virtual address space. When you access a byte in that region, you read from the page cache with no extra copy, at the cost of a page fault the first time each page is touched. This is why some databases and key-value stores (LMDB, for example) rely on mmap().
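The two access styles can be compared side by side using Python's standard `mmap` module. This is a minimal sketch that checks both approaches return the same data. It does not measure performance, and the function names are hypothetical.

```python
import mmap
import os
import tempfile


def read_byte_via_read(path: str, index: int) -> int:
    """Seek and read(): the kernel copies the byte into a new bytes object."""
    with open(path, "rb") as f:
        f.seek(index)
        return f.read(1)[0]


def read_byte_via_mmap(path: str, index: int) -> int:
    """Map the file and index into it like an in-memory byte array."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes it read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[index]


if __name__ == "__main__":
    # Create a small sample file so the example is self-contained.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 16)
        sample_path = tmp.name
    try:
        print(read_byte_via_read(sample_path, 300))  # 44, since 300 % 256 == 44
        print(read_byte_via_mmap(sample_path, 300))  # same byte, via the mapping
    finally:
        os.remove(sample_path)
```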

Large heap allocations trigger page faults. When you call malloc() for a large block, the OS allocates virtual pages but does not immediately back them with physical frames. Each page only gets a physical frame on first access, triggering a minor page fault. This is called lazy allocation and it means allocating 10 GB of memory is fast; touching 10 GB of memory is slow (paid in page fault overhead).
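Lazy allocation can be observed by watching the process's minor fault counter before and after touching freshly mapped memory. The sketch below is Unix-only (it relies on the `resource` module) and uses an anonymous `mmap` region as a stand-in for a large `malloc()`. The exact counts vary by OS, kernel settings such as huge pages, and interpreter activity, so no specific numbers are claimed.

```python
import mmap
import resource


def minor_faults() -> int:
    """Cumulative minor page faults for this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def touch_pages(region) -> int:
    """Write one byte into every page of the region; return pages touched."""
    touched = 0
    for offset in range(0, len(region), mmap.PAGESIZE):
        region[offset] = 1
        touched += 1
    return touched


def measure(size: int = 64 * 1024 * 1024) -> dict:
    """Compare fault counts for allocating a region versus touching it."""
    start = minor_faults()
    region = mmap.mmap(-1, size)  # anonymous mapping, not backed by a file
    after_alloc = minor_faults()
    pages = touch_pages(region)
    after_touch = minor_faults()
    region.close()
    return {
        "pages_touched": pages,
        "faults_during_alloc": after_alloc - start,
        "faults_during_touch": after_touch - after_alloc,
    }


if __name__ == "__main__":
    print(measure())
```

On a typical Linux system the faults recorded while touching the region far outnumber those recorded during the allocation itself, which is the lazy allocation behavior described above.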

fork() is cheaper than you think. When a process calls fork(), the child gets an exact copy of the parent's virtual address space. But the OS does not copy physical frames: it uses copy-on-write (CoW). Both parent and child point to the same physical pages, marked read-only. Only when either process writes to a page does the OS copy that specific page. This is why fork() is widely used in web servers (pre-fork worker models, for example) despite logically cloning the entire address space.
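The observable effect of copy-on-write is that a write in the child never shows up in the parent. The Unix-only sketch below demonstrates that using `os.fork()` and a pipe. It shows the resulting isolation, not the lazy copying itself, which happens inside the kernel and is invisible to the program. `fork_and_modify` is a hypothetical helper name.

```python
import os


def fork_and_modify() -> tuple:
    """Fork, modify data in the child, return (parent_value, child_value)."""
    data = [0] * 1000
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this write triggers a private copy of the affected page(s).
        try:
            os.close(read_end)
            data[0] = 42
            os.write(write_end, str(data[0]).encode())
            os.close(write_end)
        finally:
            os._exit(0)  # leave immediately without running parent cleanup
    # Parent: read what the child saw, then check our own copy.
    os.close(write_end)
    child_value = int(os.read(read_end, 16).decode())
    os.close(read_end)
    os.waitpid(pid, 0)
    return data[0], child_value


if __name__ == "__main__":
    parent_value, child_value = fork_and_modify()
    print("parent sees:", parent_value)  # 0: the child's write is not visible
    print("child saw:", child_value)     # 42
```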

```bash
# See a process's virtual memory map (Linux)
cat /proc/$$/maps

# Check memory and swap usage
free -h

# Show cumulative page fault counts (minflt = minor, majflt = major)
ps -o pid,comm,minflt,majflt -p $$

# Watch swap-in/swap-out activity (si/so columns): 5 samples, 1 second apart
vmstat 1 5
```
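The output of /proc/<pid>/maps is easier to reason about once parsed. The sketch below splits one line into its address range, permissions, and backing path, following the field layout documented in proc(5). The sample line is illustrative only, and `parse_maps_line` is a hypothetical helper, not a standard API.

```python
def parse_maps_line(line: str) -> dict:
    """Parse one line of /proc/<pid>/maps into its main fields."""
    # Fields: address range, permissions, offset, device, inode, optional path.
    fields = line.split(maxsplit=5)
    start_text, end_text = fields[0].split("-")
    start, end = int(start_text, 16), int(end_text, 16)
    return {
        "start": start,
        "end": end,
        "size": end - start,
        "perms": fields[1],
        # Anonymous mappings have no path field.
        "path": fields[5].strip() if len(fields) > 5 else "",
    }


if __name__ == "__main__":
    # Illustrative line in the documented format, not output from a real run.
    sample = "00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/dbus-daemon"
    region = parse_maps_line(sample)
    print(region["perms"], region["path"])
    print(region["size"] // 4096)  # 82 pages, assuming 4 KB pages
```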

Prerequisites

Processes & Threads
