mmap & Page Cache

Learn how the operating system caches physical disk blocks in RAM and uses memory mapping to optimize file I/O operations.


The Operating System Page Cache

To bridge the performance gap between fast RAM and much slower storage devices, the operating system kernel keeps a cache of file data in memory called the Page Cache.

When an application reads or writes a file:

  • The kernel does not send every request straight to the storage device.
  • Instead, it manages file data in fixed-size units of memory called pages (commonly 4 KiB; some architectures use larger pages).
  • On a read, the kernel copies the requested disk blocks into these pages and serves later requests for the same data from memory. On a write, it updates the cached page and writes it back to storage later.

Serving I/O from cached pages lets the system avoid device latency on repeated access. If the requested data is already in RAM, the access is a cache hit. If it must first be loaded from storage, it is a cache miss, and the calling thread waits while the data is transferred.
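To make the hit/miss bookkeeping concrete, here is a toy model in C, not kernel code. It assumes a single hypothetical file, 4 KiB pages, and a fixed limit of 64 pages; the names toy_cache and toy_read are illustrative. A byte offset falls in page offset / 4096, and only the first access to each page counts as a miss.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: a bitmap records which 4 KiB page indexes of one
 * hypothetical file are already "cached" in RAM. */
#define PAGE_SIZE_BYTES 4096u
#define MAX_PAGES 64u

struct toy_cache {
    bool cached[MAX_PAGES]; /* true if that page index is in "RAM" */
    unsigned hits;
    unsigned misses;
};

/* Simulate reading one byte at a file offset; false if outside the file. */
static bool toy_read(struct toy_cache *c, size_t offset)
{
    size_t page = offset / PAGE_SIZE_BYTES; /* which page holds the byte */
    if (page >= MAX_PAGES)
        return false;

    if (c->cached[page]) {
        c->hits++;              /* served from memory */
    } else {
        c->misses++;            /* a real read would wait for the device here */
        c->cached[page] = true; /* the page now lives in the cache */
    }
    return true;
}
```

For example, reading offsets 0 and 100 touches page 0 twice (one miss, then one hit), while offsets 4096 and 8191 both fall in page 1. The real Linux Page Cache indexes pages per file and is bounded by available memory rather than by a fixed page count.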


Direct Memory Mapping via mmap

The traditional way to interact with files is through system calls like read() and write(). These calls copy data between the kernel and a buffer that the application allocates and manages in user space, which costs CPU time and memory.

The mmap() system call provides an alternative. Instead of copying data into a separate buffer, it maps the contents of an open file (identified by its file descriptor) directly into the process's virtual address space.

```text
                  ┌─────────────────────────────────────┐
                  │      Virtual Address Space          │
                  │  0x7fff12000 -> [Mapped File Page]  │
                  └──────────────────┬──────────────────┘
                                     │
                                     │  (Direct Address Translation)
                                     ▼
                  ┌─────────────────────────────────────┐
                  │      Physical RAM Page Cache        │
                  │   [ Data Buffer in Memory ]         │
                  └──────────────────┬──────────────────┘
                                     │
                                     │  (Flushed by Kernel)
                                     ▼
                  ┌─────────────────────────────────────┐
                  │         Physical Disk Blocks        │
                  │   [ File Saved on SSD/HDD ]         │
                  └─────────────────────────────────────┘
```

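When you call mmap(), the kernel records a new mapping for the requested range of the file and returns a pointer to the mapping's starting address. It does not fill in the page table entries up front; they are populated on first access and point at the Page Cache pages that hold the file's data, as described under Page Faults and Demand Paging.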

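The application can then read and write file data with ordinary pointer indexing, for example:

```c
char data = mapped_pointer[offset];   /* read the byte `offset` bytes into the mapping */
mapped_pointer[offset] = 'x';         /* write it (the mapping needs PROT_WRITE) */
```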

The kernel handles the address translation behind the scenes, loading pages from the file on first access and, for shared mappings, writing modified pages back to it.
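A minimal, self-contained sketch of the read path, assuming a POSIX system: count_byte_mmap (a hypothetical helper name) maps a whole file read-only and scans it with plain pointer indexing. The file descriptor can be closed right after mmap(), because the mapping keeps its own reference to the file.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count occurrences of byte `target` in a file by mapping it read-only.
 * Returns the count, or -1 on error. */
long count_byte_mmap(const char *path, unsigned char target)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;

    long count = 0;
    for (size_t i = 0; i < len; i++)
        if (p[i] == target)     /* each page is faulted in on first touch */
            count++;

    munmap((void *)p, len);
    return count;
}
```

Calling count_byte_mmap(path, '\n') counts newlines without a single read() call.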


Synchronization Logic: Flusher Threads and msync

When an application writes to a memory-mapped file, it modifies the data page in RAM. The kernel flags this modified page as a dirty page.

These modified blocks are not immediately written to disk. The operating system uses an asynchronous flush mechanism:

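  • Kernel Flusher Threads: Background kernel threads write dirty pages back to storage. On Linux these were the pdflush threads in older kernels, then per-device flush threads, and are now kworker threads running the writeback workqueue. (kswapd is the page-reclaim daemon; it frees memory under pressure rather than driving routine writeback.)
  • Dirty Page Limits: Writeback starts when dirty pages exceed a configurable share of available memory (on Linux, vm.dirty_background_ratio; above the higher vm.dirty_ratio, processes that keep writing are throttled) or when pages have stayed dirty longer than an expiry time (vm.dirty_expire_centisecs, 30 seconds by default).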
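On Linux, these thresholds are exposed as sysctls under /proc/sys/vm. The sketch below assumes a Linux system and uses hypothetical helper names (read_long_file, print_writeback_tunables). Typical kernel defaults are 10 and 20 percent for the two ratios and 3000 and 500 centiseconds for the expiry and wake-up intervals, but distributions may change them.

```c
#include <stdio.h>

/* Read a single integer from a file such as /proc/sys/vm/dirty_ratio.
 * Returns -1 if the file is missing or does not start with a number. */
long read_long_file(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long value = -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Print the Linux writeback knobs (Linux-specific paths). */
void print_writeback_tunables(void)
{
    static const char *const names[] = {
        "dirty_background_ratio",    /* % of available memory: background writeback starts */
        "dirty_ratio",               /* % of available memory: writers are throttled */
        "dirty_expire_centisecs",    /* dirty data older than this is written out */
        "dirty_writeback_centisecs", /* interval between flusher wake-ups */
    };
    char path[128];
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        snprintf(path, sizeof path, "/proc/sys/vm/%s", names[i]);
        printf("%-26s %ld\n", names[i], read_long_file(path));
    }
}
```

A value of 3000 for dirty_expire_centisecs corresponds to the 30-second expiry mentioned above; -1 means the file could not be read (for example, inside some restricted containers).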

To force the modified pages of a mapping to be written back at a known point, applications can call:

```c
#include <sys/mman.h>

int msync(void *addr, size_t length, int flags);
```

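The flags parameter controls the synchronization behavior (exactly one of MS_ASYNC or MS_SYNC must be given), and addr must be a multiple of the page size:

  • MS_ASYNC: Schedules the writeback but returns immediately. On Linux since 2.6.19 this is effectively a no-op, because the kernel already tracks dirty pages and flushes them on its own.
  • MS_SYNC: Blocks the calling thread until the dirty pages in the range have been written to the storage device.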
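The following sketch, assuming a POSIX system and a file that is already at least len bytes long, writes through a MAP_SHARED mapping and then calls msync(MS_SYNC) so the change has been written back before the function returns. write_via_mapping is a hypothetical helper name. Because mmap() returns a page-aligned address, passing that address to msync() satisfies the alignment rule.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Overwrite the first `len` bytes of an existing file through a shared
 * mapping, then force them out with msync(MS_SYNC).
 * Precondition: the file is at least `len` (> 0) bytes long.
 * Returns 0 on success, -1 on error. */
int write_via_mapping(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    memcpy(p, data, len);            /* dirties the Page Cache pages */

    /* p is page-aligned, so it is a valid msync() start address. */
    int rc = msync(p, len, MS_SYNC); /* blocks until writeback completes */
    munmap(p, len);
    return rc;
}
```

A mapping cannot grow the file: stores past end-of-file are either lost or raise SIGBUS, which is why the size precondition matters.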

The Double-Buffering Overhead

Using standard file I/O operations like read() can introduce memory overhead due to a phenomenon called double-buffering.

[Figure: Memory Access Models, Standard I/O vs. Memory-Mapped I/O. Standard I/O (read/write): Physical Storage (SSD/Disk) → 1. Disk DMA copy → Kernel Page Cache (RAM) → 2. CPU memory copy → User Space Buffer (Process Heap). Memory-mapped I/O (mmap): Physical Storage (SSD/Disk) → 1. Disk DMA copy → Kernel Page Cache (Shared RAM) → direct memory pointer (zero-copy) → Virtual Address Space (Page Table).]

In the standard model:

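  1. The storage device transfers the file blocks into the Page Cache using DMA (direct memory access), without the CPU copying the bytes.
  2. The read() call copies that data from the Page Cache into the application's user space buffer.

The same data is therefore copied twice, and while the application's buffer is in use it occupies memory twice: once in the Page Cache and once in the user buffer.

Memory mapping avoids the second copy. Because the process's page tables point directly at Page Cache pages, dereferencing the mapped pointer reads the cached data in place, which is why this model is often described as zero-copy. The trade-off is that page faults and page-table maintenance have their own cost, so mmap() is not automatically faster than read() for every workload.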
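To contrast the two models, the sketch below hashes the same file twice, assuming a POSIX system. hash_with_read pulls the data through a 4 KiB user-space buffer, so every byte is first copied out of the Page Cache; hash_with_mmap hashes the cached pages in place. FNV-1a serves only as a simple checksum, and both function names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define FNV_OFFSET 14695981039346656037ULL

/* 64-bit FNV-1a, continued from hash state `h`. */
static uint64_t fnv1a(const unsigned char *p, size_t n, uint64_t h)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Model 1: read() copies each chunk from the Page Cache into `buf`. */
int hash_with_read(const char *path, uint64_t *out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    unsigned char buf[4096];            /* second copy of the data lives here */
    uint64_t h = FNV_OFFSET;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        h = fnv1a(buf, (size_t)n, h);
    close(fd);
    if (n < 0)
        return -1;
    *out = h;
    return 0;
}

/* Model 2: the mapping exposes the Page Cache pages themselves. */
int hash_with_mmap(const char *path, uint64_t *out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    uint64_t h = FNV_OFFSET;
    if (st.st_size > 0) {               /* zero-length mappings are invalid */
        size_t len = (size_t)st.st_size;
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        h = fnv1a(p, len, h);           /* no intermediate buffer */
        munmap(p, len);
    }
    close(fd);
    *out = h;
    return 0;
}
```

Both functions return the same hash for the same file. Zero-copy here only means that the read() copy disappears; the CPU still reads every byte while hashing.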


Page Faults and Demand Paging

When an application calls mmap(), the operating system does not immediately load the file contents into RAM. Instead, it uses demand paging.

The OS configures the virtual memory mapping but marks the pages as not present in the hardware page tables. The file data is loaded only when the process attempts to read or write to that specific memory address range:

  1. The Trap: The CPU's Memory Management Unit (MMU) finds no valid translation for the address and raises a page fault exception, suspending the faulting thread.
  2. Page Check: The OS kernel handles the page fault and determines which file block maps to that virtual address.
  3. Hard Page Fault: If the requested page is not in RAM, the kernel performs disk I/O to load the file block into a physical memory page (Linux counts this as a major fault).
  4. Soft Page Fault: If the page is already in RAM (for example, because another process or an earlier read() brought it into the Page Cache), the kernel skips the disk read and simply points the page table entry at it (a minor fault).
  5. Resume: The kernel updates the page table entry, and the CPU retries the instruction that faulted.
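Fault counts are visible from user space through getrusage(), which reports minor (ru_minflt) and major (ru_majflt) faults for the process. This sketch, assuming a POSIX system and using the hypothetical helper name faults_while_touching, maps a file and touches one byte per page so that each first touch can trigger a fault.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Map `len` bytes of `fd` and read one byte per page. Returns the number
 * of page faults (minor + major) recorded during the loop, or -1. */
long faults_while_touching(int fd, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    struct rusage before, after;
    getrusage(RUSAGE_SELF, &before);

    volatile unsigned char sink = 0; /* keeps the reads from being optimized away */
    for (size_t off = 0; off < len; off += (size_t)page)
        sink ^= p[off];              /* first touch of a page may fault */

    getrusage(RUSAGE_SELF, &after);
    munmap(p, len);
    (void)sink;

    return (after.ru_minflt - before.ru_minflt)   /* resolved without disk I/O */
         + (after.ru_majflt - before.ru_majflt);  /* required disk I/O */
}
```

If the file's pages are already cached, nearly all of the faults are minor; after the cache has been dropped, the same loop would report major faults instead. Linux may also map neighbouring pages on a single fault (fault-around), so the count can be smaller than the number of pages touched.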

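To prevent memory exhaustion, the kernel reclaims Page Cache pages that have not been used recently, using an approximation of Least Recently Used (Linux tracks pages on active and inactive lists). Clean file-backed pages can simply be dropped and re-read from the file later, dirty ones are written back first, and only anonymous memory (including private copy-on-write pages) has to go to swap.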
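As a simplified illustration of the eviction idea, here is an exact LRU over three slots. It is a hypothetical model (lru_cache, lru_init, lru_access, and SLOTS are illustrative names); Linux uses cheaper approximations such as active and inactive lists rather than per-access timestamps.

```c
#include <stddef.h>

/* Toy LRU page cache with a fixed number of slots (hypothetical model). */
#define SLOTS 3

struct lru_cache {
    long page[SLOTS];               /* page index held in each slot, -1 = empty */
    unsigned long last_use[SLOTS];  /* logical time of the slot's last access */
    unsigned long clock;            /* bumped on every access */
    long evicted;                   /* page dropped by the last miss, -1 if none */
};

void lru_init(struct lru_cache *c)
{
    for (size_t i = 0; i < SLOTS; i++) {
        c->page[i] = -1;
        c->last_use[i] = 0;
    }
    c->clock = 0;
    c->evicted = -1;
}

/* Access a page: returns 1 on a hit, 0 on a miss (which may evict a page). */
int lru_access(struct lru_cache *c, long page)
{
    c->clock++;
    size_t victim = 0;
    for (size_t i = 0; i < SLOTS; i++) {
        if (c->page[i] == page) {   /* hit: refresh recency */
            c->last_use[i] = c->clock;
            return 1;
        }
        /* keep an empty victim; otherwise prefer empty or older slots */
        if (c->page[victim] != -1 &&
            (c->page[i] == -1 || c->last_use[i] < c->last_use[victim]))
            victim = i;
    }
    c->evicted = c->page[victim];   /* -1 if the slot was empty */
    c->page[victim] = page;
    c->last_use[victim] = c->clock;
    return 0;
}
```

With the access sequence 1, 2, 3, 1, 4, page 2 is evicted when page 4 arrives, because the second access to page 1 made it more recent than page 2.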


Shared vs. Private Mappings

When calling mmap(), you must specify a mapping type flag that controls how edits are shared:

  • MAP_SHARED: Edits made to the mapped memory pointer are shared with other processes mapping the same file. The updates are directly written to the kernel's Page Cache, making them visible to other processes and eventually flushed to disk.
  • MAP_PRIVATE: Writes to this region use a copy-on-write mechanism. When the process modifies a page, the kernel makes a private copy of it in RAM. The edits are visible only to the writing process and are never written back to the underlying file.
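The difference between the two flags is easy to observe. This sketch, assuming Linux (where read() and mmap() share one Page Cache), stores a byte through a one-byte mapping and then reads the file back with pread(); write_then_reread is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first byte of a file with `map_flag` (MAP_SHARED or MAP_PRIVATE),
 * store `value` through the mapping, unmap, and return the byte that
 * pread() then sees in the file (or -1 on error). */
int write_then_reread(const char *path, int map_flag, char value)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, map_flag, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    p[0] = value;   /* MAP_PRIVATE: copy-on-write gives us a private page */
    munmap(p, 1);   /* a private copy is discarded; a shared edit stays in the Page Cache */

    char seen;
    int rc = (pread(fd, &seen, 1, 0) == 1) ? (unsigned char)seen : -1;
    close(fd);
    return rc;
}
```

With a file containing "A", a MAP_PRIVATE store of 'P' leaves the file reading 'A', while a MAP_SHARED store of 'S' is immediately visible to pread() as 'S', even before any writeback to disk.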

Advisory Declarations via madvise

Applications can optimize memory-mapped I/O performance by declaring their expected access patterns using the madvise() system call:

```c
#include <sys/mman.h>

int madvise(void *addr, size_t length, int advice);
```

This call guides the kernel's read-ahead and caching decisions. The advice is only a hint, and addr must be page-aligned:

  • MADV_SEQUENTIAL: Expects pages to be accessed in order, so the kernel can read ahead aggressively and free pages soon after they have been used.
  • MADV_RANDOM: Expects random access, so the kernel disables read-ahead and avoids wasting disk I/O on pages that may never be touched.
  • MADV_WILLNEED: Signals that the range will be needed soon, so the kernel can start reading it into memory ahead of time, reducing page-fault latency later.
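Here is a minimal sketch of applying advice to a mapping, assuming a system that provides madvise() in <sys/mman.h> (Linux and the BSDs do; POSIX also defines posix_madvise() with POSIX_MADV_* constants). sum_sequential is a hypothetical helper, and the return value of madvise() is ignored on purpose, since the advice never changes what the program reads.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sum all bytes of a file through a mapping, after declaring a sequential
 * access pattern. Returns 0 on success, -1 on error. */
int sum_sequential(const char *path, uint64_t *sum)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* nothing to map */
        close(fd);
        *sum = 0;
        return 0;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    /* Hint only: p is page-aligned as madvise() requires. */
    (void)madvise(p, len, MADV_SEQUENTIAL);

    uint64_t s = 0;
    for (size_t i = 0; i < len; i++)
        s += p[i];                  /* front-to-back scan matches the advice */

    munmap(p, len);
    *sum = s;
    return 0;
}
```

Swapping MADV_SEQUENTIAL for MADV_RANDOM or MADV_WILLNEED changes only the kernel's read-ahead behavior, not the result.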

Bypassing the Cache with O_DIRECT

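While the Page Cache improves performance for most workloads, it can get in the way of storage engines such as databases that manage their own application-level cache: the same data ends up cached twice (once in the engine's buffer pool and once in the Page Cache), and the kernel's writeback timing competes with the engine's own.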

To bypass the Page Cache, applications can open files using the O_DIRECT flag:

```c
#define _GNU_SOURCE   /* needed for O_DIRECT with glibc */
#include <fcntl.h>

int fd = open("database.db", O_RDWR | O_DIRECT);
```

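When a file is opened with O_DIRECT, reads and writes transfer data directly between the storage device and the application's user space buffer, bypassing the kernel's Page Cache. O_DIRECT does not by itself guarantee durability; applications still use fsync() or O_DSYNC for that.

This gives database engines precise control over memory layout and write timing, but it imposes alignment rules: the buffer address, the transfer length, and the file offset must generally be multiples of the device's logical block size (commonly 512 bytes or 4 KiB), and the exact requirement depends on the filesystem.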
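A sketch of an O_DIRECT round trip, assuming Linux with glibc and a 4 KiB alignment (DIRECT_ALIGN is an assumed constant; real code should query the device's logical block size). The helper name direct_roundtrip is hypothetical. Some filesystems (tmpfs has historically been one) reject O_DIRECT with EINVAL, so the function reports that case separately instead of treating it as a failure.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DIRECT_ALIGN 4096       /* assumed block size */

/* Write one aligned block with O_DIRECT and read it back.
 * Returns 1 if the round trip succeeded, 0 if the filesystem rejects
 * O_DIRECT or this alignment (EINVAL), -1 on any other error. */
int direct_roundtrip(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd < 0)
        return errno == EINVAL ? 0 : -1;

    void *wbuf = NULL, *rbuf = NULL;
    ssize_t n;
    int result = -1;

    /* The buffer address must be aligned, not just the length and offset;
     * plain malloc() does not guarantee 4 KiB alignment. */
    if (posix_memalign(&wbuf, DIRECT_ALIGN, DIRECT_ALIGN) != 0 ||
        posix_memalign(&rbuf, DIRECT_ALIGN, DIRECT_ALIGN) != 0)
        goto out;

    memset(wbuf, 'D', DIRECT_ALIGN);
    n = pwrite(fd, wbuf, DIRECT_ALIGN, 0);  /* aligned length and offset */
    if (n < 0 && errno == EINVAL) {         /* alignment not accepted */
        result = 0;
        goto out;
    }
    /* O_DIRECT skips the Page Cache but is not a durability guarantee. */
    if (n != DIRECT_ALIGN || fsync(fd) != 0)
        goto out;

    n = pread(fd, rbuf, DIRECT_ALIGN, 0);
    if (n == DIRECT_ALIGN && memcmp(wbuf, rbuf, DIRECT_ALIGN) == 0)
        result = 1;
out:
    free(wbuf);
    free(rbuf);
    close(fd);
    return result;
}
```

The goto-based cleanup keeps a single exit path, so both aligned buffers and the descriptor are released whichever step fails.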

