mmap & Page Cache
Learn how the operating system caches physical disk blocks in RAM and uses memory mapping to optimize file I/O operations.
The Operating System Page Cache
To reconcile the performance gap between fast physical RAM and slow storage devices, the operating system kernel implements a caching layer called the Page Cache.
When an application requests a file write or read operation:
- The kernel avoids a round trip to physical storage on every request.
- Instead, it organizes system memory (RAM) into fixed-size segments called pages (usually 4 KB).
- The kernel copies requested data sectors from disk blocks into these memory pages, fulfilling future queries from memory.
By serving I/O requests from cache pages, the system avoids hardware latency. If the requested data block is found in RAM, it results in a cache hit. If the data must be loaded from storage, it is a cache miss, requiring the calling thread to wait while the sector is transferred.
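The page size mentioned above can be queried at runtime instead of being assumed. The following is a minimal sketch for a POSIX system; the helper names system_page_size and pages_spanned are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Returns the page size the kernel uses on this machine, in bytes. */
size_t system_page_size(void)
{
    long size = sysconf(_SC_PAGESIZE);
    assert(size > 0);               /* a supported name should never fail */
    return (size_t)size;
}

/* Hypothetical helper: number of pages needed to hold `bytes` bytes. */
size_t pages_spanned(size_t bytes)
{
    size_t page = system_page_size();
    return (bytes + page - 1) / page;   /* round up to whole pages */
}
```

Because caching works in whole pages, even a one-byte file occupies one full page of cache once it has been read.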
Direct Memory Mapping via mmap
The traditional way to interact with files is through system calls like read() and write(). These functions require the application to manage data buffers in user space, which can introduce performance overhead.
The mmap() system call provides an alternative. Instead of copying data, it projects a file descriptor's blocks directly into the process's virtual address space.
┌─────────────────────────────────────┐
│        Virtual Address Space        │
│  0x7fff12000 -> [Mapped File Page]  │
└──────────────────┬──────────────────┘
                   │
     (Direct Address Translation)
                   ▼
┌─────────────────────────────────────┐
│       Physical RAM Page Cache       │
│      [ Data Buffer in Memory ]      │
└──────────────────┬──────────────────┘
                   │
          (Flushed by Kernel)
                   ▼
┌─────────────────────────────────────┐
│        Physical Disk Blocks         │
│      [ File Saved on SSD/HDD ]      │
└─────────────────────────────────────┘
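When you call mmap(), the kernel records the mapping in the process's virtual address space and returns a pointer to the mapping's starting address. The page table entries that point at the file's cached pages are typically filled in later, when the process first touches each page.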
The application can then read and write file data using standard memory pointer offsets, such as:
char data = mapped_pointer[offset];
The kernel manages the address translations behind the scenes, ensuring the physical pages are loaded and synchronized.
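To make the pointer access above concrete, here is a minimal sketch of the full sequence around it: open the file, find its size, map it, read one byte, and unmap. The function name byte_at_offset is hypothetical, and the choice of a private read-only mapping is an assumption made for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns the byte at `offset` in the file at `path`,
 * read through a read-only mapping, or -1 on any error. */
int byte_at_offset(const char *path, size_t offset)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0 || offset >= (size_t)st.st_size) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;

    /* Passing NULL lets the kernel choose the address of the mapping. */
    unsigned char *mapped = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (mapped == MAP_FAILED)
        return -1;

    int value = mapped[offset];     /* ordinary memory access, no read() call */

    munmap(mapped, size);
    return value;
}
```

Note that mmap() reports failure with MAP_FAILED rather than NULL, so the return value should be compared against that constant.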
Synchronization Logic: Flusher Threads and msync
When an application writes to a memory-mapped file, it modifies the data page in RAM. The kernel flags this modified page as a dirty page.
These modified blocks are not immediately written to disk. The operating system uses an asynchronous flush mechanism:
- Kernel Flusher Threads: Background kernel threads (pdflush in older Linux kernels, later replaced by per-device flush threads) write dirty pages back to storage. kswapd is a separate daemon that reclaims memory under pressure rather than a dedicated flusher.
- Dirty Page Limits: When dirty pages exceed a configurable percentage of system memory, or when they remain modified longer than a timeout limit (often 30 seconds), these threads write the changes back to storage.
To enforce synchronization and ensure updates are safely committed to disk, applications can call:
int msync(void *addr, size_t length, int flags);
The flags parameter controls the synchronization behavior:
- MS_ASYNC: Schedules the writeback but returns immediately.
- MS_SYNC: Blocks the calling thread until the dirty pages are written to disk.
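As an illustration of the write path, the sketch below modifies one byte through a shared mapping and then calls msync() with MS_SYNC before unmapping. The function name overwrite_byte_durably is hypothetical; the sketch assumes the file already exists and is at least offset + 1 bytes long.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: overwrites one byte of an existing file through a
 * shared mapping and waits for the writeback. Returns 0 on success, -1 on error. */
int overwrite_byte_durably(const char *path, size_t offset, char value)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0 || offset >= (size_t)st.st_size) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;

    /* MAP_SHARED is what makes stores reach the file; MAP_PRIVATE would not. */
    char *mapped = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (mapped == MAP_FAILED)
        return -1;

    mapped[offset] = value;                 /* the page is now dirty */

    int rc = msync(mapped, size, MS_SYNC);  /* block until it is written back */

    munmap(mapped, size);
    return rc;
}
```

The address passed to msync() must be page-aligned, which holds here because it is the start of the mapping returned by mmap().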
The Double-Buffering Overhead
Using standard file I/O operations like read() can introduce memory overhead due to a phenomenon called double-buffering.
In the standard model:
- The kernel reads the file blocks from disk into the Page Cache, typically using a direct memory access (DMA) transfer.
- The read() call copies that data from the Page Cache into the application's user space buffer.
This copies the same data twice and duplicates it in memory.
By contrast, memory mapping uses a zero-copy model. Because the process's page tables point directly to the Page Cache pages, accessing the pointer reads the cache memory directly. This avoids the second copy step, reducing CPU and memory overhead.
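The difference between the two models can be seen side by side in the sketch below, which sums the bytes of a file once with read() and once through a mapping. Both function names are hypothetical, and the 4096-byte buffer size is an arbitrary choice for the example. The sketch illustrates where the extra copy happens; it does not measure it.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sums all bytes using read(): each chunk is copied from the Page Cache
 * into `buf` before the program can look at it. Returns -1 on error. */
long sum_with_read(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char buf[4096];        /* user space buffer: the second copy */
    long sum = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++)
            sum += buf[i];
    }
    close(fd);
    return n < 0 ? -1 : sum;
}

/* Sums all bytes through a mapping: the loop reads the cached pages
 * directly, with no intermediate buffer. Returns -1 on error. */
long sum_with_mmap(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;

    unsigned char *mapped = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (mapped == MAP_FAILED)
        return -1;

    long sum = 0;
    for (size_t i = 0; i < size; i++)
        sum += mapped[i];

    munmap(mapped, size);
    return sum;
}
```

Both functions return the same value for the same file; only the path the data takes to reach the loop differs.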
Page Faults and Demand Paging
When an application calls mmap(), the operating system does not immediately load the file contents into RAM. Instead, it uses demand paging.
The OS configures the virtual memory mapping but marks the pages as not present in the hardware page tables. The file data is loaded only when the process attempts to read or write to that specific memory address range:
- The Trap: The CPU's Memory Management Unit (MMU) intercepts the access attempt and generates a page fault exception, halting the thread.
- Page Check: The OS kernel handles the page fault and determines which file block maps to that virtual address.
- Hard Page Fault: If the requested page is not in RAM, the kernel performs disk I/O to load the file block into a physical memory page.
- Soft Page Fault: If the page is already in RAM (for example, cached by an earlier access or by another process), the kernel skips the disk read and updates the page table to point to it.
- Resume: The kernel updates the page table flags, and the CPU resumes executing the process instruction.
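The fault sequence above can be observed from user space. The sketch below maps a file, touches one byte per page, and uses getrusage() to report how many soft (minor) and hard (major) faults the process took while doing so. The function name count_faults_on_touch is hypothetical. The counts depend on what is already cached and on kernel behavior such as mapping several neighboring pages per fault, so no particular numbers should be expected.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: touches every page of a mapped file and stores the
 * number of minor and major faults taken meanwhile. Returns 0 or -1. */
int count_faults_on_touch(const char *path, long *minor, long *major)
{
    assert(path != NULL && minor != NULL && major != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;

    /* No file data is loaded by this call; pages arrive on first access. */
    unsigned char *mapped = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (mapped == MAP_FAILED)
        return -1;

    struct rusage before, after;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    volatile unsigned char sink = 0;    /* keeps the reads from being optimized away */

    getrusage(RUSAGE_SELF, &before);
    for (size_t off = 0; off < size; off += page)
        sink ^= mapped[off];            /* first touch of a page may fault */
    getrusage(RUSAGE_SELF, &after);
    (void)sink;

    munmap(mapped, size);
    *minor = after.ru_minflt - before.ru_minflt;
    *major = after.ru_majflt - before.ru_majflt;
    return 0;
}
```

A file that was written or read recently tends to show only minor faults, because its pages are still in the Page Cache.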
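To prevent memory exhaustion, the kernel runs page eviction algorithms (typically variants of Least Recently Used) to reclaim inactive pages and free up RAM. Clean file-backed pages can simply be dropped because they can be reloaded from the file; dirty pages are written back to the file first.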
Advisory Declarations via madvise
Applications can optimize memory-mapped I/O performance by declaring their expected access patterns using the madvise() system call:
int madvise(void *addr, size_t length, int advice);
This call guides the kernel's prefetching and caching decisions:
- MADV_SEQUENTIAL: Tells the kernel to prefetch upcoming pages, optimizing sequential reads.
- MADV_RANDOM: Disables read-ahead optimizations, preventing the kernel from wasting disk I/O on pages that might not be accessed.
- MADV_WILLNEED: Instructs the kernel to begin loading the mapped page range into memory immediately, reducing page fault latency during runtime.
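A minimal sketch of how such a hint might be applied before a front-to-back scan is shown below. The function name sum_with_sequential_hint is hypothetical. Because the advice is only a hint, the sketch deliberately ignores a failing madvise() call and carries on with default read-ahead.

```c
#define _DEFAULT_SOURCE     /* exposes madvise() in <sys/mman.h> on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: sums all bytes of a file through a mapping after
 * announcing a sequential access pattern. Returns -1 on error. */
long sum_with_sequential_hint(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;

    unsigned char *mapped = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (mapped == MAP_FAILED)
        return -1;

    /* Advisory only: the result stays correct even if the kernel ignores it. */
    (void)madvise(mapped, size, MADV_SEQUENTIAL);

    long sum = 0;
    for (size_t i = 0; i < size; i++)   /* strictly front-to-back access */
        sum += mapped[i];

    munmap(mapped, size);
    return sum;
}
```

Swapping in MADV_RANDOM or MADV_WILLNEED changes only the advice argument; whether any of them helps is workload-dependent and worth measuring.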
Bypassing the Cache with O_DIRECT
While the Page Cache improves performance for most workloads, it can introduce overhead for storage engines like databases that manage their own application-level caching.
To bypass the Page Cache, applications can open files using the O_DIRECT flag:
int fd = open("database.db", O_RDWR | O_DIRECT);
When executing writes or reads on a file opened with O_DIRECT, data is transferred directly between physical disk blocks and the application's user space buffer, bypassing the kernel's Page Cache.
This gives database engines precise control over memory layouts and write timing, though it generally requires user buffers, file offsets, and transfer sizes to be aligned to the block size of the storage device (commonly 512 bytes or 4 KB).
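The alignment requirement is the part that most often trips up first attempts, so the sketch below shows one way to satisfy it with posix_memalign(). The function name read_first_block_direct and the ASSUMED_BLOCK_SIZE constant of 4096 bytes are assumptions for illustration; a real program would query the device's actual block size. Some filesystems reject O_DIRECT entirely, in which case open() fails and the function returns -1.

```c
#define _GNU_SOURCE         /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumption for this sketch: 4096 satisfies the device's alignment rules. */
#define ASSUMED_BLOCK_SIZE 4096

/* Hypothetical helper: reads the first block of a file with O_DIRECT.
 * Stores its first byte in *first_byte and returns the number of bytes
 * read, or -1 on error (including filesystems without O_DIRECT support). */
ssize_t read_first_block_direct(const char *path, unsigned char *first_byte)
{
    assert(path != NULL && first_byte != NULL);

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;

    /* The buffer address itself must be aligned, so malloc() is not enough. */
    void *buf = NULL;
    if (posix_memalign(&buf, ASSUMED_BLOCK_SIZE, ASSUMED_BLOCK_SIZE) != 0) {
        close(fd);
        return -1;
    }

    /* Offset 0 and a block-sized length keep the transfer aligned too. */
    ssize_t n = read(fd, buf, ASSUMED_BLOCK_SIZE);
    if (n > 0)
        *first_byte = ((unsigned char *)buf)[0];

    free(buf);
    close(fd);
    return n;
}
```

Memory obtained from posix_memalign() is released with the ordinary free().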
Further Reading
- The Linux Programming Interface — Chapter 13 covers I/O buffering; Chapter 49 details memory mapping flags and architecture.
- Linux System Programming — Chapter 4 provides practical guidelines on memory-mapped file search engines.
- What Every Programmer Should Know About Memory — Ulrich Drepper's guide to hardware layouts and caching.
- Linux Kernel Page Cache Documentation — Explains virtual memory management parameters like dirty page ratios.
Core Literature References
The Linux Programming Interface
by Michael Kerrisk — Chapter 13: File I/O Buffering, Chapter 49: Memory Mappings, pp. 243-264, 1017-1056