Caching Eviction Policies (LRU, LFU, FIFO)

Learn how caches manage finite memory by evicting items using FIFO, LRU, and LFU policies, and explore hybrid eviction strategies.

IntermediateCachingChapter: Caching Infrastructure12 min read

The Necessity of Eviction

A cache stores frequently accessed data in high-speed, expensive memory (like RAM) to bypass slow, downstream storage layers (like SSDs or remote databases). Because hardware budget constraints prevent us from storing the entire database in RAM, a cache operates with a finite memory buffer.

As application databases receive new records, the cache is filled to maximum capacity. Once full, any new write transaction requires the cache to make room by discarding (evicting) existing keys.

Choosing which keys to discard determines the cache hit ratio (the percentage of requests successfully served from cache memory), which directly impacts overall system latency and database load.

Real-World Analogy

Imagine a physical desk surface where you keep paper files. Your desk surface has limited space (the cache capacity).

A First-In, First-Out (FIFO) policy is like placing incoming folders in a neat stack, and when the desk is full, throwing away the oldest stack folder, even if you consult it every hour.
A Least Recently Used (LRU) policy is keeping the folders you currently touch close to you, while sliding the folder you haven't opened in weeks off the edge of the desk.
A Least Frequently Used (LFU) policy is keeping a tally mark on every folder. You evict folders with the lowest counts. However, folders from a project you completed last month might have hundreds of legacy tallies, preventing new files from finding space on your desk.

FIFO (First-In, First-Out) Eviction

The First-In, First-Out (FIFO) algorithm is the simplest eviction strategy. It operates on a queue structure, evicting items in the exact chronological order they were inserted.

text

       [Insert New Key] ──► [ Node E ] ──► [ Node D ] ──► [ Node C ] ──► [Evict oldest (C)]

Characteristics of FIFO

Low Metadata Overhead: FIFO only needs to track insertion timestamps or manage a basic linked list queue, requiring minimal memory allocation per cache entry.
Low Hit-Ratio Anomalies: FIFO does not account for how frequently or recently an item is accessed. A highly popular key requested thousands of times per second will still be evicted if it was inserted early.
Belady's Anomaly: For certain access patterns, increasing the cache's capacity under a FIFO policy can actually decrease the hit ratio, violating standard caching expectations.

LRU (Least Recently Used) Eviction

The Least Recently Used (LRU) algorithm evicts the key that has not been accessed for the longest duration. It assumes that if a key was accessed recently, it is highly likely to be accessed again in the near future (temporal locality).

Data Structure Mechanics

To achieve constant-time O(1) operations for both retrievals and updates, an LRU cache combines two data structures:

Hash Map: Maps keys to nodes in a doubly linked list, enabling O(1) point lookups.
Doubly Linked List: Maintains the recency order of the cache items. The head node represents the Most Recently Used (MRU) item, while the tail node represents the Least Recently Used (LRU) item.

When a key is read or updated via Get() or Put(), the cache looks up the node pointer using the hash map, detaches the node from its current linked list position, and moves it to the head of the list.

If a Put() call causes the list length to exceed the cache capacity, the node directly preceding the dummy tail (the LRU node) is unlinked from the list and its key is deleted from the hash map, completing eviction in O(1) time.

The Cache Pollution Issue

A primary limitation of LRU is its vulnerability to cache pollution under scanning queries (such as a database backup script reading every record sequentially).

A full table scan will load thousands of infrequently used records into the cache, moving them to the MRU head and evicting hot, frequently requested items. This causes the cache hit ratio to plummet.

LFU (Least Frequently Used) Eviction

The Least Frequently Used (LFU) algorithm evicts the key with the lowest access frequency. It assumes that items with high access counts are extremely likely to be needed in the future.

Mechanics and Data Structures

To implement LFU in O(1) time, systems use a double linked list of frequency buckets, where each bucket contains a doubly linked list of nodes with that exact access count.

When an item is accessed, its frequency counter increments, and its node is moved to the next higher frequency bucket.

The Frequency Starvation Issue

The primary downside of LFU is frequency starvation.

If a key accumulates a high frequency count during a brief, intense traffic spike (e.g. a viral marketing campaign), it will remain in the cache indefinitely even after the campaign ends and request volume drops to zero.

Because its historical count is high, it is protected from eviction, preventing newly active keys from entering the cache.

Hybrid Policies and Modern Optimizations

To resolve the limitations of plain LRU and LFU, modern caching engines employ hybrid algorithms:

LRU-2 / LRU-K: Tracks the time of the last K accesses (usually K=2). A key is not promoted to the high-priority cache partition on its first access, but only on its second access. This prevents scanning queries (single accesses) from polluting the cache.
2Q (Two Queue): Uses two distinct queues. Newly read items enter a FIFO queue (A1in). If they are evicted from A1in without being accessed again, their keys are stored in a ghost queue (A1out). If a key in A1out is accessed again, it is promoted to a main, long-term LRU queue (Am), successfully separating one-time reads from frequent reads.
W-TinyLFU: Used in modern libraries like Caffeine (Java) and Ristretto (Go). It maintains a tiny window cache (LRU) and a main cache. It uses a Bloom Filter-like structure (Count-Min Sketch) to estimate access frequency for incoming items, admitting them to the main cache only if their access frequency is higher than the eviction victim's frequency.

Time-To-Live (TTL) Eviction

Independent of access history, caches often enforce a Time-To-Live (TTL) value, which guarantees that data is evicted after a specific duration to prevent serving stale information. Caches clear expired TTL keys using two concurrent mechanisms:

Passive Expiration: When a client requests a key, the cache checks the key's expiration timestamp. If the current time is past the timestamp, the cache deletes the key and returns a cache miss.
Active Expiration: Because passive expiration can leave unused, expired keys occupying memory, a background thread runs periodic sweeps. This thread samples a random subset of keys with TTLs and evicts any that have expired, maintaining a balanced memory profile.

Workload Patterns and Hit Ratio Optimization

The effectiveness of an eviction policy depends heavily on the application's access workload:

Read-Heavy Zipfian Workloads: Most web applications follow a Zipfian distribution (the Pareto 80/20 rule, where a small subset of keys receive the vast majority of requests). LRU and LFU perform exceptionally well here, keeping the small set of hot keys in memory.
Write-Heavy / Streaming Workloads: When applications continuously write new data and read it only once, caching offers low utility. Here, strict TTLs or FIFO queues prevent the cache from wasting memory buffers.
Scanning Workloads: If the application frequently scans large ranges of key-value pairs, hybrid policies like 2Q or W-TinyLFU are critical to prevent total cache pollution.

Prerequisites

Caching Strategies

Code Examples

Core Literature References

The Art of Computer Programming, Volume 3: Sorting and Searching

by Donald E. Knuth — Chapter 6: Searching, pp. 392-421

View source

Continue learning

ACID & Isolation Levels

Deep dive into database transaction guarantees, isolation levels, concurrency anomalies like write skew, and control mechanisms such as MVCC, 2PL, and SSI.

API Gateways

Understand the API Gateway pattern as the central ingress point for microservices, handling routing, auth, rate limiting, and protocol translation.

API Security & OAuth 2.0

Understand API authentication and authorization mechanisms, JWT security, and the OAuth 2.0 framework including Authorization Code Flow with PKCE.