File Descriptors
Learn how Unix represents every I/O resource as a file descriptor, why servers run out of them, and how event loops watch thousands at once.
Everything in Unix is a file
One of Unix's most elegant design decisions is that almost every I/O resource looks the same from a program's perspective: regular files, network sockets, pipes, terminals, and devices all share the same read/write interface. Directories are opened through the same mechanism, although they are listed with dedicated calls rather than read().
This is the "everything is a file" philosophy. Resources are created in different ways (open() for a file, socket() and connect() for a network connection), but once one is open your program uses it the same way: it calls read() and write() with an integer handle. That integer handle is a file descriptor (fd).
What is a file descriptor?
A file descriptor is a small non-negative integer that is an index into a per-process table maintained by the kernel. When you open a file, connect a socket, or create a pipe, the kernel:
- Creates (or reuses) a kernel-level open file description that tracks the resource, its current offset, and its status flags.
- Stores a pointer to that object in a free slot of the process's file descriptor table.
- Returns the index of that slot to the process. That index is the file descriptor.
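By convention, fds 0, 1, and 2 are already open when a program starts. The kernel does not create them for each process; a process inherits them from its parent, usually the shell:
- 0 (stdin): the standard input stream (usually your keyboard or a pipe).
- 1 (stdout): the standard output stream (usually the terminal).
- 2 (stderr): the standard error stream (also usually the terminal, but separate from stdout).
Every subsequent resource gets the lowest-numbered unused descriptor: 3, 4, 5, and so on. When an fd is closed, its number becomes available for reuse.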
stdin, stdout, stderr, and shell redirection
Because stdout is just fd 1 pointing at the terminal, shell redirection simply swaps what fd 1 points to:
# Before: fd 1 -> terminal
echo "hello"
# After: fd 1 -> myfile.txt (the shell opens the target and points fd 1 at it before running echo)
echo "hello" > myfile.txt
Pipes work the same way. ls | grep foo creates an anonymous pipe, sets fd 1 of ls to the write end of the pipe, and sets fd 0 of grep to the read end. No temporary files involved.
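When you write 2>&1 in a shell script, you make fd 2 a duplicate of fd 1, so stderr and stdout both flow to the same destination. Order matters: in cmd > file 2>&1, fd 1 is pointed at the file first and fd 2 then copies it, whereas cmd 2>&1 > file copies the terminal into fd 2 before fd 1 moves.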
Why servers run out of file descriptors
Every accepted TCP connection consumes one file descriptor. A busy HTTP server handling 10,000 simultaneous connections needs at least 10,000 open fds, plus fds for config files, log files, database connections, and more.
The OS enforces a per-process fd limit. On most Linux distributions the default soft limit is 1,024. When a process tries to open or accept beyond that limit, the syscall returns EMFILE — "too many open files". Applications surface this as errors like:
Error: EMFILE: too many open files
accept: too many open files
The fix is to raise the limit before the process starts:
# Check current soft and hard limits
ulimit -Sn # soft limit (enforced)
ulimit -Hn # hard limit (ceiling for the soft limit)
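# Raise the soft limit to 65536 for this shell and its children
# (fails if 65536 exceeds the hard limit; plain ulimit -n would set both limits)
ulimit -Sn 65536
# For login sessions (applied by pam_limits), set in /etc/security/limits.conf:
# myapp soft nofile 65536
# myapp hard nofile 131072
# A systemd service does not read limits.conf; set this in its unit file instead:
# [Service]
# LimitNOFILE=65536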
Production systems running Nginx, Redis, or PostgreSQL routinely configure limits of 65,536 to 1,048,576.
I/O multiplexing: watching many fds at once
The naive approach to handling many connections is to assign one thread per connection. Each thread blocks on read(), waiting for its connection to produce data. With 10,000 connections you would need 10,000 threads — expensive to create, and most would be asleep at any moment.
I/O multiplexing solves this by letting a single thread ask the kernel: "wake me up when any of these fds has data ready." The key syscalls are:
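- select(): the oldest and most portable, but limited to FD_SETSIZE descriptors (typically 1,024) and slow, because every call scans the whole set (O(n)).
- poll(): removes the 1,024 limit but is still O(n) per call.
- epoll (Linux; the epoll_create, epoll_ctl, and epoll_wait calls) / kqueue (macOS/BSD): the kernel maintains the set of watched fds and returns only the ones that are ready, so the cost of a wakeup depends on the number of ready fds rather than the number watched. Scales to hundreds of thousands of fds.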
This is the foundation of most high-performance network servers:
- Node.js runs a single JavaScript thread backed by libuv, which uses epoll/kqueue under the hood.
- Nginx uses an event loop per worker process, each watching thousands of connections with epoll.
- Go's net package uses non-blocking sockets and its runtime integrates with epoll so goroutines block at the Go scheduler level, not the OS thread level.
The mental model: instead of one thread per sleeping connection, you have one thread watching a scoreboard. The kernel taps the thread on the shoulder only when a connection has actual work to do.
Inspecting open file descriptors
On Linux, the /proc filesystem exposes every process's open fds as symbolic links:
# List all open fds for process 1234
ls -la /proc/1234/fd/
# Output looks like:
# lrwxrwxrwx 1 root root 64 Jun 11 09:00 0 -> /dev/pts/0
# lrwxrwxrwx 1 root root 64 Jun 11 09:00 1 -> /dev/pts/0
# lrwxrwxrwx 1 root root 64 Jun 11 09:00 3 -> /var/log/app.log
# lrwxrwxrwx 1 root root 64 Jun 11 09:00 4 -> socket:[12345]
# Count how many fds a process currently has open
ls /proc/1234/fd | wc -l
lsof (list open files) is the friendlier cross-platform tool:
# All open files for a process
lsof -p 1234
# Only network connections of that process (-a ANDs the filters; without it lsof ORs them)
lsof -a -p 1234 -i
# Find which process has a specific file open
lsof /var/log/app.log
fd leaks
An fd leak happens when code opens a file or socket but never closes it. The fd number is never returned to the pool. Over time the count climbs until it hits the limit and every subsequent open() or accept() fails.
Common causes:
- Returning early from a function before calling close(). Use defer f.Close() in Go or try/finally in Python/Java.
- Exception paths that skip cleanup.
- Forgetting to close the write end of a pipe after fork().
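package config

import (
	"io"
	"os"
)

// readConfig returns the full contents of the file at path.
func readConfig(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err // nothing was opened, so there is nothing to close
	}
	defer f.Close() // runs when the function returns, even on error paths

	// io.ReadAll reads until EOF, so large files are not truncated and an
	// empty file yields an empty slice instead of an io.EOF error.
	return io.ReadAll(f)
}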
To diagnose a leak in a running process, watch the fd count grow over time:
watch -n 1 "ls /proc/<pid>/fd | wc -l"
If the number climbs monotonically and never falls, you have a leak. Use lsof -p <pid> to see which types of resources are accumulating.
Code example
# Check the fd limit for the current shell
ulimit -n
# Raise the soft limit to 65536 for the current session (must not exceed the hard limit)
ulimit -Sn 65536
# List all open fds for a running process
lsof -p <pid>
# Browse fd symlinks directly on Linux
ls -la /proc/<pid>/fd/
# Count open fds for a process
ls /proc/<pid>/fd | wc -l
Further Reading
- Everything is a file (Wikipedia): history and scope of the Unix philosophy
- File descriptors explained (Julia Evans): practical illustrations of how fds work in real programs
- How epoll works (Cindy Sridharan): deep dive into the Linux epoll mechanism
- lsof manual (die.net): complete reference for the lsof tool
- Linux file descriptor limits (Remy Sharp): real-world Node.js production lessons on fd exhaustion