Processes & Threads

Understand how operating systems run programs using isolated processes and lightweight threads, and when to use each.

Beginner | Chapter: Foundations | 10 min read

What is a process?

A process is an isolated, running instance of a program. The operating system gives every process its own private memory space, its own set of open file descriptors, its own program counter, and its own signal handlers. One process cannot accidentally read or write another process's memory: the OS enforces this boundary using virtual memory.

Think of a process like a separate restaurant kitchen. Each kitchen has its own equipment, its own pantry, and its own entrance. A fire in one kitchen does not spread to the next.

When you type node server.js in a terminal, the OS creates a new process: it loads the program from disk, allocates heap memory, sets up the stack, and hands execution to the first instruction.
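
As a rough illustration of process creation and isolation, the sketch below has a Go program launch a second copy of itself as a child process. The `DEMO_CHILD` environment variable and the `counter` variable are names made up for this example, and it assumes the program is allowed to execute its own binary. The parent changes `counter` before starting the child, yet the child still sees the initial value because it runs in its own memory space.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// childEnv is a hypothetical marker that tells the program it is the child.
const childEnv = "DEMO_CHILD"

// counter lives in the private memory of whichever process is running.
var counter int

// selfPID returns the ID the OS assigned to the current process.
func selfPID() int { return os.Getpid() }

func main() {
	if os.Getenv(childEnv) == "1" {
		// A new process starts from the program's initial state, so the
		// parent's change to counter is not visible here.
		fmt.Printf("child:  pid=%d parent=%d counter=%d\n", selfPID(), os.Getppid(), counter)
		return
	}

	counter = 42
	fmt.Printf("parent: pid=%d counter=%d\n", selfPID(), counter)

	exe, err := os.Executable()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot locate executable:", err)
		os.Exit(1)
	}
	cmd := exec.Command(exe) // ask the OS to create a new process
	cmd.Env = append(os.Environ(), childEnv+"=1")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "child failed:", err)
		os.Exit(1)
	}
}
```

The two lines of output should show different process IDs, with the child reporting `counter=0`.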

What is a thread?

A thread is a unit of execution that lives inside a process. Threads within the same process share the process's heap memory, file descriptors, and global variables — but each thread maintains its own stack and program counter.
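
The sharing described above can be sketched in Go, with goroutines standing in for threads. The function name `squares` is invented for this illustration. Every goroutine writes into the same slice, while the variable `local` belongs to a single goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

// squares launches n goroutines that all write into one shared slice.
// Each goroutine touches only its own index, so no lock is needed here.
func squares(n int) []int {
	shared := make([]int, n) // one slice, visible to every goroutine
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			local := id * id // private to this goroutine
			shared[id] = local
		}(i)
	}
	wg.Wait()
	return shared
}

func main() {
	fmt.Println(squares(5)) // [0 1 4 9 16]
}
```

Writing to distinct indexes is what keeps this safe without a lock; two goroutines writing to the same element would need coordination.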

Extending the kitchen analogy: threads are the chefs inside that kitchen. They all work in the same room, use the same fridge and stovetop, but each chef is preparing a different dish simultaneously.

A single-threaded program runs one instruction at a time. A multi-threaded program can run many threads concurrently, all sharing the same address space.

Process vs thread anatomy

The diagram below shows two processes side by side (each fully isolated) and then zooms into one process with three threads sharing its heap but each owning a private stack:

[Figure: Process vs Thread Memory Layout. Process A and Process B are isolated from each other; each has its own code segment, private heap, main-thread stack, file descriptors, and signal handlers. Process C runs three threads (T1, T2, T3) that share one heap and one set of file descriptors, while each thread has its own stack and program counter (PC).]

Context switching

The OS can run far more threads than there are CPU cores. It does this by rapidly context switching: pausing one thread, saving its state (registers, program counter, stack pointer) into the kernel, and restoring another thread's saved state so it can run.

This is not free. Each switch involves a kernel entry and cache pollution, and a switch between threads of different processes also changes the address space, which can invalidate TLB entries. A system with thousands of runnable OS threads in contention can spend more time switching than doing real work, a condition sometimes described as thrashing. This is one reason why languages like Go use green threads (goroutines) instead of mapping every goroutine directly to an OS thread.

CPU-bound vs I/O-bound work

The right concurrency model depends on what your code spends time doing:

  • CPU-bound work (image processing, cryptography, machine learning inference) keeps the CPU busy doing computation. Running more threads than cores does not help; you need true parallelism across cores. The right tool is multiple processes or threads, typically about one per core.
  • I/O-bound work (database queries, HTTP calls, reading from disk) spends most of its time waiting while the CPU sits idle. You can handle thousands of simultaneous I/O operations with far fewer threads by using async I/O or a thread pool, because a waiting operation uses almost no CPU cycles.
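
For the CPU-bound case, one possible sketch is to split the work into about one chunk per core. The function `parallelSum` is a made-up name, and summing integers is only a stand-in for heavier computation.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelSum splits nums into at most `workers` chunks and sums the
// chunks in separate goroutines.
func parallelSum(nums []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	partial := make([]int, workers) // one slot per worker, so no locking
	chunk := (len(nums) + workers - 1) / workers
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := w * chunk
		if start >= len(nums) {
			break // fewer items than workers
		}
		end := start + chunk
		if end > len(nums) {
			end = len(nums)
		}
		wg.Add(1)
		go func(w, start, end int) {
			defer wg.Done()
			sum := 0
			for _, v := range nums[start:end] {
				sum += v
			}
			partial[w] = sum
		}(w, start, end)
	}
	wg.Wait()

	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	nums := make([]int, 10_000)
	for i := range nums {
		nums[i] = i + 1
	}
	// One worker per logical CPU; more workers would not add speed.
	fmt.Println(parallelSum(nums, runtime.NumCPU())) // 50005000
}
```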

A web server that calls a database on every request is almost entirely I/O-bound — which is why Node.js (single-threaded event loop) and Nginx (non-blocking I/O) can handle enormous request volumes on modest hardware.
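
To see why waiting is cheap, the sketch below simulates I/O-bound requests with `time.Sleep`. The names `fakeQuery` and `runConcurrently` are hypothetical, and a sleep is only an approximation of a real database or HTTP call.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fakeQuery stands in for a database or HTTP call: it only waits.
func fakeQuery(waitMs int) {
	time.Sleep(time.Duration(waitMs) * time.Millisecond)
}

// runConcurrently starts n simulated requests at once and returns the
// total wall-clock time until all of them have finished.
func runConcurrently(n, waitMs int) time.Duration {
	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			fakeQuery(waitMs)
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	elapsed := runConcurrently(100, 50)
	// Run one after another, the same waits would add up to about 5 seconds.
	fmt.Printf("100 waits of 50ms finished in %v\n", elapsed.Round(time.Millisecond))
}
```

Because the waits overlap, the total should be close to the duration of a single wait rather than the sum of all of them.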

Concurrency vs parallelism

These two terms are often used interchangeably but they mean different things:

  • Concurrency is the ability to deal with many things at once. A single chef who switches between chopping vegetables, stirring soup, and checking the oven is working concurrently — only one hand moves at a time, but multiple tasks make progress.
  • Parallelism is actually doing many things at the same instant. Multiple chefs each working on a different dish simultaneously is parallel work.

A single-core machine can be concurrent (via time-slicing) but cannot be truly parallel. A multi-core machine can be both.
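
The distinction can be tried out in Go by limiting how many OS threads may run Go code at once. In this sketch the helper `runTasks` is a made-up name; `runtime.GOMAXPROCS(1)` approximates a single-core machine, where the tasks still all finish but never run at the same instant.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// runTasks allows at most `procs` OS threads to execute Go code at once,
// runs `tasks` goroutines, and reports how many of them finished.
func runTasks(procs, tasks int) int {
	prev := runtime.GOMAXPROCS(procs)
	defer runtime.GOMAXPROCS(prev) // restore the previous setting

	var mu sync.Mutex
	finished := 0
	var wg sync.WaitGroup
	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sum := 0
			for j := 0; j < 1_000_000; j++ { // some busy work
				sum += j
			}
			_ = sum
			mu.Lock()
			finished++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return finished
}

func main() {
	// Concurrent but not parallel: one thread time-slices four tasks.
	fmt.Println("GOMAXPROCS=1:", runTasks(1, 4))
	// Concurrent and, on a multi-core machine, parallel as well.
	fmt.Println("all cores:   ", runTasks(runtime.NumCPU(), 4))
}
```

Both calls report 4 finished tasks; what differs is whether the tasks could overlap in time.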

The Python GIL

Python's Global Interpreter Lock (GIL) is a mutex inside CPython that allows only one thread to execute Python bytecode at a time, even on a multi-core machine. This means Python threads cannot achieve true CPU parallelism for compute-intensive work.

Workarounds include:

  • Using multiprocessing (spawns separate processes, each with its own GIL).
  • Running native C extensions that release the GIL (NumPy does this).
  • Switching to an alternative runtime or build: Jython and IronPython have no GIL (PyPy still has one), and Python 3.13 ships with an experimental free-threaded (no-GIL) build.

Go and Rust do not have a GIL. Go's runtime scheduler multiplexes goroutines across multiple OS threads freely, and Rust's ownership model rules out data races in safe code at compile time.

Green threads and goroutines

Green threads are threads implemented entirely in user space, managed by a runtime rather than the OS kernel. They are far cheaper to create and switch between than OS threads because no kernel call is required.

Go's goroutines are the most widely known example. A new goroutine starts with only a few kilobytes of stack (which grows dynamically), and Go's runtime multiplexes all goroutines onto a pool of OS threads using an M:N scheduler (M goroutines on N OS threads). A production Go service can run hundreds of thousands of goroutines on a handful of OS threads.

go
// Each "go" keyword spawns a goroutine — cheap enough to spawn per request
go handleConnection(conn)
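
To get a feel for how cheap goroutines are, the following sketch parks a batch of them on a channel and counts how many exist at that moment. The helper `parkGoroutines` is a name invented for this example, and the count of 10,000 is arbitrary.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parkGoroutines starts n goroutines that block on a channel and returns
// the number of goroutines that exist while they are all waiting.
func parkGoroutines(n int) int {
	release := make(chan struct{})
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-release // wait here until the channel is closed
		}()
	}
	alive := runtime.NumGoroutine() // also counts the calling goroutine
	close(release)                  // let every goroutine finish
	wg.Wait()
	return alive
}

func main() {
	fmt.Println("goroutines alive:", parkGoroutines(10_000))
	fmt.Println("logical CPUs:    ", runtime.NumCPU())
}
```

The goroutine count is far larger than the number of logical CPUs, which is the point of the M:N model.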

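Node.js takes a different approach: instead of green threads it uses a single-threaded event loop. Network I/O goes through the operating system's non-blocking interfaces, while libuv's thread pool handles operations that would otherwise block, such as file system access and DNS lookups. JavaScript on the main thread never runs in parallel (worker threads are a separate, opt-in mechanism), but I/O callbacks are managed asynchronously.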

Thread safety and race conditions

When two threads share mutable state without coordination, you get race conditions: the result depends on the exact interleaving of operations, which is non-deterministic.

Example: two threads both read a counter value of 100, both increment it, and both write back 101. The expected result was 102. This is a classic lost-update race.
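
The interleaving in that example can be replayed step by step. This sketch deliberately runs on a single goroutine so the outcome is deterministic; the variables `readA` and `readB` stand for the values the two threads hold privately.

```go
package main

import "fmt"

// lostUpdate replays the problematic interleaving in a fixed order:
// both "threads" read before either one writes.
func lostUpdate(start int) int {
	counter := start

	readA := counter // thread A reads the counter
	readB := counter // thread B reads the same value before A writes

	counter = readA + 1 // thread A writes its result
	counter = readB + 1 // thread B overwrites it with the same result

	return counter
}

func main() {
	fmt.Println(lostUpdate(100)) // 101, not 102
}
```

In a real multi-threaded program the order of these four steps varies from run to run, which is why the bug appears only some of the time.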

Solutions include:

  • Mutexes (sync.Mutex in Go, pthread_mutex_t in C) — only one thread holds the lock at a time.
  • Channels (Go): goroutines communicate by sending values instead of sharing memory. "Do not communicate by sharing memory; instead, share memory by communicating."
  • Atomic operations — hardware-level compare-and-swap instructions for simple counters.
  • Immutability — if shared data never changes, no synchronisation is needed.
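
Three of these approaches can be compared on the same shared counter. The function names below are invented for this sketch; each one starts `n` goroutines that increment a counter once, and each should return exactly `n`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countWithMutex guards the counter with a lock.
func countWithMutex(n int) int {
	var mu sync.Mutex
	var wg sync.WaitGroup
	counter := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++ // only one goroutine at a time gets here
			mu.Unlock()
		}()
	}
	wg.Wait()
	return counter
}

// countWithAtomic uses an atomic add instead of a lock.
func countWithAtomic(n int) int64 {
	var counter int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&counter, 1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&counter)
}

// countWithChannel sends increments to a single goroutine that owns the counter.
func countWithChannel(n int) int {
	increments := make(chan int)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			increments <- 1
		}()
	}
	go func() {
		wg.Wait()
		close(increments) // no more senders, so the receive loop can end
	}()
	counter := 0
	for v := range increments {
		counter += v
	}
	return counter
}

func main() {
	fmt.Println(countWithMutex(1000), countWithAtomic(1000), countWithChannel(1000)) // 1000 1000 1000
}
```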

Go's race detector (go run -race) instruments your binary at build time and reports the data races it observes while the program runs. It only catches races on code paths that actually execute, but it is an invaluable tool during development.

Code example

go
package main

import (
  "fmt"
  "sync"
)

func worker(id int, wg *sync.WaitGroup) {
  defer wg.Done()
  fmt.Printf("Worker %d starting\n", id)
  // simulate work
  fmt.Printf("Worker %d done\n", id)
}

func main() {
  var wg sync.WaitGroup
  for i := 1; i <= 5; i++ {
    wg.Add(1)
    go worker(i, &wg) // each go keyword spawns a goroutine
  }
  wg.Wait() // block until all goroutines finish
}

sync.WaitGroup is a counter: Add(1) increments it, Done() decrements it, and Wait() blocks until it reaches zero. Without it, main() would exit before the goroutines finish.

bash
# List all running processes
ps aux

# Show threads inside a specific process (Linux)
ps -T -p <pid>

# Check how many OS threads a Go binary is using
grep Threads /proc/<pid>/status
