Load Balancing & Service Discovery

Load balancing distributes incoming traffic across multiple backend servers, working alongside service discovery registries to dynamically track available endpoints in elastic environments.

Intermediate | Reliability & Scale | Chapter: Reliability & Scalability | 12 min read

The Need for Load Balancing

In modern systems, a single server can only be scaled up so far. Once it reaches its CPU, memory, or network bandwidth limits, the system must scale out horizontally by adding more servers.

To act as a unified service, a horizontal cluster requires a coordinator to distribute client requests across the pool of available servers. This is the role of the load balancer. A load balancer distributes incoming network traffic, prevents server overload, and isolates client applications from backend infrastructure failures.

Layer 4 vs Layer 7 Load Balancing

Load balancers operate at different layers of the OSI model, which gives them distinct capabilities and performance characteristics:

Layer 4 (Transport Layer)

Layer 4 load balancers route traffic using packet headers alone, without inspecting the application payload. They consider only the transport protocol (TCP or UDP) and addressing details (source and destination IP addresses and ports).

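No TCP Termination: A packet-forwarding Layer 4 load balancer does not terminate the TCP connection. Instead, it forwards packets to a chosen backend, commonly by rewriting addresses with network address translation (NAT) or by using Direct Server Return (DSR). The TCP handshake therefore happens end-to-end between the client and the backend server. (Some Layer 4 proxies, such as HAProxy in TCP mode, do terminate the client connection and open a new one to the backend, but they still route without parsing the payload.)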
Pros: Extremely fast, low CPU overhead, handles millions of concurrent connections.
Cons: Cannot inspect application protocol details (such as HTTP paths or headers), so it cannot perform cookie-based sticky sessions or path-based routing. Session affinity is limited to hashing network fields such as the client IP.
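To make the "headers only" idea concrete, here is a minimal Python sketch of per-flow backend selection. It assumes a balancer that hashes the connection 5-tuple and records each flow in a table, so every packet of a connection reaches the same backend. `FiveTuple`, `L4FlowBalancer`, and the addresses are hypothetical, and real Layer 4 balancers use a variety of hashing and connection-tracking schemes.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FiveTuple:
    """Transport-level fields an L4 balancer can see without reading the payload."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # "tcp" or "udp"


class L4FlowBalancer:
    """Hypothetical L4 balancer: picks a backend per flow from header fields only."""

    def __init__(self, backends):
        self.backends = list(backends)
        # Flow table: once a flow is assigned, all its packets go to the same
        # backend, even if the backend list changes later.
        self.flow_table = {}

    @staticmethod
    def _flow_hash(flow):
        key = f"{flow.protocol}|{flow.src_ip}|{flow.src_port}|{flow.dst_ip}|{flow.dst_port}"
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def route(self, flow, payload=b""):
        # The payload argument is deliberately ignored: routing uses headers alone.
        backend = self.flow_table.get(flow)
        if backend is None:
            backend = self.backends[self._flow_hash(flow) % len(self.backends)]
            self.flow_table[flow] = backend
        return backend


lb = L4FlowBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
flow = FiveTuple("203.0.113.7", 51514, "198.51.100.10", 443, "tcp")
first = lb.route(flow, b"GET /a HTTP/1.1")
second = lb.route(flow, b"GET /b HTTP/1.1")  # different payload, same flow
print(first == second)  # True
```

Because the decision never depends on the payload, two requests with different HTTP paths on the same connection are indistinguishable to this balancer. That is exactly why path-based routing is out of reach at Layer 4.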

Layer 7 (Application Layer)

Layer 7 load balancers route traffic based on application data, including HTTP headers, cookies, query parameters, or paths.

TCP Termination: The load balancer terminates the incoming client TCP connection, reads the HTTP payload, selects a backend server, and forwards the request over a separate TCP connection to that server, often one drawn from a pool of reused connections.
Pros: Highly flexible. Supports path-based routing (e.g. routing /api/users to Service A and /api/products to Service B), TLS (SSL) termination, header inspection, and request-aware rate limiting.
Cons: Higher CPU and memory consumption per connection due to TCP termination and payload parsing.
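The path-based routing example above can be sketched as a longest-prefix match. `PathRouter` is hypothetical and does not model any particular proxy's configuration format. The service names are illustrative only.

```python
from urllib.parse import urlsplit


class PathRouter:
    """Hypothetical L7 router: the longest matching path prefix wins."""

    def __init__(self, routes, default):
        # Check longer prefixes first so "/api/users/admin" would beat "/api/users".
        self.routes = sorted(routes.items(), key=lambda item: len(item[0]), reverse=True)
        self.default = default

    def route(self, url):
        # Requires parsing the HTTP request target, which only an L7 balancer does.
        path = urlsplit(url).path or "/"
        for prefix, service in self.routes:
            # Match the prefix itself or a sub-path, but not "/api/usersettings".
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                return service
        return self.default


router = PathRouter({"/api/users": "service-a", "/api/products": "service-b"},
                    default="web-frontend")
print(router.route("/api/users/42"))         # service-a
print(router.route("/api/products?page=2"))  # service-b
print(router.route("/about"))                # web-frontend
```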

Diagram: Layer 4 vs Layer 7 Traffic Routing

The following diagram illustrates how Layer 4 and Layer 7 load balancers process and route connections to backend servers:

Load Balancing Routing Algorithms

Load balancers distribute connections based on configured routing policies:

Round Robin: Sends each new request to the next server in the list, wrapping around at the end. Simple, but assumes all backend servers have equal capacity and that requests cost roughly the same to serve.
Weighted Round Robin: Assigns each server a weight based on its capacity and routes a proportionally larger share of requests to more powerful machines.
Least Connections: Tracks active concurrent connections and routes new requests to the backend server with the lowest connection load.
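Consistent Hashing: Hashes a request attribute (such as the client IP or a session cookie ID) onto a hash ring and routes the request to the first server found clockwise from that point. The same key always reaches the same server (sticky sessions). Adding or removing a server remaps only the keys on the affected arcs of the ring, roughly 1/n of them, rather than reshuffling every key. Virtual nodes (many ring positions per server) are typically used to keep the load evenly spread.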
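A minimal Python sketch of the first three policies follows. The weighted variant shown is a "smooth" weighted round robin, one common way to interleave picks so that a heavy server does not receive its whole share in a single burst. The class names and weights are hypothetical.

```python
import itertools


class RoundRobin:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class SmoothWeightedRoundRobin:
    """Weighted RR that interleaves picks instead of bursting to the heaviest server."""

    def __init__(self, weights):
        self.weights = dict(weights)  # server -> positive integer weight
        self.current = {server: 0 for server in self.weights}
        self.total = sum(self.weights.values())

    def pick(self):
        # Every server earns its weight; the highest balance wins and pays back the total.
        for server, weight in self.weights.items():
            self.current[server] += weight
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen


class LeastConnections:
    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Route to the server with the fewest in-flight connections; ties go to the first listed.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the connection or request finishes.
        self.active[server] -= 1


wrr = SmoothWeightedRoundRobin({"big": 5, "small-1": 1, "small-2": 1})
print([wrr.pick() for _ in range(7)])
# ['big', 'big', 'small-1', 'big', 'small-2', 'big', 'big']
```

Over every seven picks, the 5:1:1 weights produce exactly five picks for `big` and one each for the small servers, but spread out rather than five in a row.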
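Consistent hashing can be sketched with a sorted list of ring positions and binary search. This sketch assumes SHA-256 as the hash and 100 virtual nodes per server, both arbitrary choices for illustration. `ConsistentHashRing` and the server names are hypothetical.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hash ring with virtual nodes; each server owns many small arcs of the ring."""

    def __init__(self, servers=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, server):
        for i in range(self.vnodes):
            point = self._hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        for i in range(self.vnodes):
            point = self._hash(f"{server}#{i}")
            if self._owner.get(point) == server:
                del self._owner[point]
                self._points.remove(point)

    def lookup(self, key):
        if not self._points:
            raise LookupError("no servers on the ring")
        # First ring position clockwise from the key's hash, wrapping past the end.
        idx = bisect.bisect_left(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = ConsistentHashRing(["app-1", "app-2", "app-3"])
keys = [f"session-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("app-4")
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(len(moved), all(ring.lookup(k) == "app-4" for k in moved))
```

Adding `app-4` moves only the keys that now land on its arcs, roughly a quarter of them, and every moved key goes to the new server. Removing it restores the original mapping.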

Health Checking and Target Resiliency

To prevent black-holing traffic (sending requests to a dead server), load balancers actively monitor target health:

Active Health Checking: The load balancer periodically sends a probe (e.g. an HTTP GET /health request) to each backend server. If a server fails a configured number of consecutive checks, it is removed from the active routing pool. It is typically restored once it passes a configured number of consecutive checks.
Passive Health Checking: The load balancer observes real application traffic. If connection timeouts or 5xx errors spike on a server during normal operation, the balancer temporarily ejects that instance from the pool, acting as a circuit breaker.
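Active health checking with consecutive-failure and consecutive-success thresholds might look like the sketch below. The thresholds (3 to remove, 2 to restore), the `/health` path, and the class names are assumptions. The probe is injectable so the example runs without a real backend.

```python
import http.client
import urllib.request
from dataclasses import dataclass


def http_probe(address, path="/health", timeout=2.0):
    """One active check: healthy only if GET http://<address><path> returns 2xx in time."""
    try:
        with urllib.request.urlopen(f"http://{address}{path}", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, DNS failures, and 4xx/5xx (HTTPError) all count as failures.
        return False


@dataclass
class TargetState:
    healthy: bool = True
    failures: int = 0   # consecutive failed probes
    successes: int = 0  # consecutive passed probes


class ActiveHealthChecker:
    def __init__(self, targets, probe=http_probe, unhealthy_threshold=3, healthy_threshold=2):
        self.state = {t: TargetState() for t in targets}
        self.probe = probe
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold

    def run_once(self):
        """One probing round; a real balancer would call this on a timer."""
        for target, st in self.state.items():
            if self.probe(target):
                st.successes, st.failures = st.successes + 1, 0
                if not st.healthy and st.successes >= self.healthy_threshold:
                    st.healthy = True   # back into the routing pool
            else:
                st.failures, st.successes = st.failures + 1, 0
                if st.healthy and st.failures >= self.unhealthy_threshold:
                    st.healthy = False  # removed from the routing pool

    def routable(self):
        return [t for t, st in self.state.items() if st.healthy]


up = {"10.0.0.1:8080": True, "10.0.0.2:8080": True}
checker = ActiveHealthChecker(list(up), probe=lambda t: up[t])
up["10.0.0.2:8080"] = False
for _ in range(3):
    checker.run_once()
print(checker.routable())  # ['10.0.0.1:8080']
```

Requiring several consecutive results in each direction keeps a single dropped probe from flapping a server in and out of the pool. A passive checker could feed the same counters from real request outcomes (timeouts, 5xx responses) instead of dedicated probes.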

Dynamic Service Discovery and Registries

In cloud-native or containerized environments, IP addresses are ephemeral: containers start and stop on demand, and the number of instances scales with load. A static list of IPs in a load balancer configuration quickly goes stale.

To solve this, architectures use service discovery, which consists of three parts:

Service Registry: A highly available database (such as Consul, ZooKeeper, or Eureka) that tracks the state and network location (IP address and port) of every running service instance.
Registration: When a backend container starts, it registers its IP address, port, and health check path with the service registry. It is removed when it shuts down or stops passing its health checks.
Discovery: When a client (or load balancer) needs to call a service, it queries the registry for the list of currently healthy instances.
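The register/discover cycle can be sketched as an in-memory registry with leases: an instance that stops sending heartbeats before its TTL expires drops out of discovery results. This is a simplified, single-process model with assumed names (`ServiceRegistry`, `heartbeat`, a 10-second TTL). It is not the API of Consul, ZooKeeper, or Eureka, which add replication and richer health checking.

```python
import time
from dataclasses import dataclass


@dataclass
class Instance:
    host: str
    port: int
    health_path: str
    lease_expires_at: float


class ServiceRegistry:
    """In-memory registry with lease expiry; instances must heartbeat before the TTL runs out."""

    def __init__(self, ttl=10.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._services = {}  # service name -> {instance_id: Instance}

    def register(self, service, instance_id, host, port, health_path="/health"):
        self._services.setdefault(service, {})[instance_id] = Instance(
            host, port, health_path, self.clock() + self.ttl)

    def heartbeat(self, service, instance_id):
        # Extends the lease; raises KeyError for an unknown instance.
        self._services[service][instance_id].lease_expires_at = self.clock() + self.ttl

    def deregister(self, service, instance_id):
        self._services.get(service, {}).pop(instance_id, None)

    def discover(self, service):
        """Return (host, port) for every instance whose lease is still valid."""
        now = self.clock()
        instances = self._services.get(service, {})
        expired = [i for i, inst in instances.items() if inst.lease_expires_at <= now]
        for instance_id in expired:
            del instances[instance_id]  # missed heartbeats: treat as dead
        return sorted((inst.host, inst.port) for inst in instances.values())


now = [0.0]
registry = ServiceRegistry(ttl=10.0, clock=lambda: now[0])
registry.register("orders", "orders-1", "10.1.0.4", 8080)
registry.register("orders", "orders-2", "10.1.0.5", 8080)
now[0] = 8.0
registry.heartbeat("orders", "orders-1")  # orders-2 misses its heartbeat
now[0] = 12.0
print(registry.discover("orders"))  # [('10.1.0.4', 8080)]
```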

DNS-based Service Discovery

Some infrastructures use DNS for discovery, for example SRV records, which carry both a host name and a port. However, standard DNS has limitations in elastic environments because of client-side caching and TTL delays. If a container dies, clients may keep connecting to its old IP until the cached record's TTL expires. Dedicated registries, by contrast, can push updates to load balancers almost immediately.
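The TTL staleness problem can be reproduced with a toy client-side cache. The `lookup` function stands in for an authoritative DNS server, and `orders.internal` is a hypothetical name. The clock is injected so the timeline is explicit.

```python
import time


class CachingResolver:
    """Client-side DNS cache: an answer is reused until its TTL expires."""

    def __init__(self, lookup, clock=time.monotonic):
        self.lookup = lookup  # hypothetical: name -> (list of IPs, ttl in seconds)
        self.clock = clock
        self._cache = {}      # name -> (ips, expires_at)

    def resolve(self, name):
        now = self.clock()
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            # Possibly stale: the cache has no way to learn that a host has died.
            return list(cached[0])
        ips, ttl = self.lookup(name)
        self._cache[name] = (list(ips), now + ttl)
        return list(ips)


zone = {"orders.internal": (["10.1.0.4", "10.1.0.5"], 30)}
now = [0.0]
resolver = CachingResolver(lambda name: zone[name], clock=lambda: now[0])
resolver.resolve("orders.internal")            # cached until t = 30
zone["orders.internal"] = (["10.1.0.4"], 30)   # 10.1.0.5 dies and is removed
now[0] = 20.0
stale = resolver.resolve("orders.internal")
now[0] = 31.0
fresh = resolver.resolve("orders.internal")
print(stale)  # ['10.1.0.4', '10.1.0.5']
print(fresh)  # ['10.1.0.4']
```

For the 30 seconds between the failure and the TTL expiry, this client still hands out the dead address. A push- or watch-based registry avoids this window.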

Advanced L4 Routing: NAT, DSR, and VIPs

High-performance Layer 4 load balancers use specialized network routing paths to maximize throughput:

Network Address Translation (NAT): The load balancer rewrites the destination IP of incoming packets from its Virtual IP (VIP) to the address of the selected backend server. The backend's response must travel back through the load balancer (typically because the load balancer is the backend's default gateway), which rewrites the source IP back to the VIP.
Direct Server Return (DSR): The load balancer handles only incoming request packets. It rewrites the destination MAC address to target a backend server while leaving the destination IP (the VIP) intact. Each backend is configured to accept traffic addressed to the VIP (commonly bound to a loopback interface, with ARP replies for it suppressed) and replies directly to the client, bypassing the load balancer entirely. Because the rewrite happens at Layer 2, the load balancer and backends must share a network segment. DSR is very efficient when responses are much larger than requests, for example a small HTTP request that returns a multi-megabyte download.
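The header rewrites in the two modes can be compared with a toy packet model. Only the fields discussed above are modeled. The addresses and function names are hypothetical, and the ordinary per-hop MAC rewriting that NAT-mode routing also performs is omitted.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str


# Hypothetical addresses for illustration only.
VIP = "198.51.100.10"
LB_MAC = "02:00:00:00:00:01"
ROUTER_MAC = "02:00:00:00:00:fe"  # next hop toward the client


def nat_forward(pkt, backend_ip):
    """NAT mode, inbound: rewrite the destination IP from the VIP to the backend."""
    return replace(pkt, dst_ip=backend_ip)


def nat_reply(reply):
    """NAT mode, outbound: the reply passes back through the LB, which restores the VIP."""
    return replace(reply, src_ip=VIP)


def dsr_forward(pkt, backend_mac):
    """DSR mode: only the Layer 2 destination changes; the destination IP stays the VIP."""
    return replace(pkt, src_mac=LB_MAC, dst_mac=backend_mac)


def dsr_backend_reply(request):
    """The backend owns the VIP (e.g. on loopback), so it answers the client directly from it."""
    return Packet(src_mac=request.dst_mac, dst_mac=ROUTER_MAC,
                  src_ip=request.dst_ip, dst_ip=request.src_ip)


request = Packet(ROUTER_MAC, LB_MAC, "203.0.113.7", VIP)

to_backend = nat_forward(request, "10.0.0.2")
backend_reply = Packet("02:00:00:00:00:22", LB_MAC, "10.0.0.2", "203.0.113.7")
to_client = nat_reply(backend_reply)
print(to_client.src_ip)  # 198.51.100.10

dsr_req = dsr_forward(request, "02:00:00:00:00:22")
print(dsr_req.dst_ip)                      # 198.51.100.10 (unchanged)
print(dsr_backend_reply(dsr_req).src_ip)   # 198.51.100.10, sent without touching the LB
```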

Prerequisites

Domain Name System (DNS), HTTP


