Horizontal vs Vertical Scaling

The architectural differences, performance limits, and state-management tradeoffs of scaling systems up vs scaling out.

Beginner | Reliability & Scale | Chapter: Reliability & Scalability | 10 min read

The Scale-Up Path: Vertical Scaling

When an application starts slowing down due to resource saturation, the simplest path to recover performance is vertical scaling (scaling up). Scaling vertically means adding more hardware capacity to the existing single machine: upgrading to a faster CPU, adding more RAM, or transitioning to faster NVMe storage.

Vertical scaling is convenient because it typically requires no changes to your application's software architecture. The codebase continues to execute inside a single memory space, and database writes stay on one node, with no cross-machine coordination.

The Hardware Bottleneck: Limits of Vertical Scaling

While scaling up requires low engineering effort, it faces strict physical and economic boundaries:

Hardware Cost Curves: Upgrading from a standard server to a high-end enterprise machine does not scale linearly in price. A server with 8 times the RAM can cost 30 times more.
Hard Physical Limits: You will eventually hit the ceiling on CPU sockets, memory slots, and motherboard bus bandwidth that a single machine can support.
Single Point of Failure (SPOF): A single vertically scaled instance remains a single point of failure. If the power supply fails, the operating system kernel panics, or the datacenter loses connectivity, your entire system goes offline.
Resource Contention: As the number of CPU cores inside a single operating system grows, threads increasingly contend for shared resources such as database locks, network card buffers, and memory bandwidth, so each added core yields a smaller performance gain.
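One standard way to model these diminishing returns is Amdahl's law (a general model, not something specific to this article): if a fraction `s` of the work is serialized, for example while holding a shared lock, the speedup on `n` cores is bounded by `1 / (s + (1 - s) / n)`. The sketch below assumes an illustrative serial fraction of 5%; real workloads have to be measured.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Upper bound on speedup when `serial_fraction` of the work cannot run in parallel."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed for illustration: 5% of each request runs under a shared lock.
    for cores in (1, 8, 32, 128):
        print(f"{cores:>4} cores -> {amdahl_speedup(0.05, cores):.1f}x")
```

Under that assumption, going from 32 to 128 cores only raises the bound from about 12.5x to about 17.4x, and no core count can exceed `1 / s`, which is 20x here.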

The Scale-Out Path: Horizontal Scaling

Horizontal scaling (scaling out) increases capacity by adding more physical or virtual machines to your infrastructure cluster. Instead of running a single high-performance machine, you distribute the system's workload across multiple smaller, inexpensive nodes.

Scaling horizontally avoids the steep price curve of high-end hardware and improves availability: if one machine fails, the remaining instances continue to process requests, so no single server is a point of failure.
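The availability gain can be estimated with a simple model. The sketch below assumes every node has the same availability and fails independently of the others; it ignores shared dependencies such as the load balancer, the network, or a common database, which in practice limit the benefit. The function name `cluster_availability` is illustrative.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Probability that at least one of `nodes` replicas is up,
    assuming each fails independently with the same availability."""
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # The service is down only if every node is down at the same time.
    return 1.0 - (1.0 - node_availability) ** nodes


if __name__ == "__main__":
    # Assumed per-node availability of 99% (about 3.65 days of downtime a year).
    for n in (1, 2, 3):
        print(f"{n} node(s): {cluster_availability(0.99, n):.6f}")
```

With these assumptions, a second node lifts availability from 99% to roughly 99.99%, because both nodes must fail at once for the service to go down.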

Challenges of Horizontal Scaling: Managing Shared State

Although scaling out is highly resilient, it shifts the complexity to software design:

Load Balancing: The system needs a load balancer (such as NGINX or AWS ALB) to distribute incoming traffic.
Stateless Requirements: Application servers must be stateless. If an instance stores user session state in local RAM, a subsequent request routed to a different instance will not find that session (for example, the user suddenly appears logged out). The system must offload state to a shared datastore (such as Redis or Postgres) or use self-contained tokens (such as JWTs) that carry the state with each request.
Network Latency: Communicating across network boundaries between nodes introduces latency overhead that does not exist in single-machine in-memory calls.
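The session problem can be reproduced in a few lines. The sketch below uses round-robin routing, one common load-balancing strategy, and a plain Python `dict` as a stand-in for a shared datastore such as Redis; `AppInstance`, `RoundRobinBalancer`, and `simulate` are hypothetical names for illustration only.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class AppInstance:
    """A hypothetical application server. `local_sessions` models in-process RAM."""
    name: str
    shared_store: dict  # stand-in for a shared datastore such as Redis
    local_sessions: dict = field(default_factory=dict)

    def login(self, user: str, use_shared_store: bool) -> None:
        store = self.shared_store if use_shared_store else self.local_sessions
        store[user] = {"logged_in": True}

    def is_logged_in(self, user: str, use_shared_store: bool) -> bool:
        store = self.shared_store if use_shared_store else self.local_sessions
        return user in store


class RoundRobinBalancer:
    """Hands each request to the next instance in a fixed rotation."""

    def __init__(self, instances):
        self._cycle = itertools.cycle(instances)

    def route(self) -> AppInstance:
        return next(self._cycle)


def simulate(use_shared_store: bool) -> bool:
    shared = {}  # one store object shared by every instance
    instances = [AppInstance(f"app-{i}", shared) for i in range(3)]
    lb = RoundRobinBalancer(instances)
    lb.route().login("alice", use_shared_store)                # request 1 -> app-0
    return lb.route().is_logged_in("alice", use_shared_store)  # request 2 -> app-1


if __name__ == "__main__":
    print("local RAM sessions:", simulate(use_shared_store=False))  # False
    print("shared session store:", simulate(use_shared_store=True))  # True
```

The second request lands on a different instance, so the session is only found when it lives outside the instance's own memory.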

Scaling the Database: Read Replicas vs Sharding

While application servers are easily scaled out by spinning up stateless instances, scaling stateful databases is much harder:

Read Replicas: If your database is read-heavy, you can scale reads horizontally by replicating data from a primary write node to multiple read replicas. Writes still go to the primary node, while reads are distributed across the replicas.
Database Sharding: If writes are the bottleneck, replication is not enough. You must implement database sharding, which partitions your database tables horizontally by a shard key (such as user_id) and distributes different rows across separate physical database instances.
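The two techniques are often combined. The sketch below is a hypothetical router (`ShardedCluster` and the node names are illustrative, not any real database's API): it picks a shard by hashing `user_id`, sends writes to that shard's primary, and rotates reads across its replicas. It assumes simple modulo hashing over a fixed number of shards.

```python
import hashlib
import itertools


class ShardedCluster:
    """Hypothetical router: hash-partitions rows by user_id across shards,
    and within each shard sends writes to the primary and reads to replicas."""

    def __init__(self, shards):
        # shards: list of (primary_name, [replica_names]) tuples.
        # A shard with no replicas serves its reads from the primary.
        self._shards = [
            (primary, itertools.cycle(replicas or [primary]))
            for primary, replicas in shards
        ]

    def shard_index(self, user_id: str) -> int:
        # A stable hash (unlike Python's per-process randomized hash())
        # keeps the same user_id on the same shard across restarts.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._shards)

    def node_for_write(self, user_id: str) -> str:
        primary, _ = self._shards[self.shard_index(user_id)]
        return primary

    def node_for_read(self, user_id: str) -> str:
        _, replica_cycle = self._shards[self.shard_index(user_id)]
        return next(replica_cycle)


if __name__ == "__main__":
    cluster = ShardedCluster([
        ("shard0-primary", ["shard0-replica-a", "shard0-replica-b"]),
        ("shard1-primary", ["shard1-replica-a"]),
    ])
    for uid in ("user-17", "user-42"):
        print(uid, "write ->", cluster.node_for_write(uid),
              "| read ->", cluster.node_for_read(uid))
```

Two caveats apply. Replication is often asynchronous, so a read routed to a replica may not yet reflect a just-committed write. And with modulo hashing, changing the number of shards remaps most keys, which is why production systems often use consistent hashing or a lookup directory instead.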

Shared-Nothing vs Shared-Disk Architectures

Distributed databases are generally categorized into two architectural models:

Shared-Disk Architecture: All compute nodes (servers) access a single central storage system (such as a storage area network, or SAN). This simplifies database management, but the central storage system eventually becomes a scalability bottleneck.
Shared-Nothing Architecture: Each node in the cluster has its own CPU, memory, and disk storage. Nodes communicate solely by exchanging messages over the network. This eliminates the shared-storage bottleneck, allowing near-linear horizontal scaling as long as the workload partitions cleanly across nodes.

Autoscaling: Dynamic Elasticity

One of the greatest benefits of horizontal scaling is autoscaling (elasticity).

Instead of provisioning for peak capacity in advance, systems use orchestration platforms (such as Kubernetes or AWS Auto Scaling Groups) to monitor metrics like CPU utilization or queue depth. When traffic spikes, new instances are launched automatically to absorb the load, and they are terminated when traffic drops, reducing infrastructure costs.
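As a concrete example of the scaling decision, Kubernetes' Horizontal Pod Autoscaler documents a proportional rule: `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below models that rule in isolation; the function name, the min/max bounds, and the 60% CPU target are assumptions for illustration, and real autoscalers add stabilization windows and other safeguards not shown here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Proportional scaling rule modeled on the Kubernetes HPA formula:
    desired = ceil(current_replicas * current_metric / target_metric)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: avoid flapping
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Target: 60% average CPU per replica (an assumed value).
    print(desired_replicas(4, current_metric=90, target_metric=60))  # 6 (spike)
    print(desired_replicas(4, current_metric=20, target_metric=60))  # 2 (quiet period)
```

Because the rule is proportional, the same function works for any per-replica metric, such as queue depth per worker, not just CPU.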

Prerequisites

Load Balancing & Service Discovery


