SQL vs NoSQL Database Architectures
Understand the core trade-offs between relational database models and non-relational database architectures, including storage layouts, consistency models, scaling mechanisms, and query patterns.
The Problem: Data Scaling and Heterogeneity
Early software applications relied on a standard model for data storage: structured tables with predefined columns, managed by a relational engine. This worked perfectly when datasets fit on a single server and schemas were static.
However, as internet-scale systems emerged, applications had to handle petabytes of data, high-velocity stream events, and unstructured or semi-structured data formats. Scaling a traditional relational database horizontally across multiple servers became a complex challenge.
Because relational systems depend on strict schemas, relationships, and transactional guarantees, spreading data across a distributed cluster requires coordinating locks and schema updates. This introduces severe network bottlenecks. Modern backend systems must choose the right storage architecture: relational databases for transaction integrity, or non-relational databases for write scalability, schema flexibility, and partition tolerance.
Relational Database Architecture (SQL)
Relational databases represent data as relations, which are visually organized as tables containing rows and columns. This architecture is built on the mathematical principles of relational algebra and tuple relational calculus.
Strict Schemas and Normalization
Relational databases enforce a strict schema. Every row in a table must conform to the defined columns, data types, and integrity constraints. Data is structured using normalization to reduce redundancy. For example, instead of repeating customer details in every order record, a separate customer table is created, and orders reference customer records using foreign key constraints.
Transactional Guarantees (ACID)
SQL databases prioritize transactional safety through the ACID model:
- Atomicity: Every operation in a transaction succeeds, or the entire transaction is rolled back.
- Consistency: Transactions only transition the database from one valid state to another, preserving schema invariants and constraints.
- Isolation: Concurrent transactions execute without interfering with one another, preventing dirty or inconsistent reads.
- Durability: Committed data is written to non-volatile storage, ensuring it survives system crashes.
Relational databases typically use B+ Trees to index columns, which optimizes point lookups and range scans. They store data on disk in a row-oriented format, where all fields of a single record are stored contiguously.
Non-Relational Database Architectures (NoSQL)
Non-relational databases depart from the tabular relational model to support horizontal scale, dynamic schemas, and specialized access patterns. They are classified into four main categories:
1. Document Stores
Document databases store data as semi-structured, self-describing documents, typically in JSON or BSON formats.
- Key Characteristic: Documents are nested and hierarchical. Related data is stored inline within a single document rather than normalized across separate tables.
- Examples: MongoDB, Couchbase.
2. Key-Value Engines
Key-value stores operate as distributed hash tables.
- Key Characteristic: They store unstructured binary or text payloads mapped to a unique lookup key. They offer high-speed read/write performance but lack query capabilities on the value payload itself.
- Examples: Redis, DynamoDB (basic usage).
3. Wide-Column Stores
Wide-column databases organize data into rows containing dynamic columns.
- Key Characteristic: Data is stored using SSTables (Sorted String Tables) grouped by partition keys. They write incoming data to in-memory buffers called memtables before flushing them to disk as immutable SSTables, making them highly write-optimized.
- Examples: Apache Cassandra, ScyllaDB.
4. Graph Databases
Graph databases model data as nodes (entities), edges (relationships), and properties (attributes).
- Key Characteristic: Instead of calculating joins at query time, graph databases store direct relationships as pointers on disk. This enables rapid traversals of complex, interconnected datasets.
- Examples: Neo4j, Amazon Neptune.
Disk Layouts: Row-Oriented vs Column-Oriented
A critical architectural distinction is how databases layout bytes on disk. This layout dictates their performance profiles.
Row-Oriented Layouts (OLTP)
In a row-oriented database, records are stored as contiguous blocks of bytes on disk (e.g. Postgres heap files). This is optimized for Online Transaction Processing (OLTP).
- Trade-off: Adding, modifying, or reading an entire user profile is highly efficient because all columns for that row reside in the same disk block. However, calculating the average age of all users requires the database to read every single row into memory, loading unused columns like names and addresses into cache lines, creating massive I/O bottlenecks.
Column-Oriented Layouts (OLAP)
In a column-oriented database, values for a specific column are stored contiguously on disk. This is optimized for Online Analytical Processing (OLAP).
- Trade-off: To calculate an average age, the database only reads the contiguous array of values from the
agecolumn, bypassing all other fields entirely. This makes analytical scans and aggregation queries extremely fast and cache-friendly. However, writing a new record requires updating multiple disjoint column files on disk, making individual row writes slow and complex.
Consistency Models and Scaling Mechanics
The architectural paths of SQL and NoSQL databases split when scaling across multiple nodes.
Scaling Relational Databases
Relational databases scale vertically (adding more CPU, RAM, and disk storage to a single server). If they are distributed horizontally, they hit limits. To perform dynamic joins or enforce foreign key constraints, nodes must coordinate and share state over the network. This coordination leads to latency, which makes multi-node SQL writes difficult to scale. Relational databases typically scale reads by using replication, but writes generally route to a single primary node.
Scaling Non-Relational Databases
NoSQL databases scale horizontally by partitioning (sharding) data across independent nodes. They achieve this by utilizing a partition key. The partition key is hashed to determine which node stores the corresponding document or record. NoSQL databases bypass the relational join problem by denormalizing data: copying related data directly into the same partition, eliminating the need for cross-node network lookups.
CAP Consistency Options
During network partitions, distributed databases face trade-offs:
- Strict Consistency (CP): Relational databases and certain NoSQL clusters require write operations to reach a quorum across nodes. If a node is unreachable, the write fails, ensuring all nodes serve the exact same data.
- Eventual Consistency (AP): Many NoSQL systems prioritize availability. Nodes accept writes locally, even if they cannot communicate with other nodes. Over time, changes propagate through peer-to-peer gossip protocols or read-repair mechanisms. While data eventually syncs, clients may temporarily read stale records.
Storage Engine Indexes: B+ Trees vs LSM-Trees
Different databases use distinct storage index structures to prioritize read or write workloads:
- B+ Trees (Read-Optimized): Commonly used in relational databases. They maintain sorted, balanced node hierarchies. Point lookups and range scans are fast, costing
O(log n). However, writes are expensive because updating a record requires modifying random disk blocks on the fly, causing heavy disk write overhead. - LSM-Trees (Write-Optimized): Popular in wide-column and key-value databases. Instead of writing directly to disk files, writes append to an in-memory buffer called a memtable and a commit log. When the memtable fills, it is written to disk as an immutable SSTable. A background compaction process merges and de-duplicates these files. This design converts slow, random disk writes into high-speed sequential writes, making LSM-Trees highly efficient for write-heavy workloads.
Further Reading
- One Size Fits All: An Idea Whose Time Has Come and Gone — Michael Stonebraker's classic analysis of why standard relational engines cannot satisfy all modern data storage requirements.
- Designing Data-Intensive Applications — Martin Kleppmann's comprehensive textbook on database internals, storage engines, and distributed consensus.
- The Log-Structured Merge-Tree (LSM-Tree) — The foundational research paper outlining LSM-Tree architectures.
Prerequisites
Code Examples
Core Literature References
“One Size Fits All”: An Idea Whose Time Has Come and Gone
by Michael Stonebraker and Uğur Çetintemel — Proceedings of the 21st International Conference on Data Engineering (ICDE'05), pp. 2-11
View sourceContinue learning
ACID & Isolation Levels
Deep dive into database transaction guarantees, isolation levels, concurrency anomalies like write skew, and control mechanisms such as MVCC, 2PL, and SSI.
API Gateways
Understand the API Gateway pattern as the central ingress point for microservices, handling routing, auth, rate limiting, and protocol translation.
API Security & OAuth 2.0
Understand API authentication and authorization mechanisms, JWT security, and the OAuth 2.0 framework including Authorization Code Flow with PKCE.