WebSockets & SSE
Compare persistent bidirectional WebSockets and unidirectional Server-Sent Events (SSE), including handshakes, wire framing, and infrastructure scaling.
The Concept
Traditionally, client-server web applications communicate using a request-response model where the client initiates all exchanges. If a client needs real-time state updates from the server, it must use polling (sending regular HTTP requests, e.g. every 2 seconds) or long polling (the server holds the request open until new data is available). Both options add network overhead: every request carries a full set of HTTP headers that must be sent and parsed, and when connections are not reused each request may also pay for a new TCP handshake.
To solve this, two main persistent push technologies are used on the web:
- WebSockets: A persistent, full-duplex TCP-based connection enabling simultaneous bidirectional binary and text frame exchange.
- Server-Sent Events (SSE): A lightweight, unidirectional connection stream using the standard HTTP protocol to push text data from server to client.
<svg viewBox="0 0 580 340" xmlns="http://www.w3.org/2000/svg" style="background-color: var(--color-surface-muted, #1f2428); border-radius: 0.75rem; border: 1px solid var(--color-border); padding: 1rem; width: 100%;">
<text x="145" y="25" fill="#88c0d0" font-family="sans-serif" font-size="12" font-weight="bold" text-anchor="middle">WebSocket (Bidirectional, Duplex)</text>
<line x1="60" y1="60" x2="60" y2="300" stroke="#4c566a" stroke-width="1.5"/>
<line x1="220" y1="60" x2="220" y2="300" stroke="#4c566a" stroke-width="1.5"/>
<text x="60" y="52" fill="#eceff4" font-family="sans-serif" font-size="9" text-anchor="middle">Client</text>
<text x="220" y="52" fill="#eceff4" font-family="sans-serif" font-size="9" text-anchor="middle">Server</text>
<path d="M 60 80 L 220 100" stroke="#eceff4" stroke-width="1" marker-end="url(#arr-gray)"/>
<text x="140" y="86" fill="#81a1c1" font-family="sans-serif" font-size="8" text-anchor="middle">HTTP Upgrade Handshake</text>
<path d="M 220 110 L 60 130" stroke="#a3be8c" stroke-width="1.2" marker-end="url(#arr-green)"/>
<text x="140" y="116" fill="#a3be8c" font-family="sans-serif" font-size="8" text-anchor="middle">101 Switching Protocols</text>
<path d="M 60 160 L 220 180" stroke="#88c0d0" stroke-width="1.5" marker-end="url(#arr-blue)"/>
<text x="140" y="166" fill="#88c0d0" font-family="sans-serif" font-size="8" text-anchor="middle">Frame (Client -> Server)</text>
<path d="M 220 200 L 60 220" stroke="#88c0d0" stroke-width="1.5" marker-end="url(#arr-blue)"/>
<text x="140" y="206" fill="#88c0d0" font-family="sans-serif" font-size="8" text-anchor="middle">Frame (Server -> Client)</text>
<path d="M 60 250 L 220 270" stroke="#ebcb8b" stroke-width="1" marker-end="url(#arr-yellow)"/>
<text x="140" y="256" fill="#ebcb8b" font-family="sans-serif" font-size="8" text-anchor="middle">Ping Frame (Keep-Alive)</text>
<path d="M 220 275 L 60 295" stroke="#ebcb8b" stroke-width="1" marker-end="url(#arr-yellow)"/>
<text x="140" y="281" fill="#ebcb8b" font-family="sans-serif" font-size="8" text-anchor="middle">Pong Frame (Ack)</text>
<text x="435" y="25" fill="#88c0d0" font-family="sans-serif" font-size="12" font-weight="bold" text-anchor="middle">Server-Sent Events (Unidirectional)</text>
<line x1="350" y1="60" x2="350" y2="300" stroke="#4c566a" stroke-width="1.5"/>
<line x1="510" y1="60" x2="510" y2="300" stroke="#4c566a" stroke-width="1.5"/>
<text x="350" y="52" fill="#eceff4" font-family="sans-serif" font-size="9" text-anchor="middle">Client</text>
<text x="510" y="52" fill="#eceff4" font-family="sans-serif" font-size="9" text-anchor="middle">Server</text>
<path d="M 350 80 L 510 100" stroke="#eceff4" stroke-width="1" marker-end="url(#arr-gray)"/>
<text x="430" y="86" fill="#81a1c1" font-family="sans-serif" font-size="8" text-anchor="middle">GET (Accept: text/event-stream)</text>
<path d="M 510 110 L 350 130" stroke="#a3be8c" stroke-width="1.2" marker-end="url(#arr-green)"/>
<text x="430" y="116" fill="#a3be8c" font-family="sans-serif" font-size="8" text-anchor="middle">200 OK (Keep-Alive, Chunked)</text>
<path d="M 510 170 L 350 190" stroke="#88c0d0" stroke-width="1.5" marker-end="url(#arr-blue)"/>
<text x="430" y="176" fill="#88c0d0" font-family="sans-serif" font-size="8" text-anchor="middle">event: message \n data: {...}</text>
<path d="M 510 220 L 350 240" stroke="#88c0d0" stroke-width="1.5" marker-end="url(#arr-blue)"/>
<text x="430" y="226" fill="#88c0d0" font-family="sans-serif" font-size="8" text-anchor="middle">event: update \n data: {...}</text>
<path d="M 510 270 L 350 290" stroke="#ebcb8b" stroke-width="1" marker-end="url(#arr-yellow)"/>
<text x="430" y="276" fill="#ebcb8b" font-family="sans-serif" font-size="8" text-anchor="middle">:heartbeat comment (Keep-Alive)</text>
<defs>
<marker id="arr-gray" markerWidth="6" markerHeight="6" refX="5" refY="3" orient="auto">
<path d="M0,0 L0,6 L6,3 z" fill="#eceff4"/>
</marker>
<marker id="arr-green" markerWidth="6" markerHeight="6" refX="5" refY="3" orient="auto">
<path d="M0,0 L0,6 L6,3 z" fill="#a3be8c"/>
</marker>
<marker id="arr-blue" markerWidth="6" markerHeight="6" refX="5" refY="3" orient="auto">
<path d="M0,0 L0,6 L6,3 z" fill="#88c0d0"/>
</marker>
<marker id="arr-yellow" markerWidth="6" markerHeight="6" refX="5" refY="3" orient="auto">
<path d="M0,0 L0,6 L6,3 z" fill="#ebcb8b"/>
</marker>
</defs>
</svg>
Practical Analogy
Think of these persistent connection options as phone configurations:
- HTTP Polling is like calling your delivery agent every 3 minutes to ask, "Is my package here yet?" This wastes time, dial tones, and cellular bandwidth.
- Server-Sent Events is like calling a news reporter and telling them: "Read me updates as they happen." The reporter keeps the call active and periodically reads out breaking headlines. You can only listen; if you want to respond, you must call them back on a different line.
- WebSockets is like a standard phone call where both you and the other person are active on the line. You can both talk at the same time over a single connection, exchanging brief messages dynamically.
WebSockets: The Handshake & Wire Format
A WebSocket connection begins as an HTTP request and is upgraded to a bidirectional TCP stream.
The Upgrade Handshake
The client initiates a connection with specialized HTTP upgrade headers:
GET /chat HTTP/1.1
Host: server.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
If the server supports WebSockets, it takes the Sec-WebSocket-Key value, appends a globally unique protocol identifier (258EAFA5-E914-47DA-95CA-C5AB0DC85B11), computes the SHA-1 hash of this string, base64-encodes the result, and returns it as Sec-WebSocket-Accept in a 101 Switching Protocols response:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
At this point, the HTTP protocol is abandoned. The underlying TCP socket remains open, and both parties communicate using the WebSocket binary framing protocol.
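The Sec-WebSocket-Accept derivation described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name compute_accept is an assumption made for this example, not part of any WebSocket library.

```python
import base64
import hashlib

# Fixed protocol identifier that the server appends to the client's key.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key.

    compute_accept is a hypothetical helper name used only in this sketch.
    """
    # SHA-1 over key + GUID, then base64 of the raw 20-byte digest.
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Key from the handshake request shown above.
print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client performs the same computation on its side and rejects the connection if the returned value does not match, which confirms that the server actually understood the WebSocket upgrade rather than echoing a cached HTTP response.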
Wire Framing & Masking
WebSocket data is structured into discrete binary frames to manage stream boundaries:
- FIN (1 bit): Indicates if this frame is the final fragment of a message.
- Opcode (4 bits): Dictates the payload type (e.g. 0x1 for text, 0x2 for binary, 0x8 for close, 0x9 for ping, 0xA for pong).
- Mask Flag (1 bit): Specifies if the payload is masked. All frames sent from client to server must be masked.
- Payload Length (7 bits): The length of the payload. If the value is 126, the next 2 bytes represent the length; if 127, the next 8 bytes represent the length.
- Masking Key (4 bytes): If the mask flag is 1, these 4 bytes are used to unmask the payload.
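To make the header layout listed above concrete, here is a rough Python sketch that builds and parses a frame header. The names FrameHeader, encode_header, and parse_header are hypothetical helpers invented for this example. The sketch ignores the three reserved bits that sit between FIN and the opcode, and it only builds unmasked (server-to-client) headers.

```python
import struct
from typing import NamedTuple


class FrameHeader(NamedTuple):
    fin: bool
    opcode: int
    masked: bool
    payload_length: int
    header_size: int  # bytes used by the header, including any masking key


def encode_header(opcode: int, payload_length: int, fin: bool = True) -> bytes:
    """Build an unmasked header, choosing the 7-bit, 16-bit, or 64-bit length form."""
    first = (0x80 if fin else 0x00) | (opcode & 0x0F)
    if payload_length < 126:
        return struct.pack("!BB", first, payload_length)
    if payload_length < 2 ** 16:
        return struct.pack("!BBH", first, 126, payload_length)
    return struct.pack("!BBQ", first, 127, payload_length)


def parse_header(data: bytes) -> FrameHeader:
    """Decode FIN, opcode, mask flag, and payload length from the start of a frame."""
    first, second = data[0], data[1]
    fin = bool(first & 0x80)
    opcode = first & 0x0F
    masked = bool(second & 0x80)
    length = second & 0x7F
    offset = 2
    if length == 126:
        (length,) = struct.unpack_from("!H", data, offset)  # next 2 bytes
        offset += 2
    elif length == 127:
        (length,) = struct.unpack_from("!Q", data, offset)  # next 8 bytes
        offset += 8
    if masked:
        offset += 4  # the 4-byte masking key follows the length field
    return FrameHeader(fin, opcode, masked, length, offset)


print(parse_header(encode_header(0x1, 300)))
```

The payload itself starts at header_size bytes into the frame, so a reader can slice it out once payload_length bytes have arrived.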
To prevent proxy cache poisoning attacks, the client must apply a byte-wise XOR mask to the payload using a random masking key. The server reverses this using:
DecodedPayload[i] = MaskedPayload[i] ^ MaskingKey[i % 4]
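The XOR formula above translates almost directly into code. The following Python sketch assumes a helper named apply_mask (a name chosen for this example) and uses the masked "Hello" text frame from the examples in RFC 6455 as input. Because XOR is its own inverse, the same function both masks and unmasks.

```python
def apply_mask(payload: bytes, masking_key: bytes) -> bytes:
    """XOR each payload byte with masking_key[i % 4]; works for masking and unmasking."""
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))


# Single-frame masked text message containing "Hello" (example bytes from RFC 6455).
frame = bytes([0x81, 0x85, 0x37, 0xFA, 0x21, 0x3D, 0x7F, 0x9F, 0x4D, 0x51, 0x58])

# Byte 0: FIN + text opcode. Byte 1: mask flag + length 5. Bytes 2-5: masking key.
masking_key = frame[2:6]
masked_payload = frame[6:]

print(apply_mask(masked_payload, masking_key).decode("utf-8"))  # Hello
```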
Server-Sent Events (SSE): Unidirectional Streaming
Server-Sent Events provides a lightweight, unidirectional push stream from the server over standard HTTP. Unlike WebSockets, it requires no protocol switching or binary framing parser.
The client opens a standard HTTP request using the browser's EventSource API, requesting the text/event-stream MIME type. The server responds with Content-Type: text/event-stream and leaves the connection open, writing events to the response body as they occur (over HTTP/1.1, typically using chunked transfer encoding).
The wire format for SSE is human-readable text:
event: message
id: 101
data: {"user": "Alice", "text": "Hello"}

event: user_join
id: 102
data: {"user": "Bob"}
Each message must be terminated by two consecutive newline characters (\n\n). If the connection drops, the browser automatically attempts to reconnect, sending the last received ID in the Last-Event-ID HTTP header, allowing the server to replay missed events.
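As an illustration of the text format described above, the following Python sketch serializes and parses events. The names format_event and parse_stream are hypothetical, and the parser is deliberately simplified: it assumes \n line endings and a fully buffered stream, whereas a real client reads incrementally and also handles fields such as retry.

```python
from typing import Dict, Iterator, Optional


def format_event(data: str, event: Optional[str] = None,
                 event_id: Optional[str] = None) -> str:
    """Serialize one event; multi-line data becomes several data: lines."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # The blank line (two consecutive newlines) terminates the event.
    return "\n".join(lines) + "\n\n"


def parse_stream(text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per event, skipping comment lines that start with ':'."""
    for block in text.split("\n\n"):
        fields: Dict[str, str] = {}
        data_lines = []
        for line in block.split("\n"):
            if not line or line.startswith(":"):
                continue  # empty line or comment (e.g. a heartbeat)
            name, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if name == "data":
                data_lines.append(value)
            else:
                fields[name] = value
        if data_lines:  # blocks without data are not dispatched as events
            fields["data"] = "\n".join(data_lines)
            yield fields


stream = (
    format_event('{"user": "Alice", "text": "Hello"}', event="message", event_id="101")
    + format_event('{"user": "Bob"}', event="user_join", event_id="102")
)
for parsed in parse_stream(stream):
    print(parsed["id"], parsed["event"], parsed["data"])
```

On reconnect, a server that receives Last-Event-ID: 101 would replay everything after that ID, which is why attaching an id to each event is worthwhile even when the client never reads it directly.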
Connection Persistence & Resiliency
To keep long-lived connections healthy, systems employ heartbeat mechanisms:
- WebSocket Heartbeats: The server sends a ping control frame (0x9) at a regular interval. The client must respond with a pong frame (0xA) as soon as practical. If no pong is received within a timeout period, the server terminates the dead connection.
- SSE Heartbeats: The server periodically sends comment lines (e.g. :keep-alive\n\n) to prevent intermediate proxies and load balancers from closing idle connections.
- Reconnect Policies: Clients should reconnect using randomized exponential backoff. If thousands of clients lose connection simultaneously due to a network blip, a naive immediate reconnect policy will create a thundering herd problem, overloading the servers.
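One common way to implement the randomized exponential backoff mentioned above is the "full jitter" variant, sketched below in Python. The function name reconnect_delay and the default values for base and cap are assumptions chosen for illustration, not recommended settings.

```python
import random
from typing import Optional


def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                    rng: Optional[random.Random] = None) -> float:
    """Return a randomized delay in seconds for a 0-based retry attempt.

    The ceiling doubles with each attempt up to `cap`; the actual delay is
    drawn uniformly from [0, ceiling] so that clients spread out over time.
    """
    rng = rng if rng is not None else random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


# Hypothetical usage: print the delay a client would sleep before each retry.
for attempt in range(5):
    print(f"attempt {attempt}: wait {reconnect_delay(attempt):.2f}s")
```

Drawing from the whole range rather than adding a small random offset to a fixed delay means that clients disconnected at the same instant do not all return at the same instant.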
Infrastructure Scaling Challenges
Scaling persistent connection architectures is fundamentally different from scaling stateless HTTP endpoints:
- File Descriptor Exhaustion: Every active connection requires an open socket, consuming an OS file descriptor. Operating systems limit the number of file descriptors per process (often 1024 by default). Production servers must be tuned using ulimit -n to support hundreds of thousands of concurrent connections.
- Load Balancer Connection Timeouts: Reverse proxies (like Nginx, HAProxy, AWS ALB) often close idle connections after a default duration (e.g. 60 seconds). Intermediary timeouts must be reconfigured to allow long-lived connections, and heartbeats must fire frequently enough to reset proxy idle timers.
- Multi-Node State Synchronization: In a scaled, clustered environment, Client A may be connected to Server 1, while Client B is connected to Server 2. If Client A sends a message to Client B, Server 1 cannot route it directly because Client B's socket resides on Server 2. To resolve this, backends use a pub/sub backplane (e.g. Redis Pub/Sub, RabbitMQ) to broadcast message payloads across all server nodes, allowing the node holding the target socket to deliver the frame.
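The backplane pattern described above can be illustrated with a small in-process Python sketch. InMemoryBackplane stands in for a real broker such as Redis Pub/Sub, and both class names, along with the client and server identifiers, are hypothetical. Sockets are modelled as plain lists of delivered payloads so the example stays self-contained.

```python
from typing import Callable, Dict, List


class InMemoryBackplane:
    """Stand-in for a pub/sub broker: every subscriber sees every published message."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str, str], None]] = []

    def subscribe(self, handler: Callable[[str, str], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, target_client: str, payload: str) -> None:
        for handler in self._subscribers:
            handler(target_client, payload)


class ServerNode:
    """One node in the cluster, holding only the sockets of its own clients."""

    def __init__(self, name: str, backplane: InMemoryBackplane) -> None:
        self.name = name
        self.backplane = backplane
        self.local_sockets: Dict[str, List[str]] = {}  # client id -> delivered frames
        backplane.subscribe(self._on_broadcast)

    def connect(self, client_id: str) -> None:
        self.local_sockets[client_id] = []

    def send(self, target_client: str, payload: str) -> None:
        # This node may not hold the target socket, so it publishes to all nodes.
        self.backplane.publish(target_client, payload)

    def _on_broadcast(self, target_client: str, payload: str) -> None:
        # Only the node that holds the target's socket delivers the frame.
        if target_client in self.local_sockets:
            self.local_sockets[target_client].append(payload)


backplane = InMemoryBackplane()
server_1 = ServerNode("server-1", backplane)
server_2 = ServerNode("server-2", backplane)
server_1.connect("client-a")
server_2.connect("client-b")

# Client A (on Server 1) messages Client B (on Server 2).
server_1.send("client-b", "hello from A")
print(server_2.local_sockets["client-b"])
```

Broadcasting every message to every node is simple but scales with cluster size; partitioning by channel or keeping a client-to-node lookup are common refinements when that cost becomes significant.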
Further Reading
- RFC 6455: The WebSocket Protocol — Official specifications for the WebSocket wire format
- HTML5 Server-Sent Events Specification — W3C/WHATWG living standard for SSE
- Nginx Load Balancing WebSockets Guide — Configuration and optimization details
- Scaling WebSockets to Millions of Connections — Practical system tuning strategies for real-time backends