System Design from First Principles
The complete guide — from first principles to full system designs.
There is no single right answer in system design — everything has tradeoffs. This guide is meant as initial exposure to the core concepts, building blocks, and patterns that underpin every distributed system.
The guide is built from the ground up. If you know nothing about system design, start at Part 1. If you're already comfortable with the basics, skip ahead to Part 3 (Patterns & Principles) or Part 5 (Putting It All Together).
What This Guide Covers
- Part 1 — First Principles: How the internet works — clients, servers, protocols, DNS, HTTP, TLS. The foundation everything else is built on.
- Part 2 — Building Blocks: The components you assemble to build systems — servers, APIs, databases, caching, CDNs, load balancers, message queues, and virtual networks.
- Part 3 — Patterns & Principles: The theoretical concepts and architectural patterns — ACID, CAP theorem, consistency models, replication, sharding, and scaling strategies.
- Part 4 — Security: Authentication, authorization, encryption, and the common vulnerabilities every system must defend against.
- Part 5 — Putting It All Together: A structured framework for designing systems end-to-end — requirements, estimation, API design, data modeling, and architecture.
- Part 6 — Common Design Problems: Worked examples of real system designs — URL shorteners, chat apps, notification systems, and more.
- Glossary: Quick-reference definitions for every term used throughout the guide.
Parts 1–4 each end with a chapter of a running story: you and your co-founder Sam are building and scaling LinkPulse, a link-sharing platform. Each chapter picks up where the last left off. The concepts are taught first — the story shows you what they feel like in practice. Look for the gold-bordered sections at the end of each Part.
Part 1: First Principles — How the Internet Works
Before we talk about system design, you need to understand what happens when a user types a URL into their browser. Every system you'll ever design sits on top of this flow.
1.1 The Client-Server Model
Everything on the internet is a conversation between two types of machines:
- Client: The thing making a request (your browser, a mobile app, another server)
- Server: The thing responding to the request (a machine running code that listens for incoming connections)
When you type https://example.com into your browser:
1. Your browser (client) asks DNS: "What IP address is example.com?"
2. DNS responds: "It's 93.184.216.34"
3. Your browser opens a TCP connection to 93.184.216.34 on port 443 (HTTPS)
4. Your browser sends an HTTP GET request: "Give me the homepage"
5. The server processes the request and sends back HTML/CSS/JS
6. Your browser renders the page
1.2 What Is a Protocol?
A protocol is a set of rules that two machines agree to follow when communicating. Without a protocol, one machine might send data as JSON while the other expects XML — chaos. Think of it like two people agreeing to speak English, take turns, and raise a hand when they have a question.
| Protocol | What It Does | Analogy |
|---|---|---|
| TCP | Reliable, ordered delivery of data | Certified mail — you get confirmation it arrived |
| UDP | Fast, unreliable delivery | Shouting across a room — fast but some words get lost |
| HTTP | Request-response format for web communication | A structured form: "I want X" → "Here's X" |
| HTTPS | HTTP + encryption (TLS) | Same form, but in a sealed envelope |
| WebSocket | Persistent two-way connection | A phone call (stays open, either side can talk) |
TCP Deep Dive
TCP (Transmission Control Protocol) guarantees that data arrives completely, in order, and without errors. It does this through three mechanisms:
- Acknowledgments (ACKs): The receiver confirms every chunk of data it gets. If the sender doesn't hear back, it resends.
- Ordering: Every chunk gets a sequence number. If chunk #3 arrives before chunk #2, TCP holds #3 and waits for #2.
- Retransmission: If a chunk is lost in transit, the sender automatically resends it after a timeout.
Before any data flows, TCP establishes a connection with a 3-way handshake: the client sends a SYN ("can we talk?"), the server replies with a SYN-ACK ("yes — can you hear me?"), and the client answers with an ACK ("loud and clear"). Only then does data start flowing.
UDP Deep Dive
UDP (User Datagram Protocol) skips the handshake entirely. It just fires data and hopes for the best — no ACKs, no ordering, no retransmission. This makes it much faster but unreliable.
Analogy: TCP is like a phone conversation ("Did you hear that?" "Yes." "OK, next thing..."). UDP is like a live radio broadcast — the station keeps transmitting whether you're listening or not.
TCP vs UDP — When does it matter?
- TCP for anything where you can't lose data: web pages, API calls, file transfers, database queries
- UDP for anything where speed matters more than perfection: video streaming, gaming, voice calls
But wait — if TCP is reliable, why would anyone use UDP? Because reliability has a cost. TCP's handshakes, ACKs, and retransmissions add latency. In a video call, a dropped frame from 50ms ago is worthless — you'd rather skip it and show the next frame. UDP lets you do that. TCP would pause the stream to re-request the lost frame, causing a visible stutter.
1.3 IP Addresses and Ports
An IP address is a machine's address on a network: 192.168.1.100
A port is like an apartment number at that address. One machine can run many services, each on a different port:
Port 80 → HTTP (web traffic)
Port 443 → HTTPS (encrypted web traffic)
Port 5432 → PostgreSQL (database)
Port 6379 → Redis (cache)
Port 3000 → Your Node.js app
But wait — how many ports can a server have? 65,536 (2^16), numbered 0–65535. Ports 0–1023 are "well-known" and reserved for standard services like HTTP and SSH. Ports 1024–49151 are "registered" for specific applications. Ports 49152–65535 are ephemeral — the OS assigns them to temporary client connections on the fly.
What if two applications try to use the same port? The OS rejects the second one with a "port already in use" error (EADDRINUSE). Each port can only be bound by one process at a time — it's like two tenants trying to rent the same apartment.
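The three ranges above can be captured in a small helper (a sketch — portRange is a hypothetical name; the boundaries follow the IANA ranges listed above):

```javascript
// Classify a port number into the three IANA ranges.
function portRange(port) {
  if (port < 0 || port > 65535) throw new RangeError('ports are 0–65535');
  if (port <= 1023) return 'well-known';   // HTTP, SSH, HTTPS…
  if (port <= 49151) return 'registered';  // PostgreSQL, Redis, your app…
  return 'ephemeral';                      // OS-assigned client ports
}

console.log(portRange(443));   // 'well-known'  (HTTPS)
console.log(portRange(5432));  // 'registered'  (PostgreSQL)
console.log(portRange(51234)); // 'ephemeral'   (temporary client connection)
```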
1.4 DNS: The Internet's Phone Book
DNS (Domain Name System) translates human-readable names to IP addresses.
example.com → 93.184.216.34
Lookup chain:
1. Check browser cache
2. Check OS cache
3. Ask your ISP's DNS resolver
4. Resolver asks root nameserver → "Who handles .com?"
5. Ask .com nameserver → "Who handles example.com?"
6. Ask example.com's nameserver → "Here's the IP: 93.184.216.34"
GeoDNS is DNS that returns different IP addresses based on where the query originates. Queries from Europe resolve to an EU load balancer IP; queries from Asia resolve to an AP load balancer IP. This is how global systems achieve low latency — a user in Frankfurt gets routed to the nearest data center, dropping response times from 280ms to 15ms.
How long does a DNS cache last? Each DNS record has a TTL (Time to Live) — typically 60–300 seconds. When the TTL expires, the resolver re-queries the authoritative nameserver. Low TTL means faster failover but more DNS traffic. High TTL means fewer lookups but slower propagation when you change an IP.
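The resolver-side behavior can be sketched as a tiny TTL cache (illustrative names only — real resolvers are far more elaborate). The clock is injectable so expiry is easy to demonstrate:

```javascript
// Minimal TTL cache: entries expire ttlMs milliseconds after being set.
function makeTtlCache(now = Date.now) {
  const entries = new Map();
  return {
    set(key, value, ttlMs) {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
    get(key) {
      const e = entries.get(key);
      if (!e) return undefined;
      if (now() >= e.expiresAt) {  // TTL expired → treat as a miss
        entries.delete(key);
        return undefined;
      }
      return e.value;
    },
  };
}

// Simulated clock so we can "fast-forward" past the TTL.
let t = 0;
const cache = makeTtlCache(() => t);
cache.set('example.com', '93.184.216.34', 60_000); // 60-second TTL
console.log(cache.get('example.com')); // hit → '93.184.216.34'
t += 61_000;                           // 61 seconds later…
console.log(cache.get('example.com')); // miss → undefined (must re-query)
```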
1.5 HTTP: The Language of the Web
HTTP is a request-response protocol. Every HTTP message has:
Request:
GET /api/users/123 HTTP/1.1
Host: api.example.com
Authorization: Bearer eyJhbGciOiJ...
Content-Type: application/json
Response:
HTTP/1.1 200 OK
Content-Type: application/json
{
"id": 123,
"name": "John Doe",
"role": "Senior Engineer"
}
HTTP Methods
| Method | Purpose | Safe? | Idempotent? |
|---|---|---|---|
| GET | Read/retrieve data | Yes | Yes |
| POST | Create new data | No | No |
| PUT | Replace/update data completely | No | Yes |
| PATCH | Partially update data | No | No |
| DELETE | Remove data | No | Yes |
Safe = doesn't change anything on the server. Idempotent = calling it 10 times has the same effect as calling it once.
Still confused by idempotency? Think of it like a light switch vs. a doorbell. A light switch is idempotent: flipping it to "on" ten times still results in the light being on — the end state is the same no matter how many times you do it. A doorbell is NOT idempotent: pressing it ten times rings it ten times. Each press creates a new side effect.
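In code, the difference is just state-setting versus side-effect-accumulating (a minimal sketch of the analogy):

```javascript
// Idempotent: setting state. The end state is the same no matter
// how many times you call it.
let light = 'off';
const setLightOn = () => { light = 'on'; };

// NOT idempotent: each call creates a new side effect.
let rings = 0;
const pressDoorbell = () => { rings += 1; };

for (let i = 0; i < 10; i++) { setLightOn(); pressDoorbell(); }
console.log(light); // 'on' — same as after a single call
console.log(rings); // 10  — ten distinct side effects
```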
POST vs PUT vs PATCH: A Concrete Example
// Original user object on server:
{ "id": 42, "name": "John", "email": "john@example.com", "role": "editor" }
// POST /users — CREATE a new resource
// Every call creates a NEW user. Call it 3 times, get 3 users:
{ "name": "Jane", "email": "jane@example.com", "role": "viewer" }
// → 201 Created (id: 43)
// Call again → 201 Created (id: 44) ← duplicate! POST is NOT idempotent.
// PUT /users/42 — REPLACE the entire resource
// You MUST send the complete object:
{ "name": "John", "email": "john-new@example.com", "role": "editor" }
// ⚠️ If you forget "role": the field is ERASED (set to null)
// Call it 10 times → same result every time. PUT IS idempotent.
// PATCH /users/42 — UPDATE only specified fields
// Send only what changed:
{ "email": "john-new@example.com" }
// ✓ "name" and "role" are untouched
Why Idempotency Matters: The Retry Problem
Networks are unreliable. Requests fail, time out, and get retried — often automatically. The question idempotency answers is: "If this request gets sent twice by accident, will anything bad happen?"
What happens with a non-idempotent POST: Suppose a client sends a $50 charge, the response is lost, and the client retries. The server sees a brand-new request — it has no idea this is a retry — and processes a second charge. You've now been billed $100.
What happens with an idempotent operation: The server recognizes "I already processed this" and returns the original result. You're charged $50 — exactly once.
How to make non-idempotent operations safe: Use an idempotency key — a unique ID (like a UUID) the client generates and sends with the request. The server stores this key with the result. On retry, it sees the same key, skips re-processing, and returns the cached result. Stripe, PayPal, and most payment APIs require this.
Unsafe Retry (POST without idempotency key)
1. Client sends POST /charge → Server processes, charges $50
2. Response lost in transit → Client sees TIMEOUT
3. Client retries POST /charge → Server processes AGAIN, charges $50
Result: Customer charged $100
Safe Retry (POST with idempotency key)
1. Client sends POST /charge + Idempotency-Key: abc-123 → Server processes, charges $50, stores key
2. Response lost in transit → Client sees TIMEOUT
3. Client retries POST /charge + Idempotency-Key: abc-123 → Server finds key → returns cached result
Result: Customer charged $50
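A minimal sketch of the server side of this pattern, with an in-memory key store (real systems persist keys in a database with an expiry; all names here are illustrative):

```javascript
// idempotencyKey → cached result of the first successful attempt
const processedCharges = new Map();
let totalCharged = 0;

function charge(amount, idempotencyKey) {
  if (processedCharges.has(idempotencyKey)) {
    // Retry detected: skip re-processing, return the original result.
    return processedCharges.get(idempotencyKey);
  }
  totalCharged += amount;                 // the actual side effect
  const result = { status: 'charged', amount };
  processedCharges.set(idempotencyKey, result);
  return result;
}

charge(50, 'abc-123'); // first attempt: charges $50
charge(50, 'abc-123'); // retry after a timeout: recognized, NOT charged again
console.log(totalCharged); // 50
```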
Quick idempotency cheat sheet:
| Method | Idempotent? | Why |
|---|---|---|
| GET | Yes | Reading data never changes it. Read 100 times, same result. |
| PUT | Yes | "Set this resource to exactly this state." Doing it twice = same state. |
| DELETE | Yes | Deleting something that's already gone is a no-op. |
| POST | No | "Create a new thing." Each call creates another one. |
| PATCH | No | Depends on implementation — can have side effects on retry. |
HTTP Status Codes
2xx = Success
200 OK — Here's what you asked for
201 Created — I made the thing you asked me to make
204 No Content — Done, nothing to send back
3xx = Redirect
301 Moved Permanently — This resource lives somewhere else now
304 Not Modified — Use your cached version
4xx = Client Error (YOUR fault)
400 Bad Request — Your request doesn't make sense
401 Unauthorized — Who are you? Log in first
403 Forbidden — I know who you are, but you can't do this
404 Not Found — That doesn't exist
429 Too Many Requests — Slow down (rate limiting)
5xx = Server Error (SERVER's fault)
500 Internal Server Error — Something broke on our end
502 Bad Gateway — The upstream server failed
503 Service Unavailable — We're overloaded or in maintenance
504 Gateway Timeout — The upstream server took too long
502 and 504: The Gateway Codes
These two codes specifically involve a gateway — a load balancer or reverse proxy sitting between the client and the backend server. The client never talks to the backend directly; the gateway forwards requests on its behalf.
Analogy: You call a company's receptionist (gateway) who connects you to a specialist (backend). 502 = the specialist picks up but speaks gibberish. 504 = the specialist never picks up and the receptionist gives up waiting.
What about HTTP/2 and HTTP/3? HTTP/2 multiplexes multiple requests over a single TCP connection, eliminating head-of-line blocking at the HTTP layer. HTTP/3 goes further — it replaces TCP with QUIC (a UDP-based protocol), reducing connection setup from 2–3 round trips to just 1. Most CDNs and major sites use HTTP/2+ today. The progression is worth understanding, though the focus here is on HTTP/1.1 semantics (methods, headers, status codes) — those haven't changed.
1.6 HTTP is Stateless
Every HTTP request is completely independent. The server has no memory of previous requests. When you send request #2, the server has zero knowledge of request #1 — it's like talking to someone with amnesia every time.
But wait — if HTTP is stateless, how does the server know I'm logged in? Great question. The server doesn't "remember" you between requests. Instead, you send proof of identity with every single request — usually a token (a string of characters that proves your identity; think of it like a wristband at a concert — you show it at every door) or a cookie (a small piece of data that the browser automatically attaches to every request to a specific domain). We'll cover exactly how this works in Part 4: Security.
1.7 TLS/HTTPS: Encryption in Transit
TLS (Transport Layer Security) is the protocol that puts the "S" in HTTPS. It encrypts data between the client and server so that anyone intercepting the traffic sees only gibberish.
Analogy: Imagine passing notes in class. HTTP is writing the note in plain English — anyone who intercepts it can read it. HTTPS is writing the note in a secret code that only you and the recipient understand, inside a sealed envelope.
The certificate proves the server is who it claims to be (not an imposter). It's issued by a trusted Certificate Authority (CA) — an organization like Let's Encrypt or DigiCert that verifies a server's identity and issues digital certificates.
Does TLS slow things down? The handshake adds ~1 round trip on the first connection. After that, TLS session resumption makes subsequent connections nearly free. The encryption/decryption overhead is negligible on modern hardware — HTTPS is effectively free performance-wise. Never skip TLS to "improve performance."
1.8 Making a Real HTTP Request
Here's what it looks like to make an HTTP request in code:
// JavaScript (browser or Node.js)
const response = await fetch('https://api.example.com/users/123', {
method: 'GET',
headers: {
'Authorization': 'Bearer eyJhbGciOiJ...',
'Content-Type': 'application/json'
}
});
const user = await response.json();
console.log(user.name); // "John Doe"
The equivalent using curl from the command line:
# curl (command line)
curl -X GET https://api.example.com/users/123 \
-H "Authorization: Bearer eyJhbGciOiJ..." \
-H "Content-Type: application/json"
Both do the exact same thing: send an HTTP GET request with an auth header and expect JSON back. The fetch() API is what your frontend JavaScript uses; curl is what you'll use for quick testing.
The LinkPulse Story
Chapter 1: Day One
It's launch day. You and Sam deploy LinkPulse — a link-sharing platform — on a single $20/month server in Virginia. The stack is simple: a Node.js API, a PostgreSQL database, and a React frontend. All on one machine.
Your first user signs up. Let's trace exactly what happens. They type linkpulse.io into their browser. First, a DNS lookup resolves the domain to your server's IP address. Then the browser opens a TCP connection — the three-way handshake you just learned about. Finally, the browser sends an HTTP GET request for the homepage. Your server receives it, renders the page, and sends back an HTTP response. The browser paints the page. It takes 120ms. It works.
Ten users sign up by lunch. Twenty by dinner. You're watching logs in real time, grinning.
Uh Oh
Day two. A user signs up from a coffee shop. They type their email and password into your registration form and click submit. What you didn't think about: your site is served over plain HTTP on port 80. That means their credentials cross the coffee shop's Wi-Fi in plaintext. Anyone on the same network running a packet sniffer can read their password.
You scramble. You grab a free TLS certificate from Let's Encrypt, configure your server to accept HTTPS on port 443, and redirect all HTTP traffic. Now every request is encrypted with TLS — even if someone intercepts the packets, they see gibberish. Crisis averted, lesson learned: encryption isn't optional, it's step one.
A week in, Sam asks a good question: "We have a login page now. But HTTP is stateless — the server forgets who you are after every request. How does the server know a user is logged in on the next click?" You tell her you're using a session cookie for now, but you know this question goes deeper. You'll revisit it properly in Part 4.
Uh Oh
Day ten. A user in Mumbai reports that the page takes 4 seconds to load. You check your server metrics — CPU is at 5%, response time is 15ms. The server is fast. So what's slow? Physics. A round trip from Mumbai to Virginia takes roughly 200ms. Your page loads dozens of assets — HTML, CSS, JavaScript, images, API calls — each one a separate round trip. Those 200ms add up fast across a TCP connection that starts slow (remember: TCP's congestion window starts small and grows). The speed of light is a hard limit, and your server is 13,000 km away.
Key Takeaway
LinkPulse works. But it only works well for people near Virginia. Every concept from Part 1 — DNS, TCP, HTTP, TLS — happens on every single request. And every request from Mumbai crosses the ocean. You haven't hit a code problem. You've hit a distance problem. And a single server can't be close to everyone.
Part 2: The Building Blocks of a System
Now that you understand how machines communicate, let's talk about the components you'll use to build systems. Think of these as Lego pieces — you choose which pieces to use and explain why.
2.1 Servers
Frontend Servers serve your UI — HTML, CSS, JS. In modern architectures, they often serve a static SPA (React/Vue) that makes API calls to the backend.
Backend Servers run business logic: processing payments, validating data, querying databases. Examples: Node.js (Express), Python (FastAPI), Go.
What is Nginx? Nginx (pronounced "engine-X") is a high-performance web server and reverse proxy — software that sits between the internet and your backend servers. You'll see it mentioned throughout this guide. It handles two main jobs: (1) serving static files (your HTML, CSS, JS) directly to users, and (2) forwarding API requests to your backend application servers — handling load balancing, SSL termination, and caching so your app servers don't have to. Think of it as a receptionist — it greets every visitor and routes them to the right department. Most production deployments use Nginx (or a cloud equivalent like AWS ALB) as the entry point to their system.
Read vs Write Servers (CQRS — Command Query Responsibility Segregation, a pattern where reads and writes use separate models or even separate databases): In high-traffic systems, separate read and write paths. The idea: reads and writes have fundamentally different needs. Reads can be scaled easily with replicas and caching; writes need strict consistency guarantees. By separating them, you can optimize each independently — for example, the read database might store a user's profile with their 10 most recent orders already combined into one row (fast to fetch, no joins needed), while the write database keeps users and orders in separate tables to maintain clean, consistent data. We'll cover this concept fully in Section 3.8: Normalization vs Denormalization.
Serverless Functions (Lambda): Single functions that run on-demand. The cloud provider manages all the infrastructure — you just write the function code.
What is a "cold start"? When a serverless function hasn't been called recently, the cloud provider needs to spin up a new container to run it. This initial setup can add 100ms–2s of latency. It's like a car that's been sitting in the cold — it takes a moment to warm up. After the first call, subsequent calls are fast ("warm starts"). This is why serverless isn't great for latency-sensitive operations.
Use for: event-driven tasks, unpredictable traffic, simple stateless operations. Avoid for: long-running processes, WebSockets, or high-volume steady traffic (a dedicated server is cheaper).
2.2 APIs
Wait — isn't this just HTTP? What's the difference between HTTP and an API? HTTP is the transport protocol — the language that clients and servers use to talk to each other (we covered this in Section 1.5). An API is a contract — a set of rules about what URLs exist, what data you send, and what you get back. Think of it this way: HTTP is like the postal service (it delivers envelopes), and an API is like a form you fill out and mail in (it defines what information goes in the envelope and what response you'll get). REST, GraphQL, and gRPC are different styles of designing that form — but they all use HTTP (or similar protocols) to deliver it.
REST
The most common API style. REST stands for Representational State Transfer. The key idea: everything is a resource (a noun — user, project, order) identified by a URL. You use HTTP methods to perform actions on resources.
GET /api/users → List all users
GET /api/users/123 → Get user 123
POST /api/users → Create a new user
PUT /api/users/123 → Update user 123 (full replace)
PATCH /api/users/123 → Update user 123 (partial)
DELETE /api/users/123 → Delete user 123
GraphQL
Client specifies exactly what data it wants. Single endpoint. Best for complex data relationships and mobile apps (minimize data transfer). Harder to cache because every query is different.
// GraphQL query — client asks for exactly what it needs
query {
user(id: 123) {
name
email
projects {
title
status
}
}
}
// Response — no over-fetching, no under-fetching
{
"user": {
"name": "John",
"email": "john@example.com",
"projects": [
{ "title": "AI Dashboard", "status": "active" }
]
}
}
With REST, you'd need GET /users/123 then GET /users/123/projects — two round trips. GraphQL does it in one.
REST vs GraphQL: When to Pick Each
| Factor | REST Wins | GraphQL Wins |
|---|---|---|
| Data shape | Simple, predictable resources | Deeply nested, interconnected data |
| Client diversity | One client type (e.g., web only) | Many clients needing different data (web, mobile, TV) |
| Caching | HTTP caching works out of the box (CDN, browser) | Harder — all requests are POST to one endpoint |
| Bandwidth | Over-fetching is acceptable | Mobile/low-bandwidth needs only specific fields |
| Team expertise | Most developers know REST already | Requires learning schema, resolvers, and tooling |
Default to REST unless you have multiple clients with significantly different data needs. GraphQL shines when the alternative is dozens of custom REST endpoints.
gRPC
A high-performance protocol for server-to-server communication. "Binary protocol" means data is encoded in a compact binary format (Protocol Buffers) instead of JSON text — smaller payloads, faster parsing; think of the difference between a handwritten letter and a barcode: less readable to humans, but much faster for machines. "Strongly typed" means both the client and server share a .proto definition file that defines the exact shape of every message and service — if you send the wrong type, it fails at compile time, not at runtime. Use gRPC for internal microservice communication where performance matters.
WebSockets
Persistent, bidirectional connection. Unlike HTTP (client asks, server responds), WebSockets allow the server to push data to the client at any time.
How it works: the connection starts as a normal HTTP request carrying an Upgrade: websocket header; the server responds with 101 Switching Protocols, and from then on the same TCP connection stays open for two-way messages.
Use for real-time features: chat, live dashboards, collaborative editing, notifications, multiplayer games.
How do you handle API versioning? Common approaches: URL path versioning (/v1/users, /v2/users) or a custom header (API-Version: 2). Version when you make breaking changes to the contract. Keep old versions running until clients migrate — never break existing consumers.
2.3 Databases
SQL (Relational)
Tables with rows, columns, and foreign keys. PostgreSQL is the go-to. Choose when: data is relational, you need ACID transactions (Atomicity, Consistency, Isolation, Durability — guarantees that database operations are reliable; see Part 3 for the full breakdown), complex queries, or consistency is critical.
Why not just use one big database for everything? Because different data has different access patterns. Your user profiles need complex queries and consistency (SQL). Your session tokens need blazing-fast lookups (Redis). Your product images need cheap, infinite storage (S3). Using the right tool for each job is a core system design principle.
NoSQL Types
- Document Stores (MongoDB): Store data as flexible documents — self-contained, JSON-like records ({"name": "John", "age": 30}) that can have nested fields, arrays, and varying structure, unlike a rigid SQL row. Use for semi-structured data, varying schemas, content management.
- Key-Value Stores (Redis): Simple key → value lookups. Entirely in-memory, sub-millisecond reads. Use for caching, sessions, rate limiting.
- Wide-Column (Cassandra): Massive scale writes with high availability. Use for time-series, IoT data, event logging.
- Graph (Neo4j): Data stored as nodes and edges (relationships). Use for social networks, recommendations, fraud detection.
Database Indexes
An index is like a book's index — instead of reading every page (full table scan), you look up the topic and jump to the right page.
- Primary index: Automatically created on the primary key. In many databases (e.g. MySQL's InnoDB) it also determines the physical storage order of rows on disk (a clustered index). Every table has one.
- Secondary index: Manually created on frequently queried columns. A separate data structure that maps column values → row locations. Example: indexing email in a users table so WHERE email = 'john@example.com' doesn't scan all rows.
Trade-off: Indexes speed up reads (O(log n) lookup instead of O(n) scan) but slow down writes (every INSERT/UPDATE must also update the index). Don't index every column — index the columns you frequently WHERE, JOIN, or ORDER BY on.
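The trade-off can be sketched with a Map standing in for a secondary index (illustrative only — real indexes are B-trees with O(log n) lookups, but the read/write tension is the same):

```javascript
const users = [
  { id: 1, email: 'john@example.com' },
  { id: 2, email: 'jane@example.com' },
];

// Without an index: full scan — O(n).
const scanByEmail = (email) => users.find(u => u.email === email);

// Secondary "index": email → row. Lookups are O(1) here
// (a real B-tree index is O(log n)), but…
const emailIndex = new Map(users.map(u => [u.email, u]));

// …every write now pays twice: the table AND the index must be updated.
function insertUser(user) {
  users.push(user);
  emailIndex.set(user.email, user);
}

insertUser({ id: 3, email: 'sam@example.com' });
console.log(emailIndex.get('sam@example.com').id); // 3
```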
If Redis is so fast, why not use it as the main database? Redis stores everything in RAM, which is expensive and volatile. A server with 256GB RAM costs 10x more than 256GB of SSD. And if Redis crashes without persistence configured, your data is gone. Use Redis as a cache layer in front of your primary database — not as a replacement for it.
2.4 Blob / Object Storage
For large files: images, videos, PDFs. Use AWS S3 / Azure Blob. Upload file → get URL → store URL in database. Cheap, scales to exabytes, integrates with CDNs.
Pre-signed URLs
When a user needs to download a large file, don't proxy it through your server. Instead, generate a pre-signed URL — a temporary, time-limited URL that lets the user download directly from S3. Your server hands them the URL and steps out of the way.
// Generate a URL valid for 15 minutes — user downloads directly from S3
const url = await s3.getSignedUrlPromise('getObject', {
Bucket: 'my-bucket',
Key: attachment.s3Key,
Expires: 900 // seconds
});
// Your server never touches the file bytes on download
This matters enormously at scale: a 2GB video download doesn't consume your server's bandwidth or memory at all. The pattern is: store the S3 key in your database, the actual file in S3. Your database row is tiny; S3 handles the heavy lifting.
2.5 Load Balancers
A load balancer sits in front of your servers and distributes incoming requests so no single server gets overwhelmed.
Algorithms
| Algorithm | How It Works | Best For |
|---|---|---|
| Round Robin | Requests go to servers in order: A, B, C, A, B, C... | Identical servers, even distribution |
| Weighted Round Robin | More powerful servers get proportionally more requests | Servers with different capacities |
| Least Connections | Send to whichever server currently has the fewest active connections | Requests with varying processing times |
| IP Hash | Hash the client's IP to determine which server handles it — same client always goes to same server | Session stickiness without shared state |
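Two of these algorithms are simple enough to sketch in a few lines (hypothetical helper names):

```javascript
// Round robin: cycle through servers in order — A, B, C, A, B, C…
function makeRoundRobin(servers) {
  let i = 0;
  return () => servers[i++ % servers.length];
}

// Least connections: pick the server with the fewest active connections.
function leastConnections(servers) {
  return servers.reduce((min, s) => (s.active < min.active ? s : min));
}

const rr = makeRoundRobin(['A', 'B', 'C']);
console.log([rr(), rr(), rr(), rr()].join('')); // 'ABCA'

const pool = [
  { name: 'A', active: 7 },
  { name: 'B', active: 2 },
  { name: 'C', active: 5 },
];
console.log(leastConnections(pool).name); // 'B'
```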
Layer 4 vs Layer 7
Layer 4 (Transport)
Sees: IP addresses, ports
Decision: Route by IP/port
Example: All traffic on :443 → Server Pool
Layer 7 (Application)
Sees: URLs, headers, cookies, body
Decision: Route by content
Example: /api/* → Backend servers, /static/* → CDN, /admin/* → Admin servers
Layer 4 is faster (less inspection), but Layer 7 is smarter (can route based on HTTP content). Default to Layer 7 for more control.
What if the load balancer itself goes down? You always run at least two load balancers. One handles traffic while the other stands by as a backup. If the active one fails, the backup takes over automatically — usually within seconds. In practice, cloud providers (AWS, GCP, Azure) manage this for you behind the scenes. When you create a load balancer on AWS, you're never relying on a single machine — they run redundant copies automatically.
2.6 Caching
Caching stores frequently accessed data in a fast layer (RAM) so you don't have to hit the slower layer (database/disk) every time. Analogy: a sticky note of frequently called phone numbers on your desk — faster than looking them up in the phone book every time.
Redis is the industry standard in-memory cache.
Cache-Aside (Lazy Loading) — Step by Step
1. App checks the cache for the key
2. Hit → return the cached value (fast path, database untouched)
3. Miss → read the value from the database
4. Write the value into the cache (with a TTL) and return it — the next request is a hit
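A minimal sketch of the flow, with Maps standing in for Redis and the database (illustrative; a real implementation would use async clients and set a TTL):

```javascript
const cache = new Map();                              // stand-in for Redis
const db = new Map([['user:123', { name: 'John' }]]); // stand-in for the DB
let dbReads = 0;

function getUser(id) {
  const key = `user:${id}`;
  if (cache.has(key)) return cache.get(key); // cache hit → return immediately
  dbReads += 1;
  const row = db.get(key);                   // miss → load from the database
  cache.set(key, row);                       // populate the cache for next time
  return row;
}

getUser(123); // miss → reads the DB
getUser(123); // hit  → served from cache
console.log(dbReads); // 1
```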
Invalidation Strategies
- Cache-Aside (lazy loading): App checks cache first, loads from DB on miss. Most common pattern.
- Write-Through: Every write goes to both the cache and the database simultaneously. Cache is always fresh, but writes are slower (two operations).
- Write-Behind (Write-Back): Write to cache immediately, then asynchronously flush to the database later. Fastest writes, but risk of data loss if the cache crashes before flushing.
Key Concepts
TTL (Time to Live): A countdown timer on a cache entry — when it hits zero, the entry is automatically deleted. Even if your active invalidation fails, TTL ensures data refreshes eventually instead of being served stale forever.
LRU (Least Recently Used): When the cache is full, evict the item that hasn't been accessed for the longest time. The assumption: recently accessed data is more likely to be accessed again.
What happens when the cache has stale data? This is the fundamental tension of caching. If a user updates their profile, but the old version is still in the cache, other users see outdated info. Solutions: delete/update the cache on writes (cache-aside), use short TTLs for frequently changing data, or use write-through for data where freshness is critical.
What happens when the cache fills up? The cache uses an eviction policy to make room. LRU (Least Recently Used) evicts the oldest-accessed entry. LFU (Least Frequently Used) evicts the least-accessed overall. Most systems use LRU — it's simple and works well for typical access patterns. Important: Redis defaults to no eviction (it returns errors when full), so always configure a maxmemory-policy.
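A minimal LRU can be built on Map's insertion-order iteration (a sketch — production caches like Redis use approximated LRU for efficiency):

```javascript
// Re-inserting a key moves it to the "most recently used" end of the Map,
// so the first key in iteration order is always the least recently used.
class LruCache {
  constructor(capacity) { this.capacity = capacity; this.map = new Map(); }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key);  // refresh recency
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      const lru = this.map.keys().next().value; // oldest entry
      this.map.delete(lru);                     // evict it
    }
  }
}

const lru = new LruCache(2);
lru.set('a', 1);
lru.set('b', 2);
lru.get('a');            // 'a' is now most recently used
lru.set('c', 3);         // cache full → evicts 'b', the LRU entry
console.log(lru.get('b')); // undefined
console.log(lru.get('a')); // 1
```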
Distributing Cache: Consistent Hashing
A single Redis instance has a memory ceiling. When you need multiple cache nodes, you must decide which keys go where. The naive approach — hash(key) % N — breaks when you add or remove a node (N changes → ~80% of keys remap → massive cache-miss storm).
Consistent hashing solves this by placing cache nodes on a virtual ring. Adding a node only remaps ~1/N of keys instead of all of them. This is how Redis Cluster, DynamoDB, and Cassandra distribute data.
Full explanation with diagrams in Section 3.9: Hashing & Consistent Hashing.
2.7 CDN
A CDN (Content Delivery Network) is a global network of servers that caches and serves content from the location closest to the user.
An edge server is a CDN server in a specific geographic location. When a user requests a file, DNS routes them to the nearest edge server. If the edge has the file cached, it serves it immediately; if not, it fetches the file from your origin (e.g., S3), caches it, and serves every subsequent nearby request locally.
Use for: images, CSS/JS bundles, videos, fonts. Don't use for: user-specific data (dashboards) or real-time data (live stock prices).
2.8 Message Queues
A queue is a FIFO (First In, First Out — like a line at a coffee shop, where the first person in line gets served first) sequence of messages waiting to be processed. A producer adds messages; a consumer reads and processes them.
Queue vs Stream
Queues (SQS/RabbitMQ): One consumer per message. Message is deleted after processing. Think of a to-do list — once a task is done, you cross it off.
Streams (Kafka): Multiple consumers can read the same messages. Messages are retained and replayable. Think of a newspaper — many people can read the same article, and back issues are kept.
// Producer: add a job to the queue
await queue.send({
  type: 'SEND_WELCOME_EMAIL',
  userId: 123,
  email: 'john@example.com'
});

// Consumer: process jobs from the queue
while (true) {
  const message = await queue.receive();
  await sendEmail(message.email, 'Welcome!');
  await queue.delete(message.id); // Done, remove from queue
}
The power of queues: they decouple producers from consumers. Your API server says "send this email" and immediately responds to the user. The email actually sends 2 seconds later via a background worker. If the email service is down, the message stays in the queue and retries later.
What happens if a worker crashes mid-processing? The message becomes "invisible" for a timeout period (visibility timeout in SQS, or unacked in RabbitMQ). If the worker doesn't acknowledge completion before the timeout, the message reappears in the queue for another worker to pick up. After several failed attempts, it moves to a dead letter queue (DLQ) for manual inspection.
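That lifecycle can be simulated in a few lines. Real queues like SQS track receive counts and visibility timeouts server-side; this in-memory version (all names illustrative) just shows the logic of delete-on-success and DLQ-after-N-failures:

```javascript
// In-memory simulation of an SQS-style queue with a receive-count-based DLQ.
const MAX_RECEIVES = 3;
const queue = []; // pending messages
const dlq = [];   // dead letter queue for poison messages

function send(body) {
  queue.push({ body, receiveCount: 0 });
}

// One worker iteration: receive, process, delete only on success.
async function pollOnce(handler) {
  const msg = queue.shift();
  if (!msg) return;
  msg.receiveCount++;
  try {
    await handler(msg.body);
    // Success: the message is "deleted" (we simply don't requeue it).
  } catch (err) {
    if (msg.receiveCount >= MAX_RECEIVES) {
      dlq.push(msg); // poison message: park it for manual inspection
    } else {
      queue.push(msg); // reappears, as after a visibility timeout
    }
  }
}
```

The key property: a crash before acknowledgment leaves the message in the queue, so another worker eventually retries it, and a message that fails repeatedly ends up in the DLQ instead of blocking everything behind it.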
2.9 Virtual Networks
A VPC (Virtual Private Cloud) is your own isolated network in the cloud. Think of it like a gated office building — you control who gets in and what they can access.
A subnet is a subdivision of your VPC:
- Public subnet: Accessible from the internet. Only put your load balancer here.
- Private subnet: NOT accessible from the internet. Put your app servers, databases, and Redis here.
Only the load balancer is exposed to the internet (public subnet). App servers, PostgreSQL, and Redis sit in a private subnet, unreachable from outside. Attackers can't directly reach your database — they'd have to go through the load balancer, then the app server, which validates authentication and authorization.
What if an app in the private subnet needs to call an external API? Use a NAT (Network Address Translation) gateway. It sits in the public subnet and lets private instances make outbound requests — for example, calling Stripe's API to process a payment — without being directly reachable from the internet. Outbound: yes. Inbound: still blocked.
The LinkPulse Story
Chapter 2: Week Two
LinkPulse has 500 users now. Every page load — the feed, trending links, user profiles — hits PostgreSQL directly. You run htop on the server: CPU is pinned at 80%. The database is doing all the work, and most of it is redundant. The "trending links" query crunches the same numbers for every single visitor, and the results barely change minute to minute.
You add Redis as a cache. The pattern is simple: before querying PostgreSQL, check Redis. If the data is there and fresh, return it. If not, query PostgreSQL, store the result in Redis with a 60-second expiration (TTL — it auto-deletes after 60 seconds so data stays fresh), and return it. This is the cache-aside pattern you learned in Section 2.6. The trending page response drops from 200ms to 3ms. CPU falls to 20%.
Uh Oh
Users start uploading thumbnail images for their links. You store the actual image files directly inside PostgreSQL — it seemed easy at the time. Within days, the database balloons to 12 GB. Queries slow to a crawl because the database is now struggling to sift through huge image files mixed in with your regular data. Every time someone loads a feed page, the server pulls megabytes of image data out of the database and sends it all to the user's browser.
You move all images to Amazon S3 — object storage designed exactly for this. The database now stores only a URL string pointing to S3. Then you put Cloudflare CDN in front of S3. The CDN caches copies of your images on servers around the world (those "edge servers" from Section 2.7). Your user in Mumbai? Their thumbnail now loads in 5ms from an edge server in Mumbai, not 200ms from Virginia. The database shrinks back to 400 MB.
Analogy
Your single server was a restaurant where the chef also waits tables, washes dishes, and parks cars. Adding S3, Redis, and CDN is like hiring specialists — a waiter, a dishwasher, a valet. The chef can finally focus on cooking. Each component does one thing well.
Two weeks later, a tech blogger shares LinkPulse. Traffic spikes 10× overnight. Your single Node.js process can't keep up — requests pile up and start timing out. Users see 502 errors. You need more servers, fast.
You spin up 3 Node.js instances behind an Application Load Balancer (ALB). The ALB uses round-robin to distribute incoming requests across the three servers. This works because — remember — HTTP is stateless (Section 1.6): no server "remembers" your previous request. And since you store login sessions in Redis (not on the server itself), any of the three servers can check Redis and know you're logged in. The same user can hit server 1, then server 3, and both know they're logged in.
You also notice the signup flow is slow: after creating an account, the API sends a welcome email right then and there, during the request — meaning the user has to sit and wait for the email to actually send before they see the signup confirmation. If the email provider is slow (or down), the user waits. You add a message queue (Amazon SQS) — the API drops a "send welcome email" message onto the queue and returns instantly. A background worker picks up the message and sends the email on its own time. The user never waits for the email provider.
Uh Oh
You now have three app servers, but still one PostgreSQL database. You can keep adding more app servers to handle more traffic — that part scales. But the database doesn't. Database CPU climbs to 95% during peak hours. You distributed the code, but the data is still on one machine. Every single read and write from all three servers funnels into that single database.
Key Takeaway
Every building block from Part 2 — caching, CDN, object storage, load balancers, message queues — solved a specific bottleneck. But notice the pattern: you pushed the problem somewhere else each time. The hardest question is still ahead: what do you do when a single database can't keep up?
Part 3: Core Patterns & Principles
These are foundational theoretical concepts that underpin every distributed system. Understanding them deeply will make every design decision clearer.
3.1 ACID Transactions
ACID is a set of four guarantees that relational databases provide for transactions:
- Atomicity: All operations succeed or none do. Example: transferring $100 from Account A to Account B involves two operations (debit A, credit B). If the system crashes after debiting A but before crediting B, atomicity ensures the debit is rolled back — no money disappears.
- Consistency: The database always moves from one valid state to another. All constraints (unique keys, foreign keys, check constraints) are enforced. Example: if you have a constraint that account balances can't go negative, any transaction that would violate this is rejected.
- Isolation: Concurrent transactions execute as if they were sequential. Example: if two users try to book the last seat on a flight simultaneously, isolation prevents them from both succeeding — one will see "seat taken."
- Durability: Once a transaction is committed, it survives crashes, power outages, and hardware failures. The data is written to disk, not just held in memory. Example: after you see "Payment confirmed," that data persists even if the server crashes 1 second later.
But what level of isolation? Doesn't that matter? Yes — databases offer multiple isolation levels. READ COMMITTED (PostgreSQL's default) prevents dirty reads but allows non-repeatable reads. SERIALIZABLE prevents all anomalies but is the slowest. Most applications use READ COMMITTED and handle edge cases in application logic or with explicit row-level locking.
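For the flight-seat example under Isolation, explicit row-level locking looks roughly like this with a node-postgres-style client (the `seats` schema and the `client` object are assumed for illustration):

```javascript
// Book the last seat safely: lock the row so two concurrent
// transactions can't both read "available" and both book it.
async function bookSeat(client, seatId, userId) {
  await client.query('BEGIN');
  try {
    // FOR UPDATE makes other transactions wait on this row
    // until we commit or roll back.
    const { rows } = await client.query(
      'SELECT status FROM seats WHERE id = $1 FOR UPDATE', [seatId]);
    if (rows[0].status !== 'available') {
      await client.query('ROLLBACK');
      return { ok: false, reason: 'seat taken' };
    }
    await client.query(
      "UPDATE seats SET status = 'booked', user_id = $2 WHERE id = $1",
      [seatId, userId]);
    await client.query('COMMIT');
    return { ok: true };
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  }
}
```

With READ COMMITTED plus this lock, the second concurrent booking blocks until the first commits, then sees `status = 'booked'` and returns "seat taken" instead of double-booking.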
3.2 CAP Theorem
A distributed system can provide at most TWO of: Consistency (every read returns latest write), Availability (every request gets a response), Partition Tolerance (system works despite network splits). Since network partitions are inevitable in any distributed system, your real choice is CP or AP.
CP example (banking): During a network partition between two data centers, a banking system rejects transactions rather than risk processing them with stale data. Users see "Service temporarily unavailable" — annoying, but no one loses money.
AP example (social feed): During a partition, a social media feed stays up and serves content, even if some posts are a few seconds stale. Users would rather see a slightly outdated feed than an error page.
Can a system be both consistent AND available? Yes — when there's no partition. CAP only forces a trade-off during a network split (also called a "partition") — that's when two parts of your system literally can't talk to each other. Imagine you have two data centers (East and West), and the network connection between them goes down. Each side is still running, but they can't sync data. Now you have to choose: do you reject requests until the connection is restored (choosing Consistency), or keep serving requests knowing the two sides might have different data (choosing Availability)? When the network is healthy and everything can communicate normally, you can have all three. That's why CAP is about failure modes, not normal operation.
3.3 Consistency Models
When data is stored across multiple replicas — copies of the same data store (PostgreSQL, MongoDB, DynamoDB, or a Redis cache) kept in sync for read scaling and fault tolerance — the question is: how up-to-date are reads?
- Strong consistency: Every read returns the latest write, no matter which replica you ask. Like a shared Google Doc — every viewer sees the exact same version at all times.
- Eventual consistency: Replicas converge to the same value eventually — after a write, different replicas may briefly return different, stale values, but given enough time with no new writes, they'll all agree. Like updating your social media profile — your friend might see the old photo for a few seconds before it refreshes.
- Read-Your-Writes: You always see your own writes immediately; others may see stale data. The best of both worlds for user experience.
If eventual consistency means stale data, how is that acceptable? Because many use cases don't need instant consistency. A "likes" counter showing 1,002 instead of 1,003 for 2 seconds is fine. A product listing showing old stock for 5 seconds is acceptable. The trade-off is that eventual consistency lets you scale reads massively across replicas without coordinating between them on every read.
3.4–3.5 Latency & Throughput
Latency = how long one operation takes. Throughput = how many operations per second.
Key Latency Numbers (approximate)
| Operation | Time | Relative |
|---|---|---|
| L1 cache reference | 0.5 ns | — |
| RAM access | 100 ns | 200× L1 |
| SSD random read | 150 µs | 1,500× RAM |
| HDD seek | 10 ms | 67× SSD |
| Same datacenter round trip | 0.5 ms | — |
| Cross-continent round trip | 150 ms | 300× datacenter |
Key takeaway: RAM is ~1,000× faster than SSD, and a cross-continent round trip is ~300× slower than one inside a datacenter. This is why caching is so powerful — moving a lookup from disk to memory can drop response time from 50ms to 0.1ms.
Throughput example: A single server handling 1,000 requests/second at 50ms per request has high throughput and low latency. A batch pipeline that churns through millions of records per hour, but where any individual record waits minutes for its turn, has high throughput and terrible latency. The two are related but independent.
3.6 Fault Tolerance
The art of keeping a system running even when things break. Things WILL break — hard drives die, code has bugs, networks go down. The question isn't if something will fail, it's how your system responds when it does. Failures fall into three categories:
Types of Failures
1. Hardware Failures — physical components break.
- Disk dies: The hard drive that stores your data physically stops working. Mitigation: RAID (Redundant Array of Independent Disks) — instead of storing data on one disk, you spread it across multiple disks so that if one fails, the data can be rebuilt from the others. Think of it like writing the same notes in two separate notebooks — lose one, you still have a copy.
- Memory corrupts: The server's RAM (the fast, temporary storage it uses while running) has a glitch and flips a bit of data. Mitigation: ECC memory (Error-Correcting Code) — special RAM that can detect and automatically fix single-bit errors. Regular RAM doesn't catch these errors — your server would just silently use wrong data. ECC catches it and fixes it on the fly.
- Power supply fails: The physical power unit in the server dies. Mitigation: multiple power supplies — server-grade machines come with two, each plugged into a different power source.
- Entire server dies: Mitigation: multi-AZ deployment. "AZ" stands for Availability Zone — cloud providers like AWS split each region (e.g., US-East) into multiple physically separate data centers (AZs). If you run copies of your server in two AZs, a power outage or fire in one data center doesn't take you offline.
2. Software Failures — your code (or the code you depend on) misbehaves.
- Deadlock: Two parts of your program are each waiting for the other to finish, so neither ever does. Imagine two people in a hallway — each waiting for the other to step aside, so nobody moves. In code, this usually happens when two processes each hold a resource (like a database row) and need the one the other is holding. The app freezes.
- Memory leak: Your program gradually uses more and more RAM without releasing it. Like filling a bathtub with the drain closed — eventually it overflows. After hours or days, the server runs out of memory and crashes. Common in long-running servers that create objects but never clean them up.
- Out-of-memory crash: The program asks for more RAM than the server has, and the operating system kills it.
Mitigations for software failures:
- Health checks: The load balancer periodically asks each server "are you alive?" (usually by hitting an endpoint like GET /health). If a server stops responding, the load balancer takes it out of the pool.
- Auto-restart: Tools called process managers watch your app and restart it if it crashes. PM2 is a popular one for Node.js apps; systemd is built into Linux itself. They're like a babysitter that picks the kid back up when they fall.
- Circuit breaker: If your app calls an external service (say, a payment API) and that service is failing, a circuit breaker stops your app from repeatedly trying. It "opens the circuit" (like a fuse in your house) and returns a fallback response immediately, instead of making your users wait for a request that's going to fail anyway. After a cooldown period, it tries one request to see if the service recovered.
- Graceful degradation: When part of your system fails, you keep the rest working. Example: if your recommendation engine goes down, you still show the main feed — just without personalized suggestions. The user gets a slightly worse experience instead of a completely broken page.
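The circuit breaker described above can be sketched in a few lines. This is a minimal version; production libraries add a half-open probe state, metrics, and fallback handling:

```javascript
// Minimal circuit breaker: open after N consecutive failures,
// fail fast while open, allow one trial call after a cooldown.
class CircuitBreaker {
  constructor(call, { threshold = 3, cooldownMs = 10_000 } = {}) {
    this.call = call;           // the wrapped external-service call
    this.threshold = threshold; // consecutive failures before opening
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = null;       // timestamp when the circuit opened
  }

  async exec(...args) {
    if (this.openedAt !== null) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error('circuit open: failing fast'); // skip the doomed call
      }
      this.openedAt = null; // cooldown over: allow one trial request
    }
    try {
      const result = await this.call(...args);
      this.failures = 0; // success resets the counter
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```

The caller catches the "circuit open" error and serves a fallback (cached data, a default response) instead of waiting on a service that's known to be down.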
3. Network Failures — communication between parts of your system breaks.
- Packets dropped: Data is sent over the network in small chunks called "packets." Sometimes they just get lost in transit — like letters lost in the mail.
- DNS resolution fails: DNS is the "phone book" that converts domain names (like api.stripe.com) into IP addresses. If that lookup fails, your server literally doesn't know where to send the request.
- Network partition: Two parts of your system can't talk to each other at all (the "network split" from Section 3.2 CAP).
Mitigations for network failures:
- Retries with exponential backoff: If a request fails, try again — but wait a little longer each time. First retry after 1 second, then 2 seconds, then 4, then 8. This prevents all your servers from hammering a struggling service at the same time (which would make the problem worse).
- Timeouts: Never wait forever for a response. Set a deadline (e.g., 3 seconds). If the other service doesn't respond by then, give up and either retry or return an error. Without timeouts, one slow service can freeze your entire application.
- Fallback responses: If you can't reach a service, return something reasonable instead of an error. Can't reach the recommendations service? Show trending posts instead.
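Retries with exponential backoff take only a few lines. This is a sketch; real implementations also add random jitter and a cap on the maximum delay:

```javascript
// Retry with exponential backoff: wait 1s, 2s, 4s, ... between attempts.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetries(fn, { attempts = 4, baseDelayMs = 1000 } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts - 1) throw err; // out of retries: give up
      await sleep(baseDelayMs * 2 ** i); // 1s, 2s, 4s, ...
    }
  }
}
```

Combine this with a per-attempt timeout so one slow call can't stall the whole retry loop, and the pattern covers both dropped packets and transient service errors.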
Redundancy Strategies
A well-designed system handles all three failure types simultaneously. The core idea: have backup copies of everything. And "everything" means both your app servers (the machines running your code) and your databases (the machines storing your data). In a real production system, you'll typically have multiple app servers behind a load balancer, AND multiple copies of your database — so if any single machine dies, there's another one ready to go.
These strategies apply to any component — web servers, databases, caches, queues — anywhere a single machine failing would be a problem:
- Active-Passive (failover): You run two copies of a component — say, two database servers. One (the "active") handles all the work. The other (the "passive") sits idle, continuously receiving a copy of all the data, ready to go at a moment's notice. If the active one crashes, the passive takes over its role. Analogy: an understudy in a theater production — always rehearsed, but only goes on stage if the lead can't. This is the most common pattern for databases, where you want one source of truth with a hot backup.
- Active-Active: You run two copies and both handle real traffic at the same time. If one dies, the other absorbs all traffic instantly — no switchover delay. Analogy: two cashiers at a store — if one takes a break, the other keeps the line moving. This is the most common pattern for web/app servers (and load balancers), where any copy can handle any request.
- N+1 redundancy: Run one more instance than you need. If you need 3 app servers to handle peak traffic, run 4. If you need 1 database primary, have 1 standby. That way losing any single machine doesn't put you over capacity or take you offline.
What Actually Happens During a Database Failover
This example uses a relational database (PostgreSQL), but the same failover pattern applies to Redis clusters, MongoDB replica sets, Cassandra nodes — any storage system where you can't afford downtime.
Every production database should have a synchronous standby — a second copy of the database that receives every write at the same time as the primary. "Synchronous" means the primary waits for the standby to confirm "I got it" before telling your app "write successful." This way, the standby always has an exact copy of the data.
When the primary crashes, here's the actual timeline:
T+0s: Primary DB crashes (hardware failure or network issue)
T+10s: Cloud provider detects primary is unhealthy (health checks fail)
T+10s: Failover decision made — promote the standby to become the new primary
T+30s: Standby promoted. DNS record updated to point to the new server's IP
T+60s: App servers reconnect to new primary. Writes resume.
Synchronous standby = zero data loss — every write that was confirmed exists on both machines. An asynchronous standby (where the primary doesn't wait for confirmation) is cheaper and faster, but if the primary crashes, the last few seconds of writes may be lost because they hadn't been copied yet.
With proper retry logic — where your app automatically retries failed database connections, waiting a bit longer each time (exponential backoff: 1s, 2s, 4s, 8s...) — most users just see a brief "saving..." spinner, then their operation succeeds. No human has to wake up and manually fix anything.
Chaos Engineering
Chaos engineering means deliberately breaking things in production to make sure your backups actually work. Netflix pioneered this with a tool called Chaos Monkey, which randomly kills production servers throughout the day. If your system survives random server deaths without users noticing, your redundancy is real — not just theoretical. The key insight: if you only test failover during an actual outage, you will fail spectacularly at 2am.
What about split-brain — when two servers both think they're the primary? This is the most dangerous failure mode in databases. If both accept writes, you end up with two different versions of your data that can never be automatically merged. Prevention: use a quorum (majority vote). Before a standby can be promoted to primary, a majority of nodes in the cluster have to agree: "yes, the old primary is really dead, and yes, you should take over." This is why database clusters typically have odd numbers of nodes (3, 5, 7) — with an even number, you can get a tie vote and nobody gets promoted.
3.7 Sync vs Async
Sync: caller waits. Async: fire and forget, handle later. Use async for emails, image processing, analytics. Use sync for critical operations where the user must know the result.
The Event Bus Pattern
A powerful async pattern: when a service completes an operation, it publishes an event to a message queue (Kafka, SQS, etc.). Multiple independent consumers react to that event asynchronously — the publisher doesn't know or care who's listening.
// Write service: update the database, then publish an event
await db.query('UPDATE documents SET content=$1 WHERE id=$2', [content, docId]);
await eventBus.publish('document.updated', { docId, tenantId, userId });

// --- Async consumers react independently ---

// Search indexer updates Elasticsearch
eventBus.consume('document.updated', async ({ docId }) => {
  await searchIndex.reindex(docId);
});

// Cache invalidator clears stale Redis entry
eventBus.consume('document.updated', async ({ docId, tenantId }) => {
  await redis.del(`doc:${tenantId}:${docId}`);
});

// Audit logger writes to audit trail
eventBus.consume('document.updated', async (event) => {
  await auditLog.write(event);
});
The write service finishes fast. Everything downstream — search indexing, cache invalidation, audit logging — happens asynchronously without blocking the user's response. This is a natural companion to CQRS (Command Query Responsibility Segregation): the write side publishes events, the read side consumes them to update its optimized read models.
What if an async consumer fails to process a message? Use retries with exponential backoff: retry after 1s, then 2s, then 4s, then 8s. After N failures, route the message to a dead letter queue (DLQ) for manual inspection. This prevents a single "poison message" from blocking the entire queue.
3.8 Normalization vs Denormalization
Normalization means organizing data to eliminate redundancy. Each piece of information lives in exactly one place. You connect related data using foreign keys — a column in one table that references the primary key of another (e.g., orders.user_id references users.id) — and retrieve it with JOINs, SQL operations that combine rows from related tables (e.g., SELECT * FROM orders JOIN users ON orders.user_id = users.id).
-- Normalized: user name stored once
SELECT orders.id, users.name, orders.total
FROM orders
JOIN users ON orders.user_id = users.id
WHERE orders.id = 456;
-- Denormalized: user name copied into orders table
SELECT id, user_name, total
FROM orders
WHERE id = 456; -- No JOIN needed, but user_name might be stale
Normalized: Less storage, no update anomalies (change a user's name in one place), but more JOINs (slower reads). Denormalized: Faster reads (no JOINs), but risk of inconsistency (user changes name → must update everywhere). Normalize core data; denormalize for read-heavy dashboards.
3.9 Hashing & Consistent Hashing
A hash function takes an input (any size) and produces a fixed-size output. The same input always produces the same output, but even a tiny change in input produces a completely different output. Hash functions are used everywhere: data integrity, caching, load distribution, and security.
The Modulo Problem
The simplest way to distribute data across N servers: hash(key) % N. Key "user_42" hashes to 78 → 78 % 4 = server 2. This works until you add or remove a server:
With 4 servers: hash("user_42") % 4 = 2 → Server 2
With 5 servers: hash("user_42") % 5 = 3 → Server 3 (MOVED!)
Result: ~80% of ALL keys remap to different servers
→ massive cache misses → thundering herd on the database
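You can verify the remapping disaster yourself: hash a batch of keys and count how many land on a different server when N goes from 4 to 5 (the FNV-1a hash below is just an illustrative stand-in for whatever hash your system uses):

```javascript
// Simple deterministic string hash (FNV-1a, illustrative only;
// real systems use murmur, xxhash, or similar).
function hash(str) {
  let h = 2166136261;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 16777619);
  }
  return h >>> 0; // force unsigned 32-bit
}

// What fraction of keys move when server count goes from fromN to toN?
function fractionMoved(numKeys, fromN, toN) {
  let moved = 0;
  for (let i = 0; i < numKeys; i++) {
    const h = hash(`user_${i}`);
    if (h % fromN !== h % toN) moved++;
  }
  return moved / numKeys;
}
```

With a uniform hash, a key survives the move from 4 to 5 servers only when its hash leaves the same remainder mod 4 and mod 5, which happens about 20% of the time, so roughly 80% of keys move.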
Consistent Hashing: The Virtual Ring
Consistent hashing solves the modulo problem by imagining all possible hash values arranged in a circle (a "ring") instead of a line. Both servers and keys get placed on this ring based on their hash value. Here's how it works, step by step:
Step 1: Place the servers on the ring. Hash each server's name to get its position on the ring. Each server "owns" the colored section of ring behind it (going counterclockwise back to the previous server). Any key that lands in a server's colored zone belongs to that server.
Step 2: Place a key on the ring. Hash the key (e.g., "user_42") to get its position. It lands in the gold zone — so it belongs to Server A.
Step 3 (the magic): Add a new server. Server D joins the ring between C and A. It takes over a slice of what was A's zone (shown in pink). Only the keys that land in that new pink slice need to move — everything else stays exactly where it was.
(Diagram: the ring before, with three servers, and after Server D is added.)
That's the whole idea: Server D took over a slice of the ring that used to belong to A. Only the keys in that slice moved — everything else stayed put. Instead of ~75% of keys remapping (like with modulo), only about 1/N of all keys need to move. The more servers you have, the less disruption each change causes.
Virtual Nodes
With only 3 physical servers on the ring, distribution can be uneven. Solution: each physical server gets multiple "virtual nodes" spread around the ring (e.g., Server A → A-1, A-2, A-3, A-4). This ensures even key distribution even with few physical servers.
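A minimal ring with virtual nodes can be sketched as follows, using the same illustrative FNV-1a hash as before (this shows the idea, not Redis Cluster's actual implementation):

```javascript
// Illustrative FNV-1a string hash (not what real systems use).
function hash(str) {
  let h = 2166136261;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 16777619);
  }
  return h >>> 0;
}

class HashRing {
  constructor(vnodes = 100) {
    this.vnodes = vnodes; // virtual nodes per physical server
    this.ring = [];       // sorted array of { point, server }
  }
  addServer(name) {
    for (let i = 0; i < this.vnodes; i++) {
      this.ring.push({ point: hash(`${name}#${i}`), server: name });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }
  removeServer(name) {
    this.ring = this.ring.filter((n) => n.server !== name);
  }
  // A key belongs to the first virtual node at or after its hash.
  lookup(key) {
    const h = hash(key);
    for (const node of this.ring) {
      if (node.point >= h) return node.server;
    }
    return this.ring[0].server; // wrap around the ring
  }
}
```

Comparing lookups before and after adding a fourth server shows only about a quarter of keys moving, versus the ~80% that modulo hashing would remap.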
3.10 Scaling
Vertical: bigger machine (more CPU, RAM). Simple, no code changes, but has a physical ceiling. Horizontal: more machines behind a load balancer. Complex but virtually unlimited.
Real-World Server Tiers: What You're Actually Paying For
Abstract talk about "bigger machines" is hard to reason about. Here's what actual cloud servers look like, what they cost, and roughly what they can handle. These are the tiers you'll pick from when deploying a real app on AWS or Azure.
App / Web Servers (AWS EC2 & Azure equivalents)
| Tier | AWS Instance | Azure VM | vCPUs | RAM | ~Monthly Cost | Handles (typical web app) |
|---|---|---|---|---|---|---|
| Hobby / Dev | t3.micro | B1s | 2 | 1 GB | $8–12 | ~50–200 concurrent users. Fine for a personal project or dev/staging environment. |
| Small App | t3.small | B1ms | 2 | 2 GB | $15–22 | ~200–1,000 concurrent users. An early-stage SaaS or internal tool. |
| Startup | t3.medium | B2s | 2 | 4 GB | $30–42 | ~1,000–3,000 concurrent users. A real product with paying customers. |
| Growth | m5.large | D2s v3 | 2 | 8 GB | $70–93 | ~3,000–10,000 concurrent users per instance. Usually you're running 2+ of these behind a load balancer by now. |
| Production | m5.xlarge | D4s v3 | 4 | 16 GB | $140–185 | ~10,000–30,000 concurrent users per instance. Mid-size SaaS, e-commerce. |
| Heavy | m5.2xlarge | D8s v3 | 8 | 32 GB | $280–370 | ~30,000–80,000 concurrent users per instance. High-traffic platforms. |
| Compute-Heavy | c5.4xlarge | F16s v2 | 16 | 32 GB | $490–620 | CPU-intensive workloads (image processing, real-time analytics). Not more users — faster processing per request. |
Managed Databases (AWS RDS PostgreSQL & Azure Database for PostgreSQL)
| Tier | AWS RDS | Azure DB | vCPUs | RAM | ~Monthly Cost | Handles (simple queries) |
|---|---|---|---|---|---|---|
| Dev / Hobby | db.t3.micro | B1ms | 2 | 1 GB | $15–25 | ~100–500 queries/sec. Dev, prototyping, tiny apps. |
| Small | db.t3.small | GP B2s | 2 | 2 GB | $25–45 | ~500–1,500 queries/sec. Small SaaS with a few hundred active users. |
| Mid | db.m5.large | GP D2s v3 | 2 | 8 GB | $130–175 | ~2,000–5,000 queries/sec. This is where most startups live — enough for thousands of active users. |
| Production | db.r5.xlarge | MO E4s v3 | 4 | 32 GB | $350–460 | ~5,000–15,000 queries/sec. Large working set fits in RAM, fast reads. |
| Heavy | db.r5.2xlarge | MO E8s v3 | 8 | 64 GB | $700–920 | ~15,000–40,000 queries/sec. At this point you should also be looking at read replicas and caching. |
Practical estimation tip: When a system design scenario says "we have 10 million users," the first question should be: how many are concurrent? A product with 10M registered users might have 50,000 concurrent at peak. Looking at the tables above, that's 2–3 m5.xlarge app servers behind a load balancer, plus a db.r5.xlarge database — totally manageable. This kind of back-of-the-envelope reasoning is what separates good system designers from those who jump to over-engineered solutions.
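That back-of-the-envelope math, written out. Every constant below is an assumption to state out loud, not a fact:

```javascript
// Back-of-the-envelope sizing. The point is to write the
// assumptions down explicitly and do the arithmetic.
const registeredUsers = 10_000_000;
const concurrent = registeredUsers / 200;  // assume ~0.5% online at peak -> 50,000
const usersPerAppServer = 20_000;          // mid-range m5.xlarge figure from the table

const appServers = Math.ceil(concurrent / usersPerAppServer); // -> 3
const dbQueriesPerSec = concurrent / 5;    // assume one query per user every 5s -> 10,000
```

10,000 queries/sec lands in the db.r5.xlarge row of the database table, which is consistent with the 2-3 app servers plus one beefy database the text estimates.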
Scaling Phases
Phase 1 — Optimize one machine: vertical scaling, database indexes, and query tuning. No new moving parts — just making a single server go further.
Phase 2 — Distribute reads: Add read replicas, Redis cache for hot data, CDN for static assets, load balancer in front of multiple app servers. Handles 10-100× the original traffic.
Phase 3 — Partition data: Shard the database, introduce Kafka for event-driven processing, split into microservices for independent scaling, multi-region deployment. This is where complexity explodes — only do it when Phase 2 can't keep up.
Key insight: each phase adds complexity. Don't jump to Phase 3 when Phase 1 fixes haven't been tried. Walk through this progression to understand the trade-offs at each stage.
The Full Scaling Ladder
A more granular view of how a real system evolves from a single server to a global platform:
1. Monolith + single database — one server, one DB. Build fast, learn fast.
2. Polyglot persistence — right storage per data type (SQL for relational, document DB for flexible content, S3 for files).
3. Load balancer + horizontal app scaling — multiple app servers, zero-downtime deploys, shared sessions in Redis.
4. Read replicas + Redis cache + CDN — read load off primary, hot data in under 1ms, static assets served at the edge.
5. CQRS — separate read and write APIs with independent scaling, connected via an event bus.
6. Enterprise infrastructure — SSO (SAML/OIDC), tenant isolation, dedicated databases, SOC 2 compliance.
7. Multi-region — GeoDNS routes users to nearest region, regional read replicas, regional caches.
8. Write sharding + regional primaries + HA standby — writes sharded by tenant, synchronous standby per shard for instant failover.
9. CRDTs/OT + circuit breakers + chaos engineering — conflict-free collaborative editing, graceful degradation, regular failure drills.
The best engineers are not the ones who build Stage 9 for a Stage 1 problem. They build precisely what's needed and see one stage ahead.
Replication
Create copies of your database to distribute read traffic:
Writes go to the primary only. Reads are distributed across replicas. If the primary fails, a replica can be promoted.
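A minimal sketch of this routing rule, with placeholder connection objects standing in for a real database driver:

```javascript
// Hypothetical connection objects; a real app would hold driver connections.
const primary = { name: 'primary' };
const replicas = [{ name: 'replica-1' }, { name: 'replica-2' }];
let next = 0;

// Writes go to the primary; reads round-robin across replicas.
function routeQuery(sql) {
  const isRead = /^\s*select/i.test(sql);
  if (!isRead) return primary;
  next = (next + 1) % replicas.length;
  return replicas[next];
}
```

In production this logic usually lives in a connection pooler or proxy (e.g., PgBouncer plus application-level routing) rather than hand-rolled code.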
Sharding
Split data across multiple databases (or storage nodes) so no single machine holds everything. Sharding is used in relational databases (PostgreSQL, MySQL), NoSQL databases (MongoDB shards collections, Cassandra distributes partitions, DynamoDB splits by partition key), and even caches (Redis Cluster shards keys across nodes). The strategies are the same regardless of storage type:
- Range-based: Users A-M → Shard 1, N-Z → Shard 2. Simple, but can create hot spots (if more users have names starting with A-M).
- Hash-based: `hash(user_id) % num_shards` → even distribution. But range queries (e.g., "all users created this week") must hit every shard.
- Geographic: US users → US shard, EU users → EU shard. Low latency for local users, but cross-region queries are expensive.
Hot Spots
A hot spot is when one shard receives disproportionately more traffic than others, defeating the purpose of sharding. Three common causes:
- Sequential IDs: Range-based sharding where all new users (IDs 1M-2M) hit the same shard.
- Celebrity data: One user account (e.g., a celebrity with millions of followers) gets millions of reads on a single shard.
- Temporal patterns: Date-based sharding where today's shard handles all current traffic while historical shards sit idle.
Mitigations: Use hash-based sharding instead of range-based, split hot shards into sub-shards, cache hot keys aggressively in Redis, or add a random salt to the shard key to spread celebrity data across multiple shards.
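A sketch of the salting mitigation: writes append a random salt so one celebrity's data spreads across several sub-shards, and reads must query all sub-keys and merge. `SALT_BUCKETS` is an illustrative parameter, not a standard value:

```javascript
// Spread one hot key across N sub-shards by appending a random salt.
const SALT_BUCKETS = 8;

// Write path: pick a random sub-shard for each write.
function saltedShardKey(userId) {
  const salt = Math.floor(Math.random() * SALT_BUCKETS);
  return `${userId}:${salt}`;
}

// Read path: query every sub-shard key and merge the results.
function allShardKeys(userId) {
  return Array.from({ length: SALT_BUCKETS }, (_, salt) => `${userId}:${salt}`);
}
```

The trade-off is explicit: writes stay cheap and spread evenly, but every read becomes N lookups.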
Fan-out is a related scaling concern: one operation triggers many downstream operations, like one user posting and the system writing to 10,000 follower feeds (the "fan" shape of one input spreading to many outputs). Fan-out problems are what make systems like Twitter's news feed challenging at scale.
Fan-out-on-Write vs Fan-out-on-Read
| Approach | How It Works | Pro | Con |
|---|---|---|---|
| Fan-out-on-write (push) | When a user posts, immediately write to every follower's feed | Reads are instant — feed is pre-computed | Writes are expensive — 1M followers = 1M writes |
| Fan-out-on-read (pull) | When a user opens their feed, query all accounts they follow on the fly | Writes are cheap — just store the post | Reads are slow — must query and merge many accounts |
Most production systems use a hybrid: push for normal users (fast reads), pull for celebrities (avoid million-write storms).
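A toy sketch of that hybrid, with in-memory Maps standing in for the feed store and follower graph (the celebrity threshold is illustrative):

```javascript
const CELEBRITY_THRESHOLD = 10000; // illustrative cutoff for "celebrity"
const feeds = new Map();     // followerId -> [postIds] (precomputed feeds)
const posts = new Map();     // authorId   -> [postIds] (queried at read time)
const followers = new Map(); // authorId   -> [followerIds]

function push(map, key, value) {
  if (!map.has(key)) map.set(key, []);
  map.get(key).push(value);
}

// Write path: fan-out-on-write only for non-celebrity authors.
function publishPost(authorId, postId) {
  push(posts, authorId, postId); // always store the post itself
  const fans = followers.get(authorId) ?? [];
  if (fans.length < CELEBRITY_THRESHOLD) {
    for (const f of fans) push(feeds, f, postId); // push to each follower's feed
  }
}

// Read path: precomputed feed, plus celebrity posts merged on the fly.
function readFeed(userId, following) {
  const celebrityPosts = following
    .filter((a) => (followers.get(a) ?? []).length >= CELEBRITY_THRESHOLD)
    .flatMap((a) => posts.get(a) ?? []);
  return [...(feeds.get(userId) ?? []), ...celebrityPosts];
}
```

A real implementation would also sort the merged feed by timestamp and paginate; this sketch only shows where each post comes from.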
The LinkPulse Story
Chapter 3: Month Three
LinkPulse has 50,000 users. The single PostgreSQL database that was buckling under three app servers is now your biggest problem. You start where everyone starts: vertical scaling. You upgrade from 4 cores and 16 GB of RAM to 64 cores and 256 GB. Queries speed up. CPU drops. You buy yourself about three months of breathing room.
But you know vertical scaling has a ceiling, so you add read replicas. You set up two PostgreSQL replicas that stream changes from the primary. Since roughly 90% of your queries are reads — loading feeds, browsing profiles, viewing links — you route those to replicas. The primary handles only writes. The load drops dramatically.
Uh Oh
Sam updates her profile photo, refreshes the page, and sees her old photo. She clicks save again — now she has duplicate writes. The problem: replication lag. Her write went to the primary, but her read was routed to a replica that hasn't received the update yet. The data is consistent eventually, but "eventually" doesn't feel great when you're staring at your own stale profile picture.
You implement read-your-writes consistency: for 2 seconds after any write, that user's reads are routed to the primary instead of a replica. It's a targeted fix — everyone else still reads from replicas, but the person who just made a change always sees their own update. Lag still exists, but users never notice it on their own data.
Then you hit an ACID moment. A user saves a new link and shares it simultaneously. Sharing creates a record that references the link by its ID. But what if the share record gets written and the link write fails? The share now points to a link that doesn't exist — a broken reference. You wrap both operations in a transaction. Either both succeed or neither does. Atomicity saves you from corrupted data.
Uh Oh
Month four, 200,000 users. Read replicas handled the read load, but now write throughput is maxed out. The primary server is handling every write for every user. You need to shard — split the data across multiple databases. You shard by user_id: users 1–100K go to shard A, 100K–200K to shard B. Writes are distributed. But then product asks for a "trending links across all users" page. That query now spans every shard. What was a single SQL query becomes a cross-shard nightmare that's slow and complex.
You experiment with range-based sharding first. It works — until you launch in Japan. Japanese usernames cluster in a specific alphabetical range, and that shard becomes a hot spot handling 60% of all writes while others sit idle. You switch to hash-based sharding — hashing user_id to evenly distribute data. The hot spot disappears, but now range queries like "find all users who signed up this week" require querying every shard.
Analogy
Sharding is like splitting a library into multiple buildings. Each building is smaller and faster to search. But "find every book published in 2024" now means visiting every building, searching each one, and merging the results. You traded query simplicity for write scalability.
You also add a "follow" feature — users can follow curators and see their links in a personalized feed. This introduces the fan-out problem. When a curator posts a link, do you immediately write it to every follower's feed (fan-out-on-write), or do you wait until a follower opens their feed and assemble it on the fly (fan-out-on-read)? You go hybrid: push for normal users (most have under 100 followers), pull for popular curators (one has 80,000 followers — pushing to 80K feeds on every post would be brutal).
Key Takeaway
Every scaling decision has a cost. Replicas introduce lag. Sharding breaks cross-cutting queries. Fan-out-on-write is fast to read but expensive to write. The art isn't avoiding trade-offs — it's choosing which trade-off you can live with for each feature.
Part 4: Security
Security is critical for enterprise systems. Understanding these fundamentals is essential for building production-grade applications.
4.1 Session-Based Authentication
A session is a server-side record that tracks a logged-in user. A cookie is a small piece of data the browser automatically sends with every request to that domain.
Pros: Easy to revoke (delete the session from Redis). Cons: Stateful — requires a shared session store (Redis) that all servers can access.
Why Sessions Struggle on Mobile
Session-based auth relies on browser cookies, which creates problems for non-browser clients:
- Cookies are browser-specific: Mobile apps (iOS/Android) don't have a browser cookie jar — you'd need to manually manage cookie storage and attachment.
- Cookies are domain-bound: If your API is on `api.example.com` but your web app is on `app.example.com`, cookies don't transfer across domains without extra CORS configuration.
- Cross-platform inconsistency: Cookies behave differently across browsers, WebViews, and native HTTP clients.
This is why APIs that serve both web and mobile clients almost always use token-based auth (JWT) sent in the Authorization header — it works identically on every platform.
4.2 JWT Authentication
A JWT (JSON Web Token) is a self-contained token with three parts separated by dots:
eyJhbGciOiJIUzI1NiJ9.eyJ1c2VySWQiOjQyLCJyb2xlIjoiYWRtaW4iLCJleHAiOjE3MTA3MjAwMDB9.SflKxwRJSMeKKF2QT4fwpM
Part 1 — Header (algorithm): {"alg": "HS256"}
Part 2 — Payload (claims): {"userId": 42, "role": "admin", "exp": 1710720000}
Part 3 — Signature: HMAC-SHA256(header + payload, SECRET_KEY)
The server signs the token with a secret key. On every request, the server verifies the signature — if someone tampers with the payload, the signature won't match. No database lookup needed.
| Safe to Include | NEVER Include |
|---|---|
| User ID, role, email | Passwords, password hashes |
| Expiration time (exp) | Credit card numbers |
| Tenant ID, permissions | SSN, government IDs |
| Token ID (jti) for revocation | API keys, secret keys |
If you need to hide JWT contents, use JWE (JSON Web Encryption) — but for most apps, just don't put secrets in the payload.
If JWTs are stateless, how do you revoke them? You can't — natively. Solutions: maintain a token blocklist in Redis (checking on every request), use short-lived access tokens (15 min) paired with refresh tokens, or rotate the signing key (nuclear option — invalidates ALL tokens).
What happens when a JWT expires? Does the user get logged out? Not if you use refresh tokens. The pattern: issue a short-lived access token (15 min) and a long-lived refresh token (7 days, stored securely). When the access token expires, the client silently sends the refresh token to get a new access token — the user never notices. If the refresh token is also expired, then the user must log in again.
4.2b OAuth 2.0
OAuth lets users log in via a third party (Google, GitHub) without sharing their password with your app.
API Keys are simpler: a static string sent in request headers. Used for server-to-server communication or public API access. Unlike passwords, API keys identify the application, not the user. They don't expire automatically — you must rotate them manually.
4.3 Authorization
RBAC (Role-Based Access Control): Users are assigned roles, roles have permissions.
// RBAC middleware example (pseudocode)
const ROLES = { admin: ['read', 'write', 'delete'], viewer: ['read'] };

function authorize(requiredAction) {
  return (req, res, next) => {
    const userRole = req.user.role;            // from JWT or session
    const permissions = ROLES[userRole] ?? []; // unknown role → no permissions
    if (permissions.includes(requiredAction)) {
      next(); // allowed
    } else {
      res.status(403).json({ error: 'Forbidden' });
    }
  };
}

app.delete('/api/projects/:id', authorize('delete'), deleteProject);
ABAC (Attribute-Based Access Control): Permissions based on multiple attributes — not just role, but also resource ownership, time of day, location, etc. Example: "Editors can modify documents they created, but only during business hours, and only if the document is in 'draft' status." ABAC is more flexible than RBAC but harder to reason about.
Multi-Tenant Data Isolation
In a multi-tenant SaaS (e.g., Slack, Notion), Company A must never see Company B's data. The standard approach: include tenant_id in every table and filter every query with WHERE tenant_id = ?.
The danger: A single query that forgets the tenant_id filter leaks data across tenants — a catastrophic security bug. Defense-in-depth:
- PostgreSQL Row-Level Security (RLS): The database itself enforces tenant isolation. You define a policy like
CREATE POLICY tenant_isolation ON orders USING (tenant_id = current_setting('app.tenant_id')). Even if application code forgets the filter, the database adds theWHEREclause automatically. - Middleware injection: Extract
tenant_idfrom the JWT and automatically inject it into every database query via ORM middleware. - Automated tests: Integration tests that verify cross-tenant queries return zero results.
4.4 Encryption
Encryption transforms readable data (plaintext) into unreadable gibberish (ciphertext) that can only be decoded with the correct key.
- In transit: TLS (Transport Layer Security)/HTTPS encrypts data while it's moving between machines — see Section 1.7 for the handshake details.
- At rest: AES-256 encryption for database storage, S3 Server-Side Encryption for files. Data is encrypted on disk — even if someone steals the hard drive, they can't read it.
- Passwords: Never store plain text. Use bcrypt or argon2 — these are intentionally slow hashing algorithms.
Why hash passwords instead of encrypting them? Encryption is reversible — if an attacker gets the key, they can decrypt every password. Hashing is a one-way function: you can hash a password to check if it matches, but you can't reverse a hash back to the original password. Bcrypt adds a "salt" (random data) so even two users with the same password get different hashes.
If I encrypt data at rest, where do I store the encryption key? Never alongside the encrypted data. Use a dedicated key management service: AWS KMS, HashiCorp Vault, or Azure Key Vault. These handle key rotation, access control, and audit logging. The principle: your database stores the encrypted data, a completely separate system holds the keys.
Why SHA-256 Is Bad for Passwords
SHA-256 is a general-purpose hash — it's designed to be fast. That's great for file integrity checks, but terrible for passwords:
SHA-256 (general-purpose, fast)
Modern GPU: ~10 billion hashes/second
8-character password: cracked in hours
Bcrypt (password-specific, intentionally slow)
Modern GPU: ~30,000 hashes/second
8-character password: would take centuries
The difference: ~300,000× slower per hash
Bcrypt's "cost factor" is adjustable — as hardware gets faster, you increase the cost to keep brute-forcing impractical. This is why the algorithm choice matters more than password complexity rules.
4.5 SQL Injection
An attacker inserts malicious SQL into input fields:
// VULNERABLE — string concatenation
const query = "SELECT * FROM users WHERE email = '" + userInput + "'";
// Attacker enters: ' OR 1=1 --
// Resulting query: SELECT * FROM users WHERE email = '' OR 1=1 --'
// This returns ALL users!
// SAFE — parameterized query
const query = "SELECT * FROM users WHERE email = $1";
db.query(query, [userInput]); // Database treats userInput as data, never as SQL
Does NoSQL injection exist too? Yes. MongoDB queries accept objects — if user input flows directly into a query object, an attacker can inject operators like $gt, $ne, or $regex to bypass filters. Prevention is the same principle: never build queries from raw user input. Use your driver's built-in sanitization and validate input types.
4.6 XSS and CSRF
XSS (Cross-Site Scripting): An attacker injects malicious JavaScript into your page (e.g., via a comment field). When other users view the page, the script runs in their browser and can steal their cookies/tokens. Prevention: sanitize all user-generated content, use Content Security Policy headers, encode output.
CSRF (Cross-Site Request Forgery): An attacker tricks a user's browser into making an authenticated request to your site. Example: a hidden form on a malicious page that transfers money from your bank. Prevention: include a unique, unpredictable CSRF token in every form — the attacker's malicious page can't include the correct token.
4.7 DDoS, Rate Limiting & Zero Trust
DDoS (Distributed Denial-of-Service): An attack where thousands of machines flood your system with traffic to overwhelm it. Mitigations: rate limiting, CDN/WAF (absorb traffic at the edge), auto-scaling, IP blacklisting.
Rate limiting caps how many requests a client can make per time period. The token bucket algorithm works like this:
Bucket capacity: 10 tokens | Refill rate: 2 tokens/second
- Request arrives → take 1 token from bucket
- Bucket empty? → reject request (`429 Too Many Requests`)
- Bucket refills at steady rate → allows short bursts up to capacity
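The algorithm above fits in a few lines. Timestamps are passed in explicitly here to make the refill arithmetic visible; a production limiter would use the clock (or, distributed, an atomic Redis operation):

```javascript
// Token bucket with the numbers from the example: capacity 10, refill 2/sec.
class TokenBucket {
  constructor(capacity = 10, refillPerSec = 2, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // bucket starts full
    this.lastRefill = now;
  }

  allow(now = Date.now()) {
    // Refill in proportion to elapsed time, never above capacity.
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true; // request allowed
    }
    return false;  // reject with 429 Too Many Requests
  }
}
```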
Zero Trust: A security model that assumes breach. Don't trust anything just because it's "inside the network." Principles: authenticate every request, enforce least privilege access, encrypt everything (even internal traffic), log everything, and continuously verify. Critical for enterprise systems with complex networks and many employees.
4.8 IDOR (Insecure Direct Object Reference)
A user changes /documents/1042 to /documents/1043 in the URL. Does your API check that document 1043 belongs to their tenant and that they have read permission? If not, you have an IDOR vulnerability — one of the OWASP top 10, and especially dangerous in multi-tenant systems where it can leak data across tenants.
The fix is a centralized authorization check that runs before every data access:
async function assertCanAccess(user, resourceType, resourceId, action) {
const resource = await db.findResource(resourceType, resourceId);
// Check 1: Tenant isolation — never cross tenant boundaries
if (resource.tenantId !== user.tenantId)
throw new ForbiddenError('Cross-tenant access denied');
// Check 2: Role-based permission (RBAC)
if (!await rbac.check(user.role, resourceType, action))
throw new ForbiddenError('Insufficient role permissions');
// Check 3: Resource-level ownership
if (action === 'delete' && resource.ownerId !== user.id && user.role !== 'admin')
throw new ForbiddenError('Can only delete your own resources');
return resource;
}
// Every route handler uses it
app.delete('/documents/:id', auth, async (req, res) => {
const doc = await assertCanAccess(req.user, 'document', req.params.id, 'delete');
await deleteDocument(doc.id);
});
The pattern checks three layers: tenant isolation → RBAC → resource ownership. Centralizing it means every endpoint gets the same protection — no developer can accidentally skip an access check.
The LinkPulse Story
Chapter 4: Month Four
100,000 users. You wake up to an urgent email: "Someone is posting links from my account." You dig into the logs. Your session system stores a user ID in a cookie — an unsigned cookie. The attacker didn't hack anything. They just edited their cookie from user_id=12847 to user_id=12846 and became someone else. You were trusting the client to tell the truth about who they were.
You rip out the unsigned cookie system and implement JWT with proper cryptographic signing. Now the server signs each token with a secret key. If anyone tampers with the payload, the signature check fails and the request is rejected. You also like that JWTs work for the mobile app you're building — mobile doesn't have browser cookies, but it can send a JWT in the Authorization header just fine.
Uh Oh
A security researcher emails you: they typed ' OR 1=1 -- into the search bar and got back every link in the database — including private ones. Your search endpoint was building SQL queries with string concatenation: "SELECT * FROM links WHERE title LIKE '%" + query + "%'". Classic SQL injection. The attacker didn't need a password. They needed a single quote and basic SQL knowledge.
You audit every database query in the codebase. String concatenation becomes parameterized queries everywhere: WHERE title LIKE $1 with the search term passed as a parameter. The database treats it as data, never as executable SQL. You also add input validation on the API layer as a second line of defense.
A month later, you launch a Teams feature — companies can create shared workspaces. An intern at Company A navigates to a link that belongs to Company B, and the page loads. No error, no access denied — your API checked authentication (are they logged in?) but not authorization (do they have permission to see this?). You add RBAC: every team member has a role — admin, editor, or viewer — and every API endpoint checks the role before returning data.
Uh Oh
A developer writes a quick analytics query for the admin dashboard and forgets a WHERE tenant_id = ? clause. The dashboard now shows aggregated data from all companies mixed together. One missing filter, and tenant isolation is broken. This isn't a hack — it's a bug that any developer can introduce on any query.
You implement Row-Level Security in PostgreSQL. Policies on each table automatically filter rows by tenant_id. Your middleware sets the tenant context on every database connection. Now even if a developer forgets the filter, the database itself refuses to return rows from other tenants. Defense in depth — the database enforces what the application might forget.
The final test comes on a Monday morning: a DDoS attack. 100,000 requests per second, all hitting the login endpoint. Your servers buckle. You implement rate limiting with a token bucket algorithm — each IP gets 20 requests per minute to the login endpoint. Excess requests get a 429 Too Many Requests response. You also put Cloudflare's WAF in front of everything, blocking obviously malicious patterns at the edge before they even reach your servers.
Analogy
Security isn't a feature you bolt on at the end. It's the locks on every door, the ID check at every entrance, the camera in every hallway. LinkPulse didn't get hacked by geniuses — you left the front door open with unsigned cookies, built a SQL injection target with string concatenation, and forgot to check permissions. Every vulnerability was a missing basic.
Key Takeaway
Every vulnerability LinkPulse hit was a consequence of not thinking about security from the start. Unsigned cookies, raw SQL, missing authorization, no tenant isolation — none of these were sophisticated attacks. They were missing fundamentals. Mentioning security proactively shows the difference between thinking like a junior developer and thinking like a senior engineer.
Part 5
System Design Walkthroughs
Nine common system design challenges, each with a detailed walkthrough of the key decisions and architecture.
URL Shortening Service
Design a URL shortening service like bit.ly. Generate short unique URLs and redirect at massive scale. Expect ~100:1 read-to-write ratio.
Key Decisions
Base62 encoding: Convert an auto-incrementing ID to a short string using 62 characters (a-z, A-Z, 0-9). ID 12345 → dnh. This guarantees uniqueness (IDs are unique) and produces short codes — a 7-character base62 string can represent ~3.5 trillion unique values, more than enough for a URL shortener.
Why not hash the URL? You'd need collision handling. Auto-increment ID → base62 is simpler and guarantees uniqueness.
Two users shorten the same URL? Design choice: (1) separate short codes per user (simpler, unique analytics) or (2) deduplicate (saves storage, shared analytics). Most services choose option 1.
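The ID-to-code conversion is plain base conversion. Using the a-z, A-Z, 0-9 alphabet, ID 12345 produces "dnh" as in the example above:

```javascript
const ALPHABET = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

// Numeric ID -> short base62 code (repeated divide-by-62).
function encodeBase62(id) {
  if (id === 0) return ALPHABET[0];
  let code = '';
  while (id > 0) {
    code = ALPHABET[id % 62] + code;
    id = Math.floor(id / 62);
  }
  return code;
}

// Short code -> numeric ID, for the redirect lookup.
function decodeBase62(code) {
  let id = 0;
  for (const ch of code) {
    id = id * 62 + ALPHABET.indexOf(ch);
  }
  return id;
}
```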
Redirect Flow
Critical tech decision → Redis: Caching IS the design. A 100:1 read ratio means almost every request hits the cache. Without Redis, you'd need 100× the database capacity.
Real-Time Chat System
Design a real-time chat system like Slack or WhatsApp. Support 1:1 messaging, group chats, online presence indicators, and message history. Target 500K DAU.
Message Delivery
WebSocket servers handle real-time delivery. A WebSocket is a persistent, bidirectional connection between client and server — unlike HTTP's request-response model, either side can send messages at any time (see Section 2.2). When User A sends "Hello" to User B:
- User A's client sends the message via their WebSocket connection
- Server persists the message to PostgreSQL/Cassandra
- Server publishes event to Kafka
- User B's WebSocket server receives the event and pushes it to User B's connection
Presence
Tracked via Redis. Each connected client sends a heartbeat — a periodic "I'm still here" signal — every 30 seconds. Redis stores `user:42:last_seen` → timestamp with a TTL. If the TTL expires (the server stops receiving heartbeats), the user is shown offline.
Group sizing strategy: Small groups (<100): push messages via WebSocket to all members. Large groups (>100): pull model — client fetches messages on demand.
Critical tech decision → WebSockets: Real-time IS the product. Without persistent connections, you'd fall back to polling, which adds latency and wastes bandwidth.
Social Media News Feed
Design a social media news feed like Twitter or Instagram. When a user opens their feed, efficiently assemble posts from everyone they follow. Handle the "celebrity problem" — users with millions of followers.
Fan-out Strategies
Fan-out-on-write (push model): When User A posts, immediately write that post to every follower's precomputed feed table. Reads are instant, but writes are expensive — 1M followers = 1M writes.
Fan-out-on-read (pull model): When User B opens their feed, query all accounts they follow and assemble on the fly. Writes are cheap, but reads are slow.
Hybrid approach (what Twitter/Instagram actually use): Normal users (few followers) use fan-out-on-write — pre-compute feeds for fast reads. Celebrities (millions of followers) use fan-out-on-read — their posts are fetched on demand and merged. This limits write amplification while keeping reads fast for most users.
Critical tech decision → Kafka: Fan-out IS the hard problem. You need durable event processing to handle writing to millions of follower feeds without losing messages.
File Storage & Sync Service
Design a file storage and sync service like Google Drive or Dropbox. Users upload/download files, sync across devices, and share with others. Handle files from KB to multi-GB.
Architecture
Metadata in PostgreSQL, files in S3. The key insight: don't store files as single blobs.
Chunking: Split each file into ~4MB blocks. Why?
- Efficient sync: If one byte in a 1GB file changes, you only re-upload the affected 4MB chunk, not the entire file.
- Resumability: If upload fails, resume from the last incomplete chunk.
- Parallelism: Multiple chunks can upload simultaneously.
Deduplication via hashing: Each chunk is hashed (SHA-256). Before storing a chunk, check if that hash already exists. If it does, just reference the existing chunk. 100 users upload the same PDF → stored once with 100 references.
Critical tech decision → S3: Infinite, cheap storage IS the requirement. You can't store petabytes of user files on application servers or in a relational database.
Distributed Rate Limiter
Design a distributed rate limiter. Protect APIs from abuse by limiting requests per user, per IP, or per endpoint. Must work across multiple server instances.
Algorithms
| Algorithm | How It Works | Best For |
|---|---|---|
| Token Bucket | Tokens added at steady rate, each request consumes one. Bucket has max capacity. | Allowing short bursts above the steady rate |
| Sliding Window Log | Store timestamp of each request. Count requests in the last N seconds. | Precise rate counting, but memory-heavy |
| Sliding Window Counter | Hybrid: split time into fixed windows, weight current + previous window by overlap. | Balance of precision and memory efficiency |
| Fixed Window | Count requests per fixed time window (e.g., per minute). Reset at boundary. | Simple, but allows burst at window edges |
Distributed Rate Limiting
A single-server counter doesn't work when you have N API servers. Solutions:
- Centralized Redis counter: All servers increment the same Redis key (`INCR rate:{user_id}:{minute}` with a TTL). Simple, but adds a Redis round-trip to every request (~1ms).
- Per-server local counters + sync: Each server tracks locally and periodically syncs to Redis. Less accurate but faster — good when approximate limiting is acceptable.
Multi-level limiting: Apply different limits at different scopes — per-IP (prevent DDoS), per-user (prevent abuse), per-endpoint (protect expensive operations like /search). Return 429 Too Many Requests with a Retry-After header.
Critical tech decision → Redis: Atomic INCR with TTL provides the distributed counter you need. Fast enough to check on every request without meaningful latency impact.
Notification System
Design a notification system supporting push, email, SMS, and in-app channels. Users can configure preferences. Must handle millions of notifications/day with delivery guarantees.
Architecture
Event-driven design with Kafka at the center. When an event occurs (new message, order shipped, friend request):
- Event producer publishes to a Kafka topic (e.g., `notification.events`)
- Notification service consumes the event, checks user preferences in Redis/PostgreSQL
- Router fans out to the appropriate channel workers:
- Push: Firebase Cloud Messaging (FCM) for Android, APNs for iOS
- Email: Amazon SES or SendGrid
- SMS: Twilio
- In-app: WebSocket push or write to notification table for polling
Key Design Decisions
Delivery guarantees: Use Kafka consumer groups with at-least-once delivery. Each channel worker acknowledges after successful send. Failed sends go to a dead-letter queue for retry.
User preferences: Store in PostgreSQL, cache in Redis. Schema: user_id, channel, event_type, enabled, quiet_hours_start, quiet_hours_end. Check before every send.
Rate limiting per user: Don't spam — aggregate similar notifications ("You have 5 new messages" instead of 5 separate pushes). Use a short delay window (30s) to batch.
Critical tech decision → Kafka: Decouples event producers from notification delivery. Handles backpressure, retries, and multiple consumers (one per channel) naturally.
Search Autocomplete
Design a search autocomplete (typeahead) system. As the user types, suggest the top completions in under 100ms. Handle billions of search queries for ranking data.
Data Structure: Trie
A trie (prefix tree) stores strings character by character. Each node represents a prefix; children represent the next character. To find completions for "sys", traverse to the "s" → "y" → "s" node and return all descendants.
Precomputed Top-K
At each trie node, store the top K results (e.g., top 10 completions ranked by search frequency). This avoids traversing the entire subtree at query time — just return the precomputed list. Trade-off: more storage, but queries are O(prefix length) instead of O(subtree size).
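A sketch of a trie with the precomputed top-K list maintained on every prefix node at insert time (K=3 here for illustration):

```javascript
const K = 3; // completions kept per node; production systems might use ~10

function makeNode() {
  return { children: new Map(), topK: [] }; // topK: [{ term, freq }] sorted desc
}

function insert(root, term, freq) {
  let node = root;
  for (const ch of term) {
    if (!node.children.has(ch)) node.children.set(ch, makeNode());
    node = node.children.get(ch);
    // Maintain the precomputed list on every prefix node along the path.
    node.topK = [...node.topK.filter((e) => e.term !== term), { term, freq }]
      .sort((a, b) => b.freq - a.freq)
      .slice(0, K);
  }
}

function complete(root, prefix) {
  let node = root;
  for (const ch of prefix) {
    node = node.children.get(ch);
    if (!node) return [];              // no completions for this prefix
  }
  return node.topK.map((e) => e.term); // O(prefix length), no subtree walk
}
```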
Offline Pipeline
Ranking data comes from an offline analytics pipeline:
- Collect search query logs → aggregate frequency counts (MapReduce/Spark job, runs hourly or daily)
- Build/update the trie with new frequency data
- Serialize the trie and push to the serving layer
Serving
- CDN caching: Cache the top completions for common prefixes (1-2 characters) at the CDN edge. "s", "sy", "sys" are queried millions of times — serve from CDN, not your servers.
- In-memory trie servers: For longer/rarer prefixes, query a cluster of servers holding the trie in RAM.
- Client-side optimization: Debounce keystrokes (wait 100-200ms after the user stops typing before querying). Cache recent responses locally.
Critical tech decision → Precomputed trie with CDN caching: Sub-100ms latency requires avoiding any database lookup on the hot path. The trie is the data structure; CDN + in-memory serving is the delivery mechanism.
Web Crawler
Design a web crawler that indexes billions of pages. Handle politeness (don't overload sites), deduplication (don't re-crawl the same page), and distributed crawling across many machines.
Core Components
URL Frontier: A priority queue of URLs to crawl. Organized by domain with per-domain rate limiting. High-priority pages (news sites, frequently updated) are crawled more often.
Crawl Loop
- Pick URL from the frontier (respecting per-domain delays)
- Check robots.txt — honor the site's crawling rules. Cache robots.txt per domain.
- Fetch the page via HTTP. Handle redirects, timeouts, and errors gracefully.
- Parse HTML — extract text content for indexing + extract outgoing links for the frontier.
- Deduplicate: Before adding to the index, check if we've seen this content before.
- Store the parsed content in the search index. Add new URLs back to the frontier.
Deduplication
- URL dedup: Use a Bloom filter — a probabilistic set that answers "definitely not seen" or "probably seen." Memory-efficient for billions of URLs. The occasional false positive (skipping a genuinely new URL) is acceptable; Bloom filters produce no false negatives, so an already-crawled URL is never mistaken for new and re-crawled.
- Content dedup: Use SimHash or MinHash to detect near-duplicate pages (e.g., same article with different ad placements). Compute a content fingerprint and compare against existing fingerprints.
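A toy Bloom filter shows the "definitely not / probably yes" behavior. This sketch uses a simple FNV-1a-style hash for illustration; production crawlers use billions of bits and stronger hash functions:

```javascript
// Minimal Bloom filter: k seeded hashes set/check k bits in a bit array.
class BloomFilter {
  constructor(bits = 1024, hashes = 3) {
    this.bits = new Uint8Array(bits);
    this.k = hashes;
  }
  // Seeded FNV-1a-style string hash — illustrative, not production-grade.
  hash(str, seed) {
    let h = 2166136261 ^ seed;
    for (let i = 0; i < str.length; i++) {
      h ^= str.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    return (h >>> 0) % this.bits.length;
  }
  add(url) {
    for (let s = 0; s < this.k; s++) this.bits[this.hash(url, s)] = 1;
  }
  mightContain(url) {
    // false → definitely never added; true → probably added (rare false positives)
    for (let s = 0; s < this.k; s++) if (!this.bits[this.hash(url, s)]) return false;
    return true;
  }
}

const seen = new BloomFilter();
seen.add("https://example.com/a");
console.log(seen.mightContain("https://example.com/a")); // true — no false negatives
```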
Distributed Crawling
Use consistent hashing by domain to assign each domain to a specific crawler node. This ensures per-domain rate limiting is enforced by a single machine (no coordination needed) and improves cache locality for robots.txt and DNS lookups.
Critical tech decision → URL frontier design: The frontier determines crawl order, politeness, and freshness. A poorly designed frontier either overloads target sites (getting you blocked) or wastes time on low-value pages.
Video Streaming Platform
Design a video streaming platform like YouTube or Netflix. Handle video upload, transcoding to multiple resolutions, and adaptive bitrate streaming to millions of concurrent viewers.
Upload → Transcode → Serve Pipeline
- Upload: Client uploads the original video to S3 (chunked upload for large files, resumable). Metadata (title, description, uploader) stored in PostgreSQL.
- Transcode: A message is published to a queue (SQS/Kafka). Worker fleet transcodes into multiple resolutions (1080p, 720p, 480p, 360p) and formats. Each resolution is split into small segments (2-10 seconds each).
- Store: Transcoded segments stored in S3, organized by video ID and resolution.
- Serve: CDN caches and delivers video segments to viewers globally.
Adaptive Bitrate Streaming (HLS/DASH)
The client requests a manifest file that lists all available resolutions and their segment URLs. The video player monitors network bandwidth in real-time and switches between resolutions seamlessly:
- Fast WiFi → stream 1080p segments
- Bandwidth drops → seamlessly switch to 480p segments
- Bandwidth recovers → switch back to 1080p
This happens segment-by-segment (every 2-10 seconds), so the user rarely sees a buffering stall — they just see slightly lower quality during slow periods.
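The player's selection logic reduces to picking the best rendition that fits the measured bandwidth, re-evaluated before each segment. A sketch (the rendition table and safety margin are illustrative):

```javascript
// Adaptive bitrate sketch: pick the highest rendition whose bitrate fits
// within a safety margin of the measured bandwidth.
const renditions = [
  { name: "1080p", kbps: 5000 },
  { name: "720p",  kbps: 2800 },
  { name: "480p",  kbps: 1400 },
  { name: "360p",  kbps: 800 },
];

function pickRendition(measuredKbps, safety = 0.8) {
  const budget = measuredKbps * safety; // leave headroom for variance
  // renditions are sorted high→low, so the first fit is the best fit.
  return renditions.find((r) => r.kbps <= budget) ?? renditions[renditions.length - 1];
}

console.log(pickRendition(10000).name); // "1080p" — fast WiFi
console.log(pickRendition(2000).name);  // "480p"  — bandwidth dropped
```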
Scale Considerations
- Storage: A 10-minute video in 4 resolutions ≈ 2-4GB. At 500K uploads/day, that's ~1-2 PB/day — hundreds of petabytes per year. S3's durability and cost structure handle this.
- CDN is critical: Popular videos ("head" content) should be cached at every CDN edge. Long-tail content is fetched from origin on demand.
- Transcoding fleet: Use spot/preemptible instances — transcoding is embarrassingly parallel and fault-tolerant (just restart the segment).
Critical tech decision → CDN + segment-based delivery: Streaming IS delivery. Without a global CDN caching video segments at the edge, you'd need impossible origin bandwidth. Segment-based architecture enables adaptive bitrate and CDN caching simultaneously.
Critical Tech Decision Summary
Each system design challenge has a single most important technology choice. The critical decision is the one the core challenge depends on:
- URL Shortener → Redis: Caching IS the design (100:1 read ratio).
- Chat System → WebSockets: Real-time IS the product.
- News Feed → Kafka: Fan-out IS the hard problem.
- File Storage → S3: Infinite, cheap storage IS the requirement.
- Rate Limiter → Redis: Distributed atomic counters with TTL.
- Notifications → Kafka: Decouples events from multi-channel delivery.
- Autocomplete → Precomputed trie + CDN: Sub-100ms requires no DB on the hot path.
- Web Crawler → URL frontier: Controls politeness, priority, and freshness.
- Video Streaming → CDN + segments: Delivery IS the product.
Part 6
Technology Cheat Sheet
Technologies ranked by popularity within each category. Each tool includes pros, cons, and when to use it. The #1 in each category is the standard default — know when you'd pick an alternative.
Relational Databases
PostgreSQL
Open-source RDBMS with JSONB support, full-text search, row-level security, and a rich extension ecosystem. The default choice for most applications.
Pros
- JSONB for flexible schemas
- Advanced indexing (GIN, GiST, BRIN)
- Strong community & ecosystem
Cons
- Write-heavy sharding is complex
- Vertical scaling ceiling
Use When
- Relational data with ACID needs
- Complex queries & JOINs
- Default choice for most apps
MySQL
Mature RDBMS with strong replication. Simpler than PostgreSQL, widely used in PHP/WordPress ecosystems and legacy systems.
Pros
- Simple replication setup
- Massive ecosystem & hosting support
Cons
- Fewer advanced features than PG
- Weaker JSON/full-text support
Use When
- Legacy PHP/WordPress apps
- Simple CRUD applications
CockroachDB
Distributed SQL database that scales horizontally while maintaining strong consistency and PostgreSQL wire compatibility.
Pros
- Horizontal scaling with SQL
- Multi-region by default
- PostgreSQL compatible
Cons
- Higher latency than single-node PG
- Expensive at scale
Use When
- Need SQL + global distribution
- Outgrowing single-node PostgreSQL
Document Databases
MongoDB
Flexible JSON-like documents with dynamic schemas. Strong developer experience and aggregation pipeline.
Pros
- Schema flexibility
- Rich query language
- Good for rapid prototyping
Cons
- Multi-document ACID transactions only arrived in v4.0
- JOINs are expensive
Use When
- Semi-structured/varying schemas
- Content management, catalogs
DynamoDB
Fully managed, serverless key-value/document store by AWS. Predictable single-digit ms latency at any scale.
Pros
- Zero ops, auto-scaling
- Predictable performance
- Built-in global tables
Cons
- Limited query patterns (partition key required)
- Expensive at high volume
- AWS lock-in
Use When
- Simple key-value access patterns
- Serverless architectures on AWS
Cache
Redis
In-memory data structure store supporting strings, lists, sets, sorted sets, hashes, streams, and pub/sub. The industry standard cache.
Pros
- Rich data structures
- Persistence options (RDB/AOF)
- Pub/sub, Lua scripting, streams
Cons
- Single-threaded (for commands)
- RAM-limited (expensive at scale)
Use When
- Caching, sessions, rate limiting
- Leaderboards, real-time counters
- Default cache choice
Memcached
Simple, multi-threaded in-memory key-value cache. No data structures, no persistence.
Pros
- Multi-threaded (better on multi-core)
- Simpler, less overhead
Cons
- No persistence
- No data structures
- No pub/sub
Use When
- Pure key-value caching only
- Multi-threaded workloads
Message Queue / Streaming
Kafka
Distributed event streaming platform. Durable, replayable log with multi-consumer support. High throughput for event-driven architectures.
Pros
- Event replay & audit trail
- Multiple consumers per topic
- Extremely high throughput
Cons
- Operational complexity
- Overkill for simple queues
- Higher latency than SQS/RabbitMQ
Use When
- Event-driven architectures
- Multiple services consuming same events
- Data pipelines, analytics
Amazon SQS
Fully managed message queue by AWS. Zero ops, auto-scaling, pay-per-message. FIFO mode supports exactly-once processing.
Pros
- Zero ops, fully managed
- FIFO + deduplication
- Dead-letter queues built-in
Cons
- No event replay
- One consumer per message
- AWS lock-in
Use When
- Simple job/task queues
- Process-each-once workloads
RabbitMQ
Open-source message broker with advanced routing, priority queues, and message acknowledgment patterns.
Pros
- Complex routing patterns
- Priority queues
- Low latency delivery
Cons
- Doesn't scale like Kafka
- No built-in event replay
Use When
- Complex routing logic needed
- Priority-based processing
Search
Elasticsearch
Distributed full-text search engine built on Lucene. Inverted indexes, fuzzy matching, faceted search, relevance tuning.
Pros
- Powerful full-text search
- Horizontal scaling
- Rich analytics (ELK stack)
Cons
- Resource-heavy (RAM-hungry)
- Complex to operate at scale
Use When
- Search IS the feature
- 10M+ documents
- Fuzzy/faceted search needs
PostgreSQL Full-Text Search
Built-in tsvector/tsquery with GIN indexes. No extra infrastructure needed.
Pros
- No extra infrastructure
- SQL integration
- Good enough for <1M docs
Cons
- Limited fuzzy/relevance features
- Doesn't scale horizontally
Use When
- Search is a minor feature
- Small dataset, simple queries
Algolia
Hosted search-as-a-service with instant results. Sub-50ms latency, typo tolerance, and great developer UX out of the box.
Pros
- Zero ops, instant setup
- Built-in typo tolerance
- Great frontend SDKs
Cons
- Expensive at scale
- Less customizable than ES
Use When
- Need great search fast
- Small-to-medium dataset
Meilisearch / Typesense
Open-source, lightweight search engines. Easy to self-host with good defaults. Good Algolia alternatives.
Pros
- Easy to deploy & operate
- Typo tolerance built-in
- Open-source (free)
Cons
- Less mature than ES
- Smaller community
Use When
- Want Algolia-like UX, self-hosted
- Moderate scale
Object Storage
Amazon S3
Unlimited blob storage with 11 nines (99.999999999%) durability. The industry standard for file/object storage.
Pros
- Near-infinite scale
- Lifecycle policies, versioning
- Deep AWS integration
Cons
- Egress costs add up
- AWS lock-in
Use When
- Images, videos, backups, static assets
- Default choice for file storage
Azure Blob Storage
Microsoft's equivalent to S3. Tight integration with the Azure ecosystem and Active Directory.
Pros
- Azure ecosystem integration
- Tiered storage (hot/cool/archive)
Cons
- Azure lock-in
Use When
- Already on Azure
Google Cloud Storage
Google's object storage with strong integration into BigQuery and ML services.
Pros
- BigQuery integration
- Multi-region by default
Cons
- GCP lock-in
Use When
- Already on GCP
- ML/analytics pipelines
Load Balancer
AWS Application Load Balancer (ALB)
Managed Layer 7 load balancer with path-based routing, WebSocket support, and auto-scaling.
Pros
- Zero ops, auto-scaling
- Path/host-based routing
- Native AWS integration
Cons
- AWS-only
- Less configurable than Nginx
Use When
- AWS infrastructure
- Default choice on AWS
Nginx
High-performance web server and reverse proxy. Can also serve static files, terminate SSL, and cache responses.
Pros
- Extremely configurable
- Cloud-agnostic
- Also serves static content
Cons
- Self-managed
- Config can be complex
Use When
- Self-hosted / multi-cloud
- Need fine-grained control
HAProxy
High-performance TCP/HTTP load balancer. Industry standard for extreme throughput requirements.
Pros
- Very high performance
- Advanced health checks
Cons
- Steeper learning curve
- No static file serving
Use When
- Maximum throughput needed
- Complex health check requirements
CDN
Cloudflare
Global CDN with built-in WAF, DDoS protection, DNS, and Workers (edge compute). Generous free tier.
Pros
- Free tier, easy setup
- Built-in security (WAF, DDoS)
- Edge compute (Workers)
Cons
- Less AWS-native
- Enterprise features are pricey
Use When
- Default CDN choice
- Need WAF + CDN in one
Amazon CloudFront
AWS's CDN. Deep integration with S3, ALB, and Lambda@Edge for running custom logic at edge locations.
Pros
- Native S3/ALB integration
- Lambda@Edge for custom logic
Cons
- AWS lock-in
- Config is more complex
Use When
- All-AWS stack
- Need Lambda@Edge
Akamai
Enterprise CDN with the largest global network. Used by Fortune 500 companies for mission-critical delivery.
Pros
- Largest edge network
- Enterprise SLAs
Cons
- Expensive
- Complex contracts
Use When
- Enterprise / Fortune 500
- Need guaranteed global SLAs
Monitoring
Datadog
Full-stack observability: logs, metrics, traces, APM, and dashboards in one platform.
Pros
- All-in-one platform
- Great integrations
- Easy setup
Cons
- Expensive at scale
- Vendor lock-in on queries
Use When
- Want one tool for everything
- Team velocity matters more than cost
Prometheus + Grafana
Open-source metrics collection (Prometheus) + visualization (Grafana). The Kubernetes-native monitoring stack.
Pros
- Free, open-source
- Kubernetes-native
- PromQL is powerful
Cons
- Self-managed infrastructure
- No built-in log aggregation
Use When
- Cost-sensitive
- Already on Kubernetes
New Relic
Full-stack observability with a generous free tier. Strong APM and distributed tracing capabilities.
Pros
- Generous free tier (100GB/mo)
- Strong APM
Cons
- Per-user pricing adds up
- Query language learning curve
Use When
- Small team, budget-conscious
- Need APM focus
Containers / Orchestration
Kubernetes
Container orchestration for deploying, scaling, and managing containerized applications. The industry standard for microservices at scale.
Pros
- Cloud-agnostic
- Auto-scaling, self-healing
- Massive ecosystem
Cons
- Steep learning curve
- Operational overhead
- Overkill for simple apps
Use When
- Microservices at scale
- Multi-cloud / hybrid
Amazon ECS / Fargate
AWS's container orchestration. ECS manages containers; Fargate removes the need to manage servers entirely.
Pros
- Simpler than K8s on AWS
- Fargate = serverless containers
- Native AWS integration
Cons
- AWS lock-in
- Less flexible than K8s
Use When
- All-AWS, smaller team
- Don't want K8s complexity
Docker Compose
Multi-container orchestration for development and simple deployments. Define services in a YAML file.
Pros
- Dead simple
- Great for dev/staging
Cons
- Single-host only
- No auto-scaling or self-healing
Use When
- Local development
- Small single-server deployments
Auth
Auth0 / Okta
Managed identity platform with SSO, MFA, OAuth/OIDC, social logins, and user management. Auth0 was acquired by Okta.
Pros
- Full-featured (SSO, MFA, social)
- Great SDKs
- Enterprise-grade security
Cons
- Expensive at scale
- Vendor lock-in
Use When
- Enterprise SSO/MFA needed
- Don't want to build auth
Firebase Authentication
Google's auth service. Easy social login, email/password, and phone auth. Great for mobile apps.
Pros
- Free for most use cases
- Great mobile SDKs
- Easy setup
Cons
- Limited enterprise features
- Google lock-in
Use When
- Mobile apps, MVPs
- Simple auth needs
Amazon Cognito
AWS's managed auth service. User pools for authentication, identity pools for AWS resource access.
Pros
- AWS-native integration
- Generous free tier (50K MAU)
Cons
- Poor developer experience
- Limited customization
Use When
- All-AWS serverless stack
- Need AWS resource access control
Real-time
WebSockets + Redis Pub/Sub
Persistent bidirectional connections (WebSockets) with cross-server broadcasting (Redis Pub/Sub). The standard real-time architecture.
Pros
- True bidirectional
- Low latency
- Cross-server broadcast via Redis
Cons
- Stateful connections (harder to scale)
- Need sticky sessions or Pub/Sub
Use When
- Chat, gaming, collaboration
- Either side initiates messages
Server-Sent Events (SSE)
The server pushes data to the client over a standard HTTP connection. Simpler than WebSockets for one-way updates.
Pros
- Simple (just HTTP)
- Auto-reconnect built in
- Works through proxies/firewalls
Cons
- Server-to-client only
- No binary data
Use When
- Live feeds, notifications
- Server-to-client push only
Graph Databases
Neo4j
Native graph database. Nodes and edges (relationships) as first-class citizens. Cypher query language.
Pros
- Fast graph traversals
- Intuitive Cypher query language
- Great visualization tools
Cons
- Doesn't scale horizontally easily
- Niche use cases
Use When
- Social networks, recommendations
- Fraud detection, knowledge graphs
Amazon Neptune
Managed graph database by AWS. Supports both property graph (Gremlin) and RDF (SPARQL) models.
Pros
- Fully managed
- Multi-model (property + RDF)
Cons
- AWS lock-in
- Expensive
Use When
- AWS-native graph needs
- Knowledge graphs, identity graphs
Vector Databases
Pinecone
Managed vector database purpose-built for similarity search. Core infrastructure for RAG (Retrieval-Augmented Generation) and AI applications.
Pros
- Zero ops, fully managed
- Fast similarity search
- Metadata filtering
Cons
- Expensive at scale
- Vendor lock-in
Use When
- RAG applications
- Semantic search, recommendations
pgvector
PostgreSQL extension for vector similarity search. Store embeddings alongside relational data in a single database.
Pros
- No extra infrastructure
- SQL + vectors together
- Free, open-source
Cons
- Slower than dedicated vector DBs
- Scaling limitations
Use When
- Small-to-medium vector workloads
- Already using PostgreSQL
Qdrant / Weaviate
Open-source vector databases with rich filtering, hybrid search (vector + keyword), and self-hosting options.
Pros
- Open-source, self-hostable
- Hybrid search
- Active development
Cons
- Newer, less battle-tested
- Self-hosting complexity
Use When
- Need vector DB without vendor lock-in
- Want hybrid keyword + vector search
Workflow Orchestration
Temporal
Durable execution platform for long-running workflows. Code-first approach — write workflows as regular functions.
Pros
- Code-first (no YAML/JSON DSL)
- Built-in retries, timeouts, versioning
- Durable execution
Cons
- Operational complexity
- Learning curve
Use When
- Complex, multi-step business processes
- Long-running workflows (hours/days)
AWS Step Functions
Managed workflow service. Visual state machine with JSON-based definition. Integrates natively with Lambda, SQS, and DynamoDB.
Pros
- Zero ops, fully managed
- Visual workflow designer
- Native AWS integration
Cons
- JSON DSL is verbose
- AWS lock-in
- Expensive at high volume
Use When
- AWS-native orchestration
- Serverless workflows
Apache Airflow
Python-based DAG scheduler. The standard for data pipeline orchestration (ETL/ELT).
Pros
- Python-native
- Huge plugin ecosystem
- Great for data pipelines
Cons
- Not designed for real-time
- Scheduler bottleneck at scale
Use When
- Data/ML pipelines
- Scheduled batch processing
Time-Series DB
InfluxDB
Purpose-built for time-stamped data. Optimized for high-frequency writes with automatic downsampling and retention policies.
Pros
- Built for metrics/IoT
- Auto downsampling & retention
- High write throughput
Cons
- Custom query language (Flux)
- Not SQL-compatible
Use When
- Monitoring, IoT sensors
- High-frequency time-stamped data
TimescaleDB
PostgreSQL extension for time-series. Full SQL compatibility with time-series optimizations (hypertables, compression).
Pros
- Full SQL support
- Runs on PostgreSQL
- JOINs with relational data
Cons
- Less optimized than InfluxDB for pure metrics
Use When
- Need SQL + time-series
- Already using PostgreSQL
API Gateway
Kong
Open-source API gateway with plugins for auth, rate limiting, logging, and transformation. Cloud-agnostic.
Pros
- Rich plugin ecosystem
- Cloud-agnostic
- Open-source core
Cons
- Self-managed (unless Kong Cloud)
- Learning curve
Use When
- Microservices with complex API needs
- Multi-cloud environments
Amazon API Gateway
Managed API gateway by AWS. REST and WebSocket APIs. Native Lambda integration for serverless backends.
Pros
- Zero ops
- Native Lambda integration
- Usage plans & API keys
Cons
- AWS lock-in
- Latency overhead
- Expensive at high volume
Use When
- Serverless AWS backends
- Public API with usage plans
Serverless
AWS Lambda
Event-driven functions. Pay-per-invocation, auto-scales to thousands of concurrent executions. 15-min max runtime.
Pros
- Zero server management
- Pay only for usage
- Scales automatically
Cons
- Cold starts (100ms–2s)
- 15-min timeout
- Vendor lock-in
Use When
- Event-driven processing
- Bursty, unpredictable traffic
Azure Functions
Microsoft's serverless platform. Supports Durable Functions for long-running workflows with state management.
Pros
- Durable Functions (orchestration)
- Azure ecosystem integration
Cons
- Azure lock-in
Use When
- Azure ecosystem
- Need durable/stateful functions
Google Cloud Functions
Google's serverless platform. Tight integration with Firebase, Pub/Sub, and Cloud Run.
Pros
- Firebase integration
- Cloud Run for containers
Cons
- GCP lock-in
- Smaller ecosystem than Lambda
Use When
- GCP/Firebase ecosystem
When Scale Changes the Answer
The #1 in each category is the right default — but at certain scale thresholds, you need specialized tools. A common question is "why not just use PostgreSQL for X?"
Search: PostgreSQL Full-Text vs Elasticsearch
PostgreSQL's built-in tsvector and tsquery work well for <1M documents with simple keyword search. At 10M+ documents, Elasticsearch wins because:
- Inverted indexes are purpose-built for full-text search (faster than PG's GIN indexes at scale)
- Horizontal scaling — shard the search index across nodes
- Rich features — fuzzy matching, faceted search, relevance tuning, synonyms, autocomplete
Rule of thumb: If search is a minor feature (e.g., searching your own notes), PostgreSQL is fine. If search IS the feature (e.g., a product catalog, documentation site), use Elasticsearch.
Time-Series: PostgreSQL vs Specialized Stores
PostgreSQL can store timestamped data, but at high-frequency writes (metrics, IoT sensors) it struggles because:
- Standard INSERT overhead on B-tree indexes doesn't suit append-heavy workloads
- No built-in downsampling (aggregating old data: per-second → per-minute → per-hour)
- No automatic retention policies (purging data older than 90 days)
TimescaleDB is a PostgreSQL extension that adds time-series optimizations while keeping full SQL compatibility — a good middle ground. InfluxDB is purpose-built for metrics and monitoring with its own query language.
Part 7
Glossary
Quick-fire definitions for key system design terms used throughout this guide.
| Term | Definition |
|---|---|
| Microservices | Each feature is its own deployable service |
| Monolith | Single deployable unit containing all features |
| API Gateway | Single entry point that routes, authenticates, and rate-limits API traffic |
| Circuit Breaker | Stops calling a failing service to let it recover. 3 states: Closed (normal — requests pass through), Open (tripped — requests immediately fail without calling the service), Half-Open (testing — allows one request through to check if the service has recovered). |
| Saga Pattern | Sequence of local transactions across services with compensating actions on failure. E-commerce example: (1) reserve inventory → (2) charge payment → (3) ship order. If step 3 fails: compensate by refunding payment (undo step 2) and releasing inventory (undo step 1). Each step can be rolled back independently. |
| Idempotency | Safe to retry — same result no matter how many times called |
| Backpressure | Consumer tells producer to slow down when overwhelmed. Analogy: a sink filling with water faster than it drains — backpressure is turning down the faucet instead of letting the sink overflow. Implementations: return 429 status, use bounded queues, or use reactive streams. |
| Blue-Green | Two identical environments; switch traffic from old to new instantly |
| Canary | Roll out new version to small % of users first, then gradually increase |
| Feature Flag | Toggle to enable/disable features without deploying code |
| Event Sourcing | Store events (what happened) instead of current state. Bank account example: instead of storing "balance = $500," store every event: [+$1000 deposit, -$200 withdrawal, -$300 payment]. Current balance = replay all events. Benefits: complete audit trail, time-travel debugging, can rebuild state at any point. |
| CQRS | Separate read and write models/paths. See Part 2. |
| SLA | Contractual uptime commitment (99.9% = 8.7 hrs downtime/year) |
| SLO | Internal target — stricter than SLA |
| P99 Latency | 99% of requests complete within this time |
| Thundering Herd | Many requests hit an expired cache entry simultaneously. See Part 2. |
| Hot Spot | One shard/server receiving disproportionate traffic. See Part 3. |
Concepts Expanded
Several terms in the table above deserve deeper explanation. Here's the depth you need.
Thundering Herd Prevention
When a popular cache entry expires, thousands of requests slam the database simultaneously. Three mitigations:
- Cache locking: The first request acquires a short-lived lock and fetches from the DB. All other requests wait for the lock to release and then read the freshly cached value.
- Staggered TTLs: Add random jitter to expiration times (e.g., base TTL 5 min + random 0-60 sec). Entries expire at different times, spreading the load.
- Background refresh: A background job re-populates the cache before TTL expires, so users never see a cache miss. Best for data that's expensive to compute.
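The staggered-TTL mitigation above is essentially a one-liner. A sketch (the `cache.set` call in the comment is a placeholder API, not a specific library):

```javascript
// Staggered TTLs: base TTL plus random jitter, so popular keys written at
// the same moment don't all expire in the same instant.
function jitteredTtlSeconds(baseSeconds = 300, jitterSeconds = 60) {
  return baseSeconds + Math.floor(Math.random() * jitterSeconds);
}

// e.g. cache.set(key, value, { ttl: jitteredTtlSeconds() })  // placeholder API
```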
Cross-reference: caching fundamentals in Part 2.
Idempotency Key Pattern
Critical for payment APIs and any operation where retries could cause duplicates. The client generates a UUID and sends it in a header; the server checks before executing:
POST /payments
Headers: Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000
Server logic:
1. Check DB: has this idempotency key been processed?
├── YES → return the cached result (no re-execution)
└── NO → process the payment
2. Store: idempotency_key → result (with 24hr TTL)
3. Return result to client
This guarantees that network timeouts and retries never cause double-charges.
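The server-side check can be sketched in a few lines. Here an in-memory Map stands in for the database table (a real system would persist it with the 24hr TTL), and `execute` is a placeholder for the actual charge logic:

```javascript
// Idempotency sketch: keyed operations execute once; retries with the same
// key return the cached result instead of re-executing.
const processed = new Map(); // idempotencyKey -> result (real systems add a TTL)

async function chargePayment(idempotencyKey, amount, execute) {
  if (processed.has(idempotencyKey)) {
    return processed.get(idempotencyKey); // retry: return cached result, no re-execution
  }
  const result = await execute(amount);   // actually perform the charge
  processed.set(idempotencyKey, result);  // record before replying
  return result;
}
```

A retried request with the same key hits the Map and never reaches `execute` a second time — which is exactly why timeouts and client retries can't double-charge.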
Circuit Breaker in Practice
A circuit breaker monitors calls to a dependency. If the failure rate exceeds a threshold, it "opens" — subsequent calls return a fallback immediately without actually calling the failing service. The system degrades gracefully rather than collapsing:
// Sketch using the Node.js 'opossum' circuit breaker library:
const CircuitBreaker = require('opossum');
const breaker = new CircuitBreaker(fetchFromDependency, {
timeout: 3000, // fail after 3s instead of hanging
errorThresholdPercentage: 50, // open after 50% failure rate
resetTimeout: 30000, // try again after 30s
});
breaker.fallback(() => ({
data: null,
message: 'Service temporarily unavailable'
}));
// If the dependency is down, users get a degraded response
// instead of a timeout or crash. The system keeps running.
Cross-reference: circuit breaker definition in the glossary table above.
Saga Pattern vs. Distributed Transactions (2PC)
Two-phase commit (2PC) locks all participants until everyone agrees to commit — strong consistency, but a single slow participant blocks everything and the coordinator is a single point of failure. Sagas break the work into a sequence of independent local transactions, each with a compensating action for rollback. Sagas trade consistency for availability: you get eventual consistency but no global locks. Default to sagas for microservices; reserve 2PC for cases where you truly need atomic cross-service consistency (rare).
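A saga executor is small: run each step in order, and on failure run the compensations of the completed steps in reverse. A sketch (the step objects here are illustrative stand-ins for real service calls):

```javascript
// Saga sketch: each step has an action and a compensating action.
async function runSaga(steps) {
  const done = [];
  try {
    for (const step of steps) {
      await step.action();   // local transaction in one service
      done.push(step);
    }
    return { ok: true };
  } catch (err) {
    // Undo completed steps in reverse order.
    for (const step of done.reverse()) await step.compensate();
    return { ok: false, error: err.message };
  }
}

// Illustrative demo mirroring the e-commerce example: ship fails, so the
// charge and the reservation are compensated in reverse.
(async () => {
  const log = [];
  const step = (name, fail = false) => ({
    action: async () => { if (fail) throw new Error(`${name} failed`); log.push(name); },
    compensate: async () => log.push(`undo ${name}`),
  });
  const r = await runSaga([step("reserve"), step("charge"), step("ship", true)]);
  console.log(r.ok, log); // false [ 'reserve', 'charge', 'undo charge', 'undo reserve' ]
})();
```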
Feature Flags — Why Use Them
- Instant kill switch: If a new feature causes errors, disable it in seconds — no rollback or redeploy needed.
- Gradual rollout: Release to 5% of users → monitor → 25% → 100%. Catches issues before they affect everyone.
- A/B testing: Show version A to half your users, version B to the other half. Measure which performs better.
- Unblock deploys: Merge code behind a flag even if the feature isn't ready. The flag keeps it invisible until you flip it on.