Core Concepts of System Design

TL;DR

System Design has 8 building blocks: Load Balancers (distribute traffic), Caching (speed up reads), CDNs (serve static content), Databases (store data), Message Queues (decouple services), Microservices (split by domain), API Design (define interfaces), and Monitoring (observe everything).

The Big Picture

Every distributed system is built from these eight components. Master them and you can design anything from a URL shortener to a global chat platform.

[Diagram: The 8 building blocks of system design]
Explain Like I'm 12

A system is like a restaurant. The host is the Load Balancer (seats people at different tables). The kitchen has Caches (pre-made appetizers for fast service). The menu is the API (what you can order). The walk-in fridge is the Database. The order tickets are Message Queues. Each station (grill, salad, dessert) is a Microservice. The delivery truck is the CDN. And the manager watching everything is Monitoring.

Cheat Sheet

| Component | What It Does | When to Use | Example |
| --- | --- | --- | --- |
| Load Balancer | Distributes traffic across servers | Multiple servers, high traffic | Nginx, AWS ALB |
| Cache | Stores frequently accessed data in memory | Read-heavy workloads | Redis, Memcached |
| CDN | Serves static files from edge locations | Images, CSS, JS, videos | CloudFront, Cloudflare |
| Database | Persists data durably | All systems | PostgreSQL, MongoDB |
| Message Queue | Async communication between services | Decoupling, background jobs | Kafka, RabbitMQ, SQS |
| Microservices | Independent, deployable services | Large teams, complex domains | Docker + K8s |
| API Gateway | Single entry point, routing, auth | Multiple microservices | Kong, AWS API GW |
| Monitoring | Metrics, logs, alerts, tracing | Always (not optional) | Prometheus, Grafana, Datadog |

The 8 Building Blocks

Load Balancers

A load balancer sits between clients and your servers. Its job: distribute incoming requests across multiple servers so no single machine gets overwhelmed.

Common algorithms:

  • Round-robin — Send each request to the next server in sequence. Simple, works when servers are identical.
  • Least connections — Send to the server with the fewest active connections. Better for uneven workloads.
  • IP hash — Hash the client's IP to pick a server. Same client always hits the same server (session stickiness).

There are two layers: L4 (transport) load balancers route based on IP/port (faster, simpler). L7 (application) load balancers inspect HTTP headers, URLs, and cookies (smarter routing, SSL termination).

Health checks continuously ping each server. If a server stops responding, the load balancer removes it from the pool. Without a load balancer, you have a single point of failure.
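The first two algorithms can be sketched in a few lines. This is a toy model, not a real proxy: the server names and class names are made up for illustration, and a production balancer would also run the health checks described above.

```python
import itertools

class RoundRobinBalancer:
    """Cycle through servers in a fixed order."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Track active connections per server; pick the least loaded one."""
    def __init__(self, servers):
        self.connections = {s: 0 for s in servers}

    def pick(self):
        server = min(self.connections, key=self.connections.get)
        self.connections[server] += 1
        return server

    def release(self, server):
        self.connections[server] -= 1

rr = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([rr.pick() for _ in range(4)])  # ['app-1', 'app-2', 'app-3', 'app-1']

lc = LeastConnectionsBalancer(["app-1", "app-2"])
first = lc.pick()    # both idle, so "app-1"
second = lc.pick()   # "app-2" now has fewer connections... wait, it has 0 vs 1
lc.release(first)
print(lc.pick())     # "app-1" again: it has the fewest active connections
```

Note how round-robin ignores load entirely, which is exactly why least-connections wins when requests vary wildly in cost.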

Tip: If you remember only one thing, make it this: load balancers sit between clients and servers, and they prevent any single server from being overwhelmed.

Caching

Caching stores computed results in fast memory so you don't have to recompute or re-fetch them. The difference between a 200ms database query and a 1ms cache hit is the difference between a fast app and a slow one.

Three caching strategies:

  • Cache-aside — App checks cache first. Cache miss? Fetch from DB, write to cache, return. Most common pattern.
  • Write-through — Write to cache AND database simultaneously. Consistent but slower writes.
  • Write-back — Write to cache only, async flush to DB later. Fast writes, risk of data loss if cache crashes.

TTL (Time to Live) sets an expiration on cached data. Cache invalidation — knowing when to remove stale data — is famously "one of the two hard things in computer science."
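Here is a minimal sketch of the cache-aside pattern with a TTL. The in-memory `TTLCache` class stands in for Redis, and `get_user`, `db`, and the sample data are invented for the example:

```python
import time

class TTLCache:
    """Tiny in-memory cache with per-key expiry (a stand-in for Redis)."""
    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

def get_user(user_id, cache, db, ttl=60):
    """Cache-aside: check cache first, fall back to DB on a miss, then populate."""
    cached = cache.get(user_id)
    if cached is not None:
        return cached
    user = db[user_id]            # the slow path (a real DB query in production)
    cache.set(user_id, user, ttl)
    return user

db = {42: {"name": "Ada"}}
cache = TTLCache()
get_user(42, cache, db)           # miss: reads the DB and fills the cache
print(get_user(42, cache, db))    # hit: {'name': 'Ada'}
```

The TTL is the safety net: even if you forget to invalidate a key on update, the stale value ages out on its own.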

Redis: In-memory key-value store, supports data structures (lists, sets, sorted sets, hashes). Memcached: Simpler, multi-threaded, pure key-value. Redis is the default choice for most modern systems.

Warning: Cache invalidation bugs cause stale data. Always have a TTL. When in doubt, cache with a short TTL rather than no cache at all.

CDNs (Content Delivery Networks)

A CDN serves static content (images, CSS, JS, videos) from servers geographically close to users. Instead of every request hitting your origin server in Virginia, a user in Tokyo gets content from a Tokyo edge server.

Two types:

  • Push CDN — You upload content to the CDN proactively. Good for content that doesn't change often.
  • Pull CDN — CDN fetches from your origin on the first request, then caches it. Simpler to set up, lazy population.

CDNs reduce latency, offload traffic from your origin server, and provide DDoS protection. Every production website should use one. Major providers: Cloudflare, CloudFront (AWS), Akamai, Fastly.

Info: This very site uses Cloudflare as its CDN. Static HTML files are cached at edge locations worldwide, so the page loads fast regardless of where you are.

Databases

Databases are where your data lives permanently. The biggest design decision is choosing the right type for your data model and access patterns.

  • SQL (Relational) — Tables, rows, joins, ACID transactions. PostgreSQL, MySQL. Choose when data has relationships and you need consistency.
  • Document — JSON-like flexible schemas. MongoDB. Choose for content, catalogs, and user profiles where schema varies.
  • Key-Value — Simple get/set by key. Redis, DynamoDB. Choose for caching, sessions, and config.
  • Wide-Column — Rows with dynamic columns. Cassandra, HBase. Choose for time-series data and high write throughput.
  • Graph — Nodes and edges. Neo4j. Choose for social networks, recommendations, and relationship-heavy queries.
Info: In interviews, always justify your database choice. "I chose PostgreSQL because the data has relationships and we need ACID transactions" is better than "I like Postgres."
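To make the SQL bullet concrete, here is a small sketch of what "relationships plus ACID transactions" buys you, using in-memory SQLite as a stand-in for PostgreSQL (the table and column names are invented for the example):

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL: tables, joins, transactions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(id),
                         total_cents INTEGER);
""")

# A transaction: both inserts commit together, or neither does
with conn:
    conn.execute("INSERT INTO users  VALUES (1, 'Ada')")
    conn.execute("INSERT INTO orders VALUES (10, 1, 2500)")

# A join: follow the relationship between the two tables in one query
row = conn.execute("""
    SELECT users.name, orders.total_cents
    FROM orders JOIN users ON orders.user_id = users.id
""").fetchone()
print(row)  # ('Ada', 2500)
```

In a document store you would typically embed the order inside the user document instead; the join and the multi-table transaction are precisely what the relational model gives you for free.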

Message Queues

Message queues decouple producers (services that create work) from consumers (services that process work). Instead of Service A calling Service B directly and waiting, Service A drops a message on a queue and moves on. Service B picks it up when it's ready.

Key technologies:

  • Apache Kafka — Event streaming platform. High throughput, ordered within partitions, durable. Use for event-driven architectures, activity feeds, and log aggregation.
  • RabbitMQ — Traditional message broker. Flexible routing, multiple protocols, acknowledgments. Use for task queues and RPC patterns.
  • Amazon SQS — Fully managed queue service. No infrastructure to manage. Use for simple decoupling in AWS environments.

Use message queues for: async processing, buffering traffic spikes, event-driven architecture, and ensuring work gets done even if a consumer is temporarily down.
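The producer/consumer decoupling above can be sketched with Python's standard-library `queue`, standing in for Kafka, RabbitMQ, or SQS. The message shape and the "email" work are invented for the demo; a sentinel `None` message shuts the worker down:

```python
import queue
import threading

jobs = queue.Queue()   # the broker (Kafka/RabbitMQ/SQS in production)
results = []

def consumer():
    """Service B: pulls messages whenever it's ready, one at a time."""
    while True:
        msg = jobs.get()
        if msg is None:          # sentinel: stop the worker
            jobs.task_done()
            break
        results.append(f"emailed {msg['to']}")  # do the actual work
        jobs.task_done()

worker = threading.Thread(target=consumer)
worker.start()

# Service A: drop messages on the queue and move on -- no waiting on Service B
jobs.put({"to": "alice@example.com"})
jobs.put({"to": "bob@example.com"})
jobs.put(None)

jobs.join()    # demo only: block until the queue is drained
worker.join()
print(results)  # ['emailed alice@example.com', 'emailed bob@example.com']
```

The key point is the two `put` calls return immediately: the producer's latency no longer depends on how slow (or dead) the consumer is.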

Tip: If your system design has one service calling another synchronously and waiting, ask yourself: "Does the caller really need the response right now?" If not, a message queue is probably the better pattern.

Microservices

Microservices architecture splits a monolithic application into small, independent services organized by business domain. Each service owns its own data, deploys independently, and communicates via APIs or events.

Benefits:

  • Independent deployments — Ship the payment service without touching the user service.
  • Team autonomy — Each team owns a service end-to-end (code, data, deployment).
  • Technology freedom — One service can use Python, another Go, another Node.js. Pick the best tool for each problem.
  • Fault isolation — If the recommendation service crashes, the checkout service keeps working.

Costs:

  • Network complexity — Every service call is a network call. Latency, timeouts, and retries matter.
  • Distributed debugging — Tracing a request across 10 services is harder than reading one stack trace.
  • Data consistency — No more single database transactions across services. You need eventual consistency patterns (sagas, outbox).
Warning: Don't start with microservices. Start with a monolith, find the boundaries, then extract services. Premature decomposition creates distributed complexity for no benefit.

API Design

APIs define how services communicate. The three dominant styles:

  • REST — Resource-based, HTTP verbs (GET, POST, PUT, DELETE). The default for most systems. Simple, well-understood, cacheable.
  • GraphQL — Client specifies exactly which fields it needs. Reduces over-fetching. Good for mobile clients with bandwidth constraints.
  • gRPC — Binary protocol (Protocol Buffers), strongly typed, fast. Best for internal service-to-service communication where latency matters.

Regardless of style, good API design includes:

  • Rate limiting — Prevent abuse and protect backend resources.
  • Pagination — Never return unbounded lists. Use cursor-based pagination for large datasets.
  • Versioning — /v1/users vs /v2/users. Let clients migrate at their own pace.
  • Idempotency — Retrying the same request produces the same result. Critical for payment APIs.
Info: REST is the safe default for interviews. If the interviewer asks "how do your services communicate?" start with REST and justify switching to gRPC or GraphQL only if the problem demands it.
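Idempotency is the least intuitive item on that list, so here is a minimal sketch of how payment APIs implement it with an idempotency key. The function name, key format, and in-memory dict are all invented; real systems store processed keys in Redis or the database:

```python
processed = {}  # idempotency_key -> stored response (Redis/DB in production)

def charge_card(idempotency_key, amount_cents):
    """Replay-safe endpoint: retrying with the same key returns the original result."""
    if idempotency_key in processed:
        return processed[idempotency_key]   # retry detected: no double charge
    result = {"status": "charged", "amount": amount_cents}  # the real side effect
    processed[idempotency_key] = result
    return result

first = charge_card("req-abc-123", 5000)
retry = charge_card("req-abc-123", 5000)    # client retried after a timeout
print(retry == first, len(processed))       # True 1
```

The client generates a fresh key per logical operation, then reuses that same key on every retry; the server makes "at least once" delivery behave like "exactly once."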

Monitoring & Observability

You can't fix what you can't see. Monitoring tells you when something is wrong. Observability tells you why. The three pillars:

  • Metrics — Numbers over time. Request count, latency (p50/p95/p99), error rate, CPU usage, memory. Tools: Prometheus, CloudWatch, Datadog.
  • Logs — Text records of what happened. Structured JSON logs are searchable; unstructured plain text is not. Tools: ELK Stack (Elasticsearch, Logstash, Kibana), Loki.
  • Traces — Follow a single request as it flows across services. See where time is spent, where errors occur. Tools: Jaeger, Zipkin, AWS X-Ray.

Alerts trigger when metrics cross thresholds (e.g., error rate > 1%, p99 latency > 500ms). Good alerts are actionable — they tell you what's wrong and what to do.
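Those percentile metrics and thresholds fit in a few lines. This sketch uses the nearest-rank method and invented sample latencies; monitoring systems like Prometheus estimate percentiles from histograms instead of raw samples:

```python
def percentile(samples, p):
    """Nearest-rank percentile: the value at rank ceil(p/100 * n), 1-indexed."""
    ranked = sorted(samples)
    rank = -(-len(ranked) * p // 100)   # ceiling division without math.ceil
    return ranked[max(0, rank - 1)]

latencies_ms = [12, 15, 14, 120, 18, 16, 13, 480, 17, 19]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(p50, p99)  # 16 480

ALERT_P99_MS = 400
if p99 > ALERT_P99_MS:
    print("page the on-call: p99 latency above threshold")
```

Notice how one slow request (480ms) barely moves the p50 but dominates the p99: that's why tail percentiles, not averages, drive alerting.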

Tip: In every system design answer, end with "and here's how I'd monitor it." Interviewers love this. It shows you think about production operations, not just architecture.

Test Yourself

What's the difference between L4 and L7 load balancers?

L4 (transport layer) load balancers route based on IP address and port number. They're faster because they don't inspect packet contents. L7 (application layer) load balancers inspect HTTP headers, URLs, cookies, and request bodies. They can make smarter routing decisions (e.g., route /api to one server pool and /static to another) and handle SSL termination.

Name the three caching strategies and when to use each.

(1) Cache-aside: App checks cache, falls back to DB on miss, writes to cache. Best for read-heavy workloads (most common). (2) Write-through: Writes go to cache AND DB simultaneously. Best when you need strong consistency between cache and DB. (3) Write-back: Writes go to cache only, DB is updated asynchronously. Best for write-heavy workloads where you can tolerate data loss risk.

When would you choose a NoSQL database over SQL?

Choose NoSQL when: (1) Your data doesn't have rigid relationships (documents, key-value pairs). (2) You need horizontal scaling across many servers. (3) Your schema changes frequently. (4) You need very high write throughput (wide-column stores like Cassandra). (5) You're modeling relationships as graphs (Neo4j). Choose SQL when you need ACID transactions, complex joins, and data integrity.

What are the three pillars of observability?

The three pillars are: (1) Metrics — numerical measurements over time (request count, latency percentiles, error rates). (2) Logs — text records of events (structured JSON logs are searchable and queryable). (3) Traces — follow a single request as it flows across multiple services, showing where time is spent and where failures occur.

What's the difference between REST, GraphQL, and gRPC?

REST: Resource-based, uses HTTP verbs, returns fixed response shapes. Simple, cacheable, the default for public APIs. GraphQL: Client specifies exactly which fields it wants, reducing over/under-fetching. Good for mobile and complex UIs with varying data needs. gRPC: Binary protocol (Protocol Buffers), strongly typed, very fast. Best for internal service-to-service communication where low latency matters.