What is System Design?

TL;DR

System Design is the process of defining the architecture, components, and data flow of a system to meet specific requirements. It covers scalability, reliability, availability, and performance — the skills tested in senior engineering interviews.

The Big Picture

Most production systems follow the same basic pattern: clients talk to servers, servers talk to databases, and everything in between handles scale, speed, and failure.

Figure: client → load balancer → app servers → cache → database, with a CDN and a message queue alongside.

Explain Like I'm 12

Imagine building a pizza delivery app. One store works fine for your neighborhood. But what if a million people order at once? You need more stores (scaling), a way to send orders to the nearest store (load balancing), a backup plan if a store's oven breaks (reliability), and a way to remember everyone's favorite order (caching). System Design is figuring out all of this BEFORE you build it.

What is System Design?

System Design is the art of building systems that work at scale. It's not just writing code — it's deciding: Where does data live? How do services talk to each other? What happens when things fail?

There are two types:

  • High-Level Design (HLD) — Architecture and components. Which services exist, how they communicate, where data is stored. This is what you draw on a whiteboard.
  • Low-Level Design (LLD) — Classes, interfaces, algorithms, and data models. The internal structure of individual components.
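
To make the distinction concrete, here is a minimal sketch of what an LLD artifact might look like for order storage in the pizza app. Every name in it (`Order`, `OrderStore`, `InMemoryOrderStore`) is hypothetical and chosen only for illustration. An HLD would show just an "order service" box connected to a database, while the LLD spells out the data model and the interface.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Order:
    """Data model: one pizza order (hypothetical fields)."""
    order_id: str
    customer_id: str
    items: tuple[str, ...]


class OrderStore(Protocol):
    """Interface: what any order storage backend must provide."""

    def save(self, order: Order) -> None: ...

    def get(self, order_id: str) -> Optional[Order]: ...


class InMemoryOrderStore:
    """One possible implementation, backed by a dict (handy for tests)."""

    def __init__(self) -> None:
        self._orders: dict[str, Order] = {}

    def save(self, order: Order) -> None:
        self._orders[order.order_id] = order

    def get(self, order_id: str) -> Optional[Order]:
        return self._orders.get(order_id)


# Callers depend on the interface, not on the concrete class.
store: OrderStore = InMemoryOrderStore()
store.save(Order("o-1", "c-42", ("margherita",)))
```

Because callers only see `OrderStore`, the dict could later be swapped for a real database without touching them, which is the kind of decision LLD is meant to capture.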

In interviews and in practice, system design is about making trade-offs. There's no single "correct" answer — only answers that fit the requirements, constraints, and scale of your specific problem.

Why Does It Matter?

Almost every real application that serves many users is, or eventually becomes, a distributed system. Here's why you can't ignore design:

  • Single servers fail: Hardware dies, processes crash, networks partition. Your system must keep running.
  • Traffic spikes happen: Black Friday, viral posts, breaking news. If you can't scale, you go down.
  • Data grows: What works for 1,000 users breaks at 1,000,000. Storage, indexing, and query patterns all need to evolve.
  • Users expect high availability: A 99.99% uptime target allows only about 52.6 minutes of downtime per year. You need redundancy, failover, and graceful degradation.
  • It's a core interview topic for senior roles: At large tech companies and others hiring senior (L5+) engineers, system design rounds often weigh heavily in the hiring decision.
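
The downtime figure for a given uptime target is simple arithmetic. The sketch below assumes a 365-day year, and the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum downtime per year allowed by an availability target in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


# Each extra "nine" cuts the downtime budget by a factor of ten.
for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} -> {downtime_minutes_per_year(target):.1f} min/year")
```

At 99.99% the budget works out to roughly 52.6 minutes per year, while 99% allows about 5,256 minutes (around 3.65 days).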

The Core Building Blocks

These eight components appear in nearly every system design. Think of them as your toolkit:

  • Load Balancers: Distribute traffic across servers so no single machine is overwhelmed.
  • Caching: Store frequently accessed data in memory for fast, often sub-millisecond, reads.
  • CDNs: Serve static content from edge locations close to users worldwide.
  • Databases: SQL, NoSQL, key-value, or graph; choose based on your data model.
  • Message Queues: Decouple services with async communication (Kafka, RabbitMQ, SQS).
  • Microservices: Split monoliths into independently deployable services, organized by domain.
  • APIs: REST, GraphQL, or gRPC; define how services talk to each other.
  • Monitoring: Metrics, logs, traces, and alerts. You can't fix what you can't see.
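
As one example of how two of these blocks fit together, here is a minimal sketch of a cache sitting in front of a database, using the common cache-aside read pattern. It assumes a tiny in-process LRU cache and a fake database function. `LRUCache`, `read_through_cache`, and `fake_db` are hypothetical names, and a real deployment would more likely use a dedicated cache server.

```python
from collections import OrderedDict
from typing import Callable, Optional


class LRUCache:
    """Tiny in-memory cache that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict the least recently used


def read_through_cache(
    key: str, cache: LRUCache, load_from_db: Callable[[str], str]
) -> str:
    """Cache-aside read: try the cache, fall back to the database, then fill."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.put(key, value)
    return value


db_calls: list = []


def fake_db(key: str) -> str:
    """Stand-in for a slow database query; records each call."""
    db_calls.append(key)
    return f"value-for-{key}"


cache = LRUCache(capacity=2)
first = read_through_cache("user:1", cache, fake_db)   # miss: goes to the database
second = read_through_cache("user:1", cache, fake_db)  # hit: served from memory
```

After both reads, `db_calls` contains a single entry, since the second read never reached the database. The sketch leaves out invalidation and expiry, which are usually the hard parts of caching.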

The System Design Interview Framework

Most system design interviews follow a similar arc, which this 6-step framework captures. Internalize it and you're far less likely to freeze at the whiteboard:

  1. Clarify Requirements: Ask questions. What are the functional requirements (what does the system do?) and non-functional requirements (latency, availability, consistency)?
  2. Estimate Scale: How many users? Requests per second? Data size? These numbers drive every design decision.
  3. Define API/Interfaces: What endpoints does the system expose? What are the inputs and outputs?
  4. High-Level Design: Draw the boxes and arrows. Clients, load balancers, services, databases, caches, queues.
  5. Deep Dive into Bottlenecks: Pick the hardest part and go deep. How do you shard the database? How do you handle hot keys in the cache?
  6. Discuss Trade-offs: Every choice has pros and cons. Show you understand the trade-offs (consistency vs. availability, latency vs. throughput, cost vs. performance).
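
The scale estimate in step 2 is usually back-of-envelope arithmetic. The sketch below shows one way to do it. All of the input numbers (10 million daily users, 20 requests each, a 3x peak factor, 50 million 1 KB writes per day) are made-up assumptions for illustration, not figures from any real system.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(daily_active_users: int, requests_per_user_per_day: float) -> float:
    """Average requests per second, spread evenly over a day."""
    return daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY


def storage_per_year_gb(writes_per_day: int, bytes_per_write: int) -> float:
    """New data written per year in GB (1 GB = 10**9 bytes, 365-day year)."""
    return writes_per_day * bytes_per_write * 365 / 10**9


# Hypothetical inputs: adjust to the problem you are given.
dau = 10_000_000
avg = average_qps(dau, requests_per_user_per_day=20)
peak = avg * 3  # assumed peak-to-average ratio
storage = storage_per_year_gb(writes_per_day=50_000_000, bytes_per_write=1_000)

print(f"average ~{avg:,.0f} QPS, peak ~{peak:,.0f} QPS, storage ~{storage:,.0f} GB/year")
```

With these inputs the estimate is roughly 2,300 QPS on average, about 6,900 QPS at peak, and around 18 TB of new data per year. Numbers of that size already suggest more than one app server and a plan for how storage will grow.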

What You'll Learn

This topic walks you through system design from fundamentals to real-world architectures.

Test Yourself

What's the difference between horizontal and vertical scaling?

Vertical scaling (scaling up) means adding more power to a single machine: more CPU, RAM, or disk. Horizontal scaling (scaling out) means adding more machines to your pool. Vertical scaling has a ceiling (you can't upgrade one server indefinitely) and leaves that server as a single point of failure. Horizontal scaling is how most large systems handle massive traffic: they add more servers behind a load balancer.
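
A minimal sketch of the "more servers behind a load balancer" idea, assuming the simplest strategy (round robin) and hypothetical server names. Real load balancers typically add health checks, weighting, and connection awareness on top of this.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, one per request."""

    def __init__(self, servers: list) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self._servers = list(servers)
        self._rotation = cycle(self._servers)

    def pick(self) -> str:
        """Return the server that should handle the next request."""
        return next(self._rotation)


# Scaling out means adding another name to this list.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])

# Route nine requests and count how many each server received.
assignments = Counter(balancer.pick() for _ in range(9))
```

Each of the three servers ends up with three of the nine requests, so adding a fourth server would cut every server's share of the load without changing the client.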

Name 4 core building blocks of system design.

Any four of: Load Balancers (distribute traffic), Caching (fast reads from memory), CDNs (edge content delivery), Databases (persistent storage), Message Queues (async communication), Microservices (independent services), APIs (service interfaces), and Monitoring (observability).

Why can't you just use a single powerful server for everything?

Three reasons: (1) Single point of failure: if that server crashes, your entire system goes down. (2) Scaling ceiling: there's a physical limit to how much CPU/RAM you can add to one machine. (3) Geographic latency: users far from the server experience slow response times. Distributed systems address all three by spreading load across multiple machines in multiple locations.
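
The single-point-of-failure argument can be put in numbers. The sketch below assumes that replicas fail independently and that the system is up as long as any one replica is up. Real failures are often correlated (shared power, shared deploys), so treat the result as an optimistic upper bound. The function name is illustrative.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Availability of N replicas when only one needs to be up.

    Assumes independent failures, which real systems only approximate.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1.0 - (1.0 - single) ** replicas


# One 99%-available server versus two or three of them side by side.
for n in (1, 2, 3):
    print(n, f"{combined_availability(0.99, n):.6f}")
```

Under these assumptions, two servers that are each 99% available give about 99.99% combined, and three give about 99.9999%.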

What are the 6 steps of a system design interview?

The 6 steps are: (1) Clarify requirements — functional and non-functional. (2) Estimate scale — users, QPS, data size. (3) Define API/interfaces — endpoints and contracts. (4) High-level design — draw the architecture diagram. (5) Deep dive into bottlenecks — solve the hardest problems. (6) Discuss trade-offs — justify every decision with pros and cons.