Backend Development Interview Questions

TL;DR

35+ backend development interview questions organized by topic. Perfect for a 15-minute revision before your next interview.

Short on time? Focus on HTTP & APIs and Databases — they come up in 80% of backend interviews.

HTTP & Web Fundamentals

Q: What happens when you type a URL in the browser?

Think of it like sending a letter through a postal system with multiple steps:

  1. DNS resolution — your browser checks its DNS cache to translate the domain name into an IP address. If it's not cached, a DNS resolver queries root servers, TLD servers, and authoritative servers to find the IP.
  2. TCP connection — your browser opens a TCP connection to that IP (three-way handshake: SYN, SYN-ACK, ACK).
  3. TLS handshake — if the URL is HTTPS, a TLS handshake establishes encryption.
  4. HTTP request — the browser sends an HTTP request to the server.
  5. Server processing — the server processes it (hits the backend, queries databases) and sends back an HTTP response with HTML.
  6. Rendering — the browser parses the HTML, fetches CSS/JS/images, builds the DOM, and renders the page.

Q: Explain the difference between HTTP and HTTPS.

HTTP sends data in plain text — anyone intercepting the traffic can read it. HTTPS wraps HTTP inside a TLS (Transport Layer Security) layer that encrypts all communication between client and server. HTTPS uses port 443 (vs HTTP's port 80), requires an SSL/TLS certificate, and provides three guarantees: confidentiality (data is encrypted), integrity (data can't be tampered with in transit), and authentication (you're talking to the real server, not an impersonator). In 2026, there's no reason not to use HTTPS — certificates are free via Let's Encrypt.

Q: What are HTTP status codes? Name 5 important ones.

Status codes are three-digit numbers the server sends back to tell the client what happened. They're grouped by category:

  • 200 OK — Request succeeded, here's the data.
  • 201 Created — Resource was successfully created (common after POST).
  • 301 Moved Permanently — Resource moved to a new URL forever (browsers cache this).
  • 404 Not Found — The server can't find the requested resource.
  • 500 Internal Server Error — Something broke on the server (your code threw an unhandled exception).

The ranges: 2xx = success, 3xx = redirect, 4xx = client error, 5xx = server error. Other important ones: 401 (unauthenticated), 403 (forbidden), 429 (rate limited).
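
Python's standard library ships these codes and phrases as the http.HTTPStatus enum, which is handy for avoiding magic numbers in handlers. A quick sketch (the category helper is our own, for illustration):

```python
from http import HTTPStatus

# Look up a code's standard reason phrase
print(HTTPStatus(404).phrase)  # Not Found
print(HTTPStatus(201).phrase)  # Created

# The first digit gives the category: 2xx success, 3xx redirect,
# 4xx client error, 5xx server error
def category(code: int) -> str:
    return {2: "success", 3: "redirect",
            4: "client error", 5: "server error"}[code // 100]

print(category(429))  # client error
```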

Q: What is the difference between GET and POST?

GET retrieves data and should be safe and idempotent (calling it 10 times neither changes server state nor gives a different result). Parameters go in the URL query string, so they're visible, bookmarkable, and limited in length. POST sends data to the server to create or process something. The data goes in the request body (not the URL), can be any size, and is not idempotent — submitting a form twice might create two records. A good rule of thumb: GET for reading, POST for writing. GET requests can be cached by browsers and CDNs; POST requests cannot.

Q: What are HTTP headers and why do they matter?

HTTP headers are key-value metadata sent with every request and response. Think of them as the "envelope" around your data. They control caching (Cache-Control, ETag), content type (Content-Type: application/json), authentication (Authorization: Bearer <token>), security (Content-Security-Policy, X-Frame-Options), and more. Without headers, the server wouldn't know what format the client wants, whether the client is authenticated, or how long to cache the response. Custom headers (historically prefixed X- by convention, though that prefix is now deprecated) let backends pass metadata like request IDs for distributed tracing.

Q: Explain the difference between stateful and stateless protocols.

A stateless protocol treats each request independently — the server doesn't remember anything about previous requests. HTTP is stateless: every request must carry all the information the server needs (auth token, session ID, etc.). A stateful protocol maintains context between requests — TCP is stateful because it tracks sequence numbers and connection state. Statelessness is a huge advantage for backend scalability because any server in your cluster can handle any request. You don't need sticky sessions or shared memory. The tradeoff: you need external state stores (databases, Redis) to persist user sessions.

Want to go deeper? See our Backend Development Overview for the big picture of how HTTP fits into backend architecture.

APIs & REST

Q: What is a REST API? What makes it RESTful?

REST (Representational State Transfer) is an architectural style, not a protocol. An API is RESTful when it follows these constraints: (1) Client-server separation — frontend and backend are independent. (2) Stateless — each request contains all needed info. (3) Uniform interface — resources identified by URLs, manipulated via standard HTTP methods. (4) Cacheable — responses declare whether they can be cached. (5) Layered system — client doesn't know if it's talking to the end server or a proxy. In practice, "RESTful" often means: use nouns for URLs (/users, not /getUsers), HTTP verbs for actions (GET, POST, PUT, DELETE), and return proper status codes.

Q: Explain CRUD operations in the context of REST.

CRUD maps directly to HTTP methods and database operations:

# Create → POST
POST /api/users        → INSERT INTO users

# Read → GET
GET /api/users         → SELECT * FROM users
GET /api/users/42      → SELECT * FROM users WHERE id = 42

# Update → PUT (full) / PATCH (partial)
PUT /api/users/42      → UPDATE users SET ... WHERE id = 42

# Delete → DELETE
DELETE /api/users/42   → DELETE FROM users WHERE id = 42

This mapping is the backbone of REST. POST creates a new resource and returns 201. GET reads without side effects. PUT replaces the entire resource. PATCH updates only the fields you send. DELETE removes the resource and typically returns 204 (No Content).

Q: What is the difference between PUT and PATCH?

PUT replaces the entire resource. If you PUT a user object with only the name field, all other fields get wiped or set to defaults. It's idempotent — sending the same PUT 10 times gives the same result. PATCH updates only the fields you send. Send {"name": "Jane"} via PATCH and only the name changes; everything else stays. In practice, most "update" endpoints are PATCH, not PUT, because you rarely want to replace the whole object. A common mistake: using PUT when you mean PATCH, accidentally wiping out fields the client didn't send.
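
The difference is easy to see with plain dictionaries — an in-memory sketch, not tied to any framework:

```python
stored = {"id": 42, "name": "John", "email": "john@example.com"}

# PUT semantics: the request body replaces the whole resource.
put_body = {"name": "Jane"}
after_put = {"id": 42, **put_body}       # email is gone!

# PATCH semantics: the request body is merged into the resource.
patch_body = {"name": "Jane"}
after_patch = {**stored, **patch_body}   # email survives

print(after_put)    # {'id': 42, 'name': 'Jane'}
print(after_patch)  # {'id': 42, 'name': 'Jane', 'email': 'john@example.com'}
```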

Q: How would you design pagination for a REST API?

Two main approaches. Offset-based: use ?page=2&limit=20 or ?offset=20&limit=20. Simple to implement but has a flaw — if items are inserted/deleted between pages, you skip or duplicate items. Cursor-based: use ?cursor=abc123&limit=20 where the cursor is an opaque token (usually an encoded ID or timestamp). More robust for real-time data and doesn't degrade at high offsets.

// Response with cursor-based pagination
{
  "data": [...],
  "pagination": {
    "next_cursor": "eyJpZCI6MTAwfQ==",
    "has_more": true
  }
}

Always include total count (for offset) or has_more (for cursor) so the client knows when to stop.
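
An opaque cursor is often just an encoded JSON blob; the next_cursor in the response above decodes to {"id": 100}. A minimal encode/decode sketch:

```python
import base64
import json

def encode_cursor(last_row: dict) -> str:
    # Compact JSON (no spaces), then base64 — opaque to the client
    raw = json.dumps(last_row, separators=(",", ":")).encode()
    return base64.b64encode(raw).decode()

def decode_cursor(cursor: str) -> dict:
    return json.loads(base64.b64decode(cursor))

cursor = encode_cursor({"id": 100})
print(cursor)                  # eyJpZCI6MTAwfQ==
print(decode_cursor(cursor))   # {'id': 100}
```

The server decodes the cursor, queries WHERE id > 100 LIMIT 20, and returns a new cursor for the last row.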

Q: What is API versioning and what strategies exist?

API versioning lets you evolve your API without breaking existing clients. Three common strategies: (1) URL path: /api/v1/users, /api/v2/users — most popular, easy to understand, easy to route. (2) Query parameter: /api/users?version=2 — less common, harder to cache. (3) Header-based: Accept: application/vnd.myapi.v2+json — keeps URLs clean but harder to test in a browser. URL versioning wins for simplicity. The key rule: never break existing versions. Add new fields freely (additive changes are safe), but removing or renaming fields requires a new version.

Q: How do you handle errors in a REST API?

Use proper HTTP status codes (don't return 200 with an error message in the body). Return a consistent error response format:

// Consistent error response
{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Email is required",
    "details": [
      { "field": "email", "issue": "must not be empty" }
    ]
  }
}

Map errors to codes: 400 for validation errors, 401 for missing auth, 403 for insufficient permissions, 404 for not found, 409 for conflicts, 422 for unprocessable entities, 429 for rate limiting, 500 for server bugs. Never expose stack traces or internal details in production — log them server-side with a request ID that the client can reference for support.

Want to go deeper? See our APIs & REST deep dive for endpoint design patterns, authentication flows, and real-world examples.

Databases

Q: What is the difference between SQL and NoSQL databases?

SQL databases (PostgreSQL, MySQL) are relational — data lives in tables with fixed schemas, related via foreign keys, and queries use SQL. They guarantee ACID transactions, which means your data is always consistent. NoSQL databases come in several flavors: document stores (MongoDB), key-value (Redis), column-family (Cassandra), and graph (Neo4j). They typically have flexible schemas, scale horizontally more easily, and trade strict consistency for performance (eventual consistency). Choose SQL when you need complex queries, joins, and strict data integrity. Choose NoSQL when you need flexible schemas, massive write throughput, or your data is naturally hierarchical (like JSON documents).

Q: Explain ACID properties with an example.

ACID guarantees reliable transactions. Imagine transferring $100 from Account A to Account B:

  • Atomicity — Both the debit and credit happen, or neither does. If the server crashes after debiting A but before crediting B, the whole transaction rolls back.
  • Consistency — The database moves from one valid state to another. Total money before = total money after. No rules are broken.
  • Isolation — Two concurrent transfers don't interfere. If someone checks Account A's balance mid-transfer, they see either the old or new value, not a half-finished state.
  • Durability — Once the transfer commits, it survives power outages and crashes. It's written to disk, not just in memory.
In SQL, the transfer wraps both updates in a single transaction:

BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 'A';
UPDATE accounts SET balance = balance + 100 WHERE id = 'B';
COMMIT;

Q: What is database indexing and when would you use it?

An index is like a book's table of contents — instead of scanning every page (full table scan), you jump directly to the right page. Under the hood, most databases use B-tree indexes that give O(log n) lookup time instead of O(n). You should index columns that appear in WHERE clauses, JOIN conditions, and ORDER BY. But indexes aren't free: they slow down writes (INSERT/UPDATE/DELETE) because the index must also be updated. They also consume disk space. Don't index everything — index selectively based on your query patterns. A good starting point: index foreign keys, columns you filter on frequently, and columns in unique constraints.
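
You can watch an index change the query plan using SQLite from Python's standard library (the table and index names here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

def plan(sql: str) -> str:
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail)
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

query = "SELECT * FROM users WHERE email = 'a@example.com'"
print(plan(query))  # a full table scan ("SCAN ...")

conn.execute("CREATE INDEX idx_users_email ON users (email)")
print(plan(query))  # now a search using idx_users_email
```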

Q: Explain the N+1 query problem and how to solve it.

The N+1 problem happens when your code executes 1 query to get a list of N items, then N additional queries to get related data for each item. For example:

# N+1 Problem: 1 query + N queries
posts = db.query("SELECT * FROM posts")  # 1 query
for post in posts:
    author = db.query(
        "SELECT * FROM users WHERE id = %s", post.author_id
    )  # N queries!

If there are 100 posts, that's 101 database queries. The fix is eager loading — fetch everything in 1-2 queries using JOINs or WHERE IN:

# Fixed: 2 queries total
posts = db.query("SELECT * FROM posts")
author_ids = [p.author_id for p in posts]
authors = db.query(
    "SELECT * FROM users WHERE id IN %s", (tuple(author_ids),)
)  # 1 extra query regardless of N (psycopg2-style: pass a tuple for IN)

ORMs like SQLAlchemy, Django ORM, and Prisma have built-in eager loading (select_related, include) that solve this automatically.

Q: What is database normalization? Explain 1NF, 2NF, 3NF.

Normalization is the process of organizing tables to reduce redundancy and prevent update anomalies. Think of it as "don't repeat yourself" for databases.

  • 1NF (First Normal Form) — Every cell holds a single atomic value. No arrays, no comma-separated lists. Each row is unique (has a primary key).
  • 2NF (Second Normal Form) — 1NF plus no partial dependencies. Every non-key column depends on the entire primary key, not just part of it. This matters for composite keys.
  • 3NF (Third Normal Form) — 2NF plus no transitive dependencies. Non-key columns don't depend on other non-key columns. If you have zip_code and city, city depends on zip_code (not the primary key), so you'd split it into a separate table.

In practice, most production databases are in 3NF. Sometimes you denormalize intentionally for read performance (storing computed values, duplicating data), accepting the tradeoff of more complex writes.

Q: When would you choose MongoDB over PostgreSQL?

Choose MongoDB when: your data is naturally document-shaped (nested objects, varying fields per record), you need flexible schemas that evolve rapidly (early-stage prototyping), you need to store large amounts of semi-structured data (logs, events, CMS content), or you need easy horizontal scaling with built-in sharding. Choose PostgreSQL when: your data has strong relationships (orders → items → products), you need complex queries with JOINs and aggregations, you need ACID transactions across multiple tables, or you want a single database that also handles JSON (jsonb), full-text search, and geospatial queries. A common pattern: PostgreSQL as the primary database, MongoDB for specific use cases like content management or event logs.

Want to go deeper? See our Databases & ORMs deep dive for ORM patterns, migrations, and query optimization strategies.

Authentication & Security

Q: What is the difference between authentication and authorization?

Authentication (AuthN) answers "Who are you?" — it verifies identity. You prove who you are with credentials: password, fingerprint, OAuth token. Authorization (AuthZ) answers "What can you do?" — it determines permissions. After you've proven your identity, the system checks whether you're allowed to access a resource. Example: logging into GitHub is authentication. Whether you can push to a specific repository is authorization. Authentication always comes first. A common pattern is RBAC (Role-Based Access Control) where roles (admin, editor, viewer) map to permissions.

Q: How do JWT tokens work? What are their pros/cons?

A JWT (JSON Web Token) has three parts separated by dots: header.payload.signature. The header specifies the algorithm (e.g., HS256). The payload contains claims — user ID, roles, expiration time. The signature is a hash of header + payload using a secret key, proving the token hasn't been tampered with.

// JWT payload (decoded)
{
  "sub": "user_123",
  "role": "admin",
  "exp": 1717200000,
  "iat": 1717113600
}

Pros: Stateless (no server-side session storage), scalable across multiple servers, can contain user info (no database lookup per request), works well for microservices. Cons: Can't be revoked easily (you must wait for expiry or maintain a blacklist), payload is base64-encoded not encrypted (don't store secrets in it), tokens can grow large if you stuff too much data in them. Best practice: short-lived access tokens (15 min) + long-lived refresh tokens stored securely.
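
The signing scheme can be reproduced with nothing but the standard library — a teaching sketch, not a replacement for a vetted JWT library:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWT uses base64url without padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_jwt(token: str, secret: bytes) -> bool:
    header, body, sig = token.split(".")
    signing_input = f"{header}.{body}".encode()
    expected = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return hmac.compare_digest(sig, expected)  # constant-time comparison

token = sign_jwt({"sub": "user_123", "role": "admin"}, b"secret-key")
print(verify_jwt(token, b"secret-key"))  # True
print(verify_jwt(token, b"wrong-key"))   # False
```

Note the payload is only encoded, not encrypted — anyone can base64-decode it, which is why secrets never belong in a JWT.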

Q: Explain the OAuth 2.0 Authorization Code flow.

This is the most secure OAuth flow, used for server-side apps (like "Login with Google"). Here are the steps:

  1. User clicks "Login with Google" on your app.
  2. Your app redirects the user to Google's authorization server with your client_id, redirect_uri, requested scope, and a random state parameter (CSRF protection).
  3. User logs in to Google and approves the requested permissions.
  4. Google redirects back to your redirect_uri with an authorization code (short-lived, single-use).
  5. Your backend server exchanges that code for an access token (and optionally a refresh token) by calling Google's token endpoint with the code + your client_secret.
  6. Your server uses the access token to call Google's API and get the user's profile.

The key security feature: the access token never touches the browser. The authorization code alone is useless without the client secret.
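
Step 2 is just a redirect to a URL built from query parameters. A sketch (the endpoint is Google's documented authorization URL; the client_id and redirect_uri are placeholders):

```python
import secrets
from urllib.parse import urlencode

# Random CSRF token — store it in the session and compare on callback
state = secrets.token_urlsafe(16)

params = {
    "client_id": "YOUR_CLIENT_ID",
    "redirect_uri": "https://myapp.com/oauth/callback",
    "response_type": "code",        # ask for an authorization code, not a token
    "scope": "openid email profile",
    "state": state,
}
auth_url = "https://accounts.google.com/o/oauth2/v2/auth?" + urlencode(params)
print(auth_url)
```

On the callback, verify the returned state matches the stored one before exchanging the code for tokens.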

Q: How do you prevent SQL injection?

SQL injection happens when user input gets concatenated directly into a SQL query:

# VULNERABLE - never do this!
query = f"SELECT * FROM users WHERE name = '{user_input}'"
# If user_input = "'; DROP TABLE users; --" ... game over

The fix is parameterized queries (prepared statements), which separate SQL logic from data:

# SAFE - parameterized query
cursor.execute(
    "SELECT * FROM users WHERE name = %s",
    (user_input,)
)

The database treats the parameter as data, never as executable SQL. Additional defenses: use an ORM (which parameterizes by default), apply the principle of least privilege (your app's DB user shouldn't have DROP TABLE permission), validate and sanitize input, and use a Web Application Firewall (WAF) as an extra layer.

Q: What is CORS and why does it exist?

CORS (Cross-Origin Resource Sharing) is a browser security mechanism that controls which websites can make requests to your API. By default, browsers block requests from one origin (e.g., https://myapp.com) to a different origin (e.g., https://api.myapp.com) — this is the Same-Origin Policy. CORS relaxes this by letting the server declare which origins are allowed via response headers:

Access-Control-Allow-Origin: https://myapp.com
Access-Control-Allow-Methods: GET, POST, PUT, DELETE
Access-Control-Allow-Headers: Content-Type, Authorization

For non-simple requests (PUT, DELETE, or custom headers), the browser sends a preflight OPTIONS request first to check permissions. CORS only applies to browsers — server-to-server requests aren't affected. A common mistake: setting Access-Control-Allow-Origin: * in production, which lets any website call your API.

Q: How should passwords be stored in a database?

Never store plain-text passwords. Use a slow, salted hashing algorithm: bcrypt, scrypt, or Argon2 (the current best choice). Here's the process: (1) Generate a random salt (unique per user). (2) Hash the password with the salt using a purposefully slow algorithm. (3) Store the hash + salt (bcrypt embeds the salt in the hash string automatically).

# Python example with bcrypt
import bcrypt

# Registration: hash the password
password = b"user_password"
salt = bcrypt.gensalt(rounds=12)
hashed = bcrypt.hashpw(password, salt)

# Login: verify the password
if bcrypt.checkpw(password, hashed):
    print("Login successful")

Why slow hashing? If your database leaks, attackers can try billions of SHA-256 hashes per second, but only a few thousand bcrypt hashes per second. The salt prevents rainbow table attacks (precomputed hash lookups). Never use MD5 or SHA-256 alone for passwords — they're too fast.

Want to go deeper? See our Authentication & Security deep dive for JWT implementation, OAuth flows, and security best practices.

Server Architecture & Patterns

Q: What is the MVC pattern?

MVC (Model-View-Controller) separates your application into three concerns: Model handles data and business logic (database queries, validations, calculations). View handles presentation (HTML templates, JSON responses). Controller handles the request/response flow — it receives a request, calls the model for data, and returns the view. This separation makes code easier to test, maintain, and scale because changing the UI doesn't touch business logic and vice versa. Most web frameworks use a variation: Django (MTV — Model-Template-View), Express (route handlers + models), Spring (Controller-Service-Repository).

Q: Explain middleware in the context of a web framework.

Middleware is code that runs between receiving a request and sending a response. Think of it as a pipeline of functions that each get a chance to process, modify, or reject the request before it reaches your route handler.

// Express.js middleware example
app.use((req, res, next) => {
  console.log(`${req.method} ${req.url}`);
  next(); // pass to next middleware
});

app.use(authMiddleware);    // check JWT token
app.use(rateLimiter);       // block if too many requests
app.use(cors());            // handle CORS headers
app.get('/api/users', handler); // finally hits your code

Common middleware: logging, authentication, rate limiting, CORS, body parsing, error handling, compression. The order matters — auth middleware should run before route handlers, and error-handling middleware should run last. Middleware can short-circuit the pipeline (e.g., return 401 without calling next()).

Q: What are microservices? When should you use them vs monolith?

A monolith is a single deployable unit containing all your code. A microservices architecture splits your system into small, independent services (user service, payment service, notification service), each with its own database and deployment pipeline, communicating via APIs or message queues.

Start with a monolith when: you're a small team (under 10 devs), building a new product, or don't yet know your domain boundaries. Monoliths are simpler to develop, test, deploy, and debug.

Move to microservices when: your monolith is too large for one team to work on without stepping on each other, you need independent scaling (search service needs 10x more resources than user service), or you need different tech stacks for different problems. The cost of microservices is significant: distributed systems complexity, network latency, eventual consistency, and operational overhead (you need service discovery, circuit breakers, distributed tracing). Don't start with microservices — earn the complexity.

Q: What is a message queue and when would you use one?

A message queue (RabbitMQ, Amazon SQS, Apache Kafka) is a buffer that sits between a producer and a consumer. The producer sends a message, the queue stores it, and the consumer processes it asynchronously. Think of it like a to-do list between two coworkers.

Use cases: (1) Decoupling — the email service doesn't need to be up when a user registers; the message waits in the queue. (2) Load leveling — if you get a spike of 10,000 requests, the queue absorbs them and workers process at their own pace. (3) Async processing — image resizing, PDF generation, sending notifications — things the user shouldn't wait for. (4) Microservice communication — services publish events ("order.created") and other services subscribe. The tradeoff: added complexity and eventual consistency. A message might be processed seconds or minutes later, so it's not suitable for real-time responses.
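
The producer/consumer shape can be sketched in-process with Python's queue module (a real system would use RabbitMQ, SQS, or Kafka across machines):

```python
import queue
import threading

jobs = queue.Queue()
processed = []

def worker():
    while True:
        job = jobs.get()   # blocks until a message arrives
        if job is None:    # sentinel value: shut down cleanly
            break
        processed.append(f"emailed {job}")  # simulate the slow work
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

# The "API server" enqueues and returns immediately
for user in ["alice", "bob", "carol"]:
    jobs.put(user)

jobs.put(None)  # tell the worker to stop
t.join()
print(processed)  # ['emailed alice', 'emailed bob', 'emailed carol']
```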

Q: Explain the difference between horizontal and vertical scaling.

Vertical scaling (scale up) means adding more power to your existing server — more CPU, RAM, faster SSD. It's simple (no code changes) but has a hard ceiling (you can only buy so big a machine) and a single point of failure. Horizontal scaling (scale out) means adding more servers behind a load balancer. It's theoretically unlimited, gives you redundancy (one server dies, others handle traffic), but requires your application to be stateless (no in-memory sessions) because any server might handle any request.

In practice, you do both: vertically scale until it gets expensive, then horizontally scale. Databases are harder to scale horizontally (you need read replicas, sharding, or distributed databases) while stateless API servers are easy to scale horizontally (just add more instances behind a load balancer).

Want to go deeper? See our Backend Development Overview for the full architecture picture.

Caching & Performance

Q: What is caching and where can you add it in a backend?

Caching stores frequently accessed data in a faster storage layer so you don't recompute or re-fetch it every time. Think of it as keeping sticky notes on your desk instead of walking to the filing cabinet. In a backend, you can add caching at multiple layers:

  • Browser cache — HTTP headers tell browsers to cache static assets and API responses.
  • CDN cache — Cloudflare, CloudFront cache responses at edge locations close to users.
  • Application cache — In-memory stores like Redis or Memcached cache database query results, computed values, or session data.
  • Database cache — Query cache, buffer pool, materialized views.

The closer the cache is to the user, the faster the response. But each layer adds complexity around invalidation — you need to decide when cached data is stale.

Q: What are cache invalidation strategies?

Cache invalidation is famously one of the two hardest problems in computer science. The main strategies:

  • TTL (Time-to-Live) — Cache entries expire after a set time (e.g., 5 minutes). Simple but data can be stale until expiry. Good for data that doesn't change often.
  • Write-through — Every write updates both the cache and the database simultaneously. Data is always fresh but writes are slower (two operations).
  • Write-behind (write-back) — Writes go to cache first, then asynchronously to the database. Fast writes but risk data loss if the cache crashes before syncing.
  • Cache-aside (lazy loading) — The app checks the cache first. On miss, it reads from the database, stores in cache, then returns. Most common pattern. You invalidate by deleting the cache key on writes.

A common approach: cache-aside with TTL as a safety net. Delete the cache key when data changes, and even if you miss a deletion, the TTL ensures stale data eventually expires.
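
Cache-aside with a TTL safety net, sketched with a plain dict standing in for Redis (db_fetch and the injectable clock are illustrative):

```python
import time

cache = {}      # key -> (value, expires_at)
TTL = 300       # 5-minute safety net
db_reads = 0

def db_fetch(user_id):  # stand-in for a real database query
    global db_reads
    db_reads += 1
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id, now=None):
    now = now if now is not None else time.time()
    key = f"user:{user_id}"
    hit = cache.get(key)
    if hit and hit[1] > now:          # cache hit, still fresh
        return hit[0]
    value = db_fetch(user_id)         # miss: read DB, fill cache
    cache[key] = (value, now + TTL)
    return value

def invalidate_user(user_id):         # call this on every write
    cache.pop(f"user:{user_id}", None)

get_user(1); get_user(1)  # second call is served from cache
print(db_reads)           # 1
invalidate_user(1)        # a write happened; drop the key
get_user(1)
print(db_reads)           # 2
```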

Q: How does a CDN work?

A CDN (Content Delivery Network) is a distributed network of servers around the world. When a user requests a resource, the CDN serves it from the server geographically closest to them (called an edge node or PoP — Point of Presence). If that edge node doesn't have the content cached, it fetches it from your origin server, caches it, and then serves future requests from its cache.

CDNs are great for static assets (images, CSS, JS), but modern CDNs (Cloudflare, Fastly) can also cache API responses and even run serverless functions at the edge. Benefits: lower latency (50ms vs 500ms), reduced load on your origin server, DDoS protection (CDN absorbs the traffic), and automatic SSL. The tradeoff: cache invalidation is slower (propagating changes to 200+ edge nodes takes time), so CDNs work best for content that doesn't change every second.

Q: What is a reverse proxy and how is it different from a load balancer?

A reverse proxy sits in front of your servers and forwards client requests to them. Clients talk to the proxy, not your servers directly. It can do: SSL termination, caching, compression, rate limiting, and request routing. Nginx and Caddy are popular reverse proxies.

A load balancer is a specific type of reverse proxy that distributes traffic across multiple backend servers using algorithms like round-robin, least connections, or IP hash. Its primary job is distributing load, not caching or security.

In practice, the line is blurry. Nginx can act as both a reverse proxy and a load balancer. Cloud load balancers (AWS ALB, GCP Cloud Load Balancer) combine both roles plus health checking (automatically removing unhealthy servers from the pool). Think of it this way: all load balancers are reverse proxies, but not all reverse proxies are load balancers.
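
Round-robin, the simplest balancing algorithm, is just cycling through the server pool — a sketch with hypothetical backend addresses:

```python
from itertools import cycle

servers = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
pool = cycle(servers)

def pick_server() -> str:
    # Each call returns the next backend, wrapping around forever
    return next(pool)

picks = [pick_server() for _ in range(6)]
print(picks)  # six requests spread evenly across the three backends
```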

Want to go deeper? Caching and performance tie directly into System Design — a must-know for senior backend interviews.

Scenario-Based Questions

Q: Design a URL shortener (like bit.ly). What are the key components?

A URL shortener has a few core pieces. API layer: POST /shorten accepts a long URL and returns a short code. GET /:code redirects to the original URL. ID generation: Convert an auto-incrementing ID to a base62 string (a-z, A-Z, 0-9). ID 12345 becomes "dnh". Alternatively, use a hash (MD5/SHA-256) of the URL and take the first 6-7 characters. Database: A simple table mapping short_code → original_url with an index on short_code.

CREATE TABLE urls (
  id BIGSERIAL PRIMARY KEY,
  short_code VARCHAR(10) UNIQUE NOT NULL,
  original_url TEXT NOT NULL,
  created_at TIMESTAMP DEFAULT NOW(),
  click_count INT DEFAULT 0
);

Scaling considerations: Redis cache for hot URLs (the same popular links get clicked millions of times). Read-heavy workload, so read replicas help. For truly massive scale, pre-generate short codes in batches to avoid ID generation bottlenecks. Analytics (click tracking) should be async via a message queue to not slow down redirects.
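
The base62 conversion mentioned above can be sketched directly, using the a-z, A-Z, 0-9 alphabet in that order (so ID 12345 becomes "dnh"):

```python
import string

# 62 characters: a-z, A-Z, 0-9
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits

def encode_base62(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)   # peel off the least-significant digit
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

print(encode_base62(12345))  # dnh
```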

Q: Your API is responding slowly. Walk me through how you'd debug it.

Debugging slow APIs is a systematic process — start broad, then narrow down:

  1. Reproduce it: Is it all endpoints or one? All users or specific ones? Consistent or intermittent?
  2. Check metrics: Look at your monitoring dashboard (response times, error rates, CPU, memory, disk I/O). If CPU is at 100%, it's a compute bottleneck. If memory is full, you might be swapping.
  3. Check the database: 90% of slow APIs are slow database queries. Run EXPLAIN ANALYZE on the slow query. Look for missing indexes, N+1 queries, or full table scans. Check connection pool exhaustion.
  4. Check external calls: Is your API waiting on a third-party service (payment gateway, email API)? Add timeouts and circuit breakers.
  5. Check for resource contention: Are there long-running transactions holding locks? Is the connection pool too small?
  6. Add tracing: Distributed tracing (OpenTelemetry, Jaeger) shows you exactly where time is spent across services.

Quick wins: add database indexes, implement caching for hot paths, move heavy work to background jobs, and increase connection pool size.

Q: How would you design a rate-limiting system?

Rate limiting protects your API from abuse and ensures fair usage. Common algorithms:

  • Fixed window: Allow 100 requests per minute. Reset the counter every minute. Simple but has a burst problem at window boundaries (200 requests in 2 seconds across two windows).
  • Sliding window: Tracks requests in a rolling time window. More accurate but slightly more complex.
  • Token bucket: A bucket holds tokens (say 100). Each request consumes a token. Tokens refill at a steady rate (e.g., 10/second). Allows short bursts while enforcing an average rate.
  • Leaky bucket: Requests enter a queue that processes at a fixed rate. Excess requests are dropped. Smooths traffic perfectly.

# Fixed-window counter with Redis (pseudocode) — the simplest to implement
def is_allowed(user_id, limit=100, window=60):
    key = f"rate:{user_id}"
    current = redis.incr(key)
    if current == 1:
        redis.expire(key, window)
    return current <= limit

Implementation: use Redis for distributed rate limiting (all your servers share one counter). Return 429 Too Many Requests with a Retry-After header. Identify users by API key, IP address, or user ID. Apply different limits to different endpoints (login should have stricter limits than reading public data).
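
A token bucket refills continuously and allows bursts up to capacity; a single-process sketch with an injectable clock for determinism:

```python
class TokenBucket:
    def __init__(self, capacity: int = 100, refill_rate: float = 10.0):
        self.capacity = capacity        # max burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start full
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill proportionally to the time elapsed since the last call
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, refill_rate=1.0)  # 3-request burst, 1 req/sec average
burst = [bucket.allow(0.0) for _ in range(4)]
print(burst)              # [True, True, True, False] — burst exhausted
refilled = bucket.allow(1.0)
print(refilled)           # True — one token refilled after 1 second
```

In production the same state (tokens, last refill time) would live in Redis so all servers share it.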

Q: You need to send 100,000 emails to users. How would you architect this?

Never send 100K emails synchronously from your API server — it would take hours and block your app. Here's the architecture:

  1. Message queue: Your API enqueues a "send-campaign" job to a queue (RabbitMQ, SQS, or Redis). This returns immediately to the user.
  2. Worker pool: Multiple worker processes consume from the queue. Each worker picks up a batch (say 100 users), compiles the email template, and sends via an email service (SendGrid, Amazon SES).
  3. Batch processing: Don't send one-at-a-time. Batch API calls to your email provider (most support up to 1000 recipients per API call).
  4. Rate limiting: Email providers have sending limits (e.g., SES allows 50 emails/second initially). Throttle your workers accordingly.
  5. Tracking: Store send status (queued, sent, delivered, bounced, opened) in a database. Use webhooks from the email provider for delivery events.
  6. Retry logic: Temporary failures (network timeout, provider overload) should be retried with exponential backoff. Permanent failures (invalid email) should be marked and skipped.

With this architecture, 100K emails can be sent in minutes, your API stays responsive, and you have full visibility into delivery status.
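
Step 3's batching is a simple chunking loop; a sketch with a hypothetical recipient list:

```python
def batches(items, size=100):
    """Yield successive fixed-size chunks (the last may be smaller)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

recipients = [f"user{i}@example.com" for i in range(250)]
sizes = [len(b) for b in batches(recipients, size=100)]
print(sizes)  # [100, 100, 50] — 3 provider API calls instead of 250 sends
```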

Scenario questions test your ability to think about systems holistically. For more, explore System Design fundamentals.