Webhooks, Queues & Caching

Q: Q1: What's the difference between a webhook and polling? When would you use each?

Polling means your server repeatedly asks an external service "anything new?" on a timer (e.g., every 30 seconds). It's simple but wastes resources when nothing has changed. Webhooks flip the model: the external service sends an HTTP POST to your server the moment something happens. Use webhooks when the external service supports them (Stripe, GitHub, Slack) for near-real-time updates with minimal resource usage. Use polling when the external service doesn't support webhooks, or when you need to pull data on your own schedule (e.g., scraping or batch imports).

Q: Q2: When should you introduce a message queue into your architecture?

Introduce a message queue when you need: (1) async processing — slow tasks like email, PDF generation, or image processing that shouldn't block the API response, (2) service decoupling — services that shouldn't depend directly on each other, (3) spike absorption — a buffer to handle sudden traffic bursts without overloading downstream services, or (4) reliable delivery — guaranteed processing with retries and dead letter queues. Don't add a queue just because it's trendy — it adds operational complexity. For a simple app with low traffic, synchronous processing is fine.

Q: Q3: Explain the cache-aside pattern step by step.

The cache-aside (or lazy loading) pattern works in three steps: (1) The application checks the cache (e.g., Redis) for the requested data. (2) On a cache hit, return the cached data immediately. On a cache miss, query the database for the data. (3) After fetching from the database, store the result in the cache with a TTL (e.g., 5 minutes) so the next request gets a cache hit. On writes, update the database first, then invalidate (delete) the cache entry so the next read fetches fresh data. This pattern is the most common because it only caches data that's actually requested, avoiding wasted memory on unused entries.

Q: Q4: What is a dead letter queue and why is it important?

A dead letter queue (DLQ) is a separate queue where messages go after they've failed processing a maximum number of times. Without a DLQ, a "poison message" (one that always causes the consumer to crash) would be retried forever, blocking all other messages in the queue. With a DLQ, after 3–5 retry attempts, the message moves to the DLQ where it can be inspected, debugged, and reprocessed manually. DLQs are essential for production systems because they prevent one bad message from taking down your entire processing pipeline while preserving the failed message for later analysis.

Q: Q5: How should you handle webhook failures and retries?

Three critical practices: (1) Return 200 immediately — acknowledge receipt before processing. If your processing takes too long, the webhook provider times out and retries, causing duplicate deliveries. (2) Process asynchronously — push the webhook payload onto a message queue and process it in a background worker. This decouples receipt from processing. (3) Implement idempotency — store processed event IDs in your database. Before processing a webhook, check if you've already handled that event ID. This prevents double-processing when retries happen. Also verify the webhook signature (HMAC-SHA256) to ensure it actually came from the claimed provider, not a malicious actor.

Q: Q1: Design a webhook system that can handle 10,000 events per second.

Architecture: Use a thin webhook receiver behind a load balancer. The receiver does only two things: (1) verify the signature and (2) push the raw payload onto a message queue (Kafka or SQS). Return 200 immediately. This keeps the receiver fast (~1ms per request). Processing layer: A fleet of consumer workers pulls messages from the queue and processes them. Scale consumers horizontally based on queue depth. Use consumer groups (Kafka) or competing consumers (RabbitMQ) for parallel processing. Reliability: Kafka persists messages to disk with configurable retention (e.g., 7 days). If consumers go down, messages wait in the queue. Dead letter queues catch poison messages. Every consumer is idempotent using event IDs stored in a database. Scaling: The receiver layer scales horizontally behind the load balancer. Kafka partitions allow parallel consumption. Add more partitions and consumers as throughput grows. Redis can cache deduplication checks for recently processed event IDs.

Q: Q2: How do you prevent processing the same webhook event twice?

Idempotency key: Every webhook event has a unique ID (e.g., Stripe's evt_1NqFbSD2eZvKYlo2C3LPaXnf). Store processed event IDs in a database table with a unique constraint. Before processing: Attempt to INSERT the event ID. If it already exists (unique constraint violation), skip processing — you've already handled this event. Atomic check-and-process: Use a database transaction: INSERT the event ID and process the event in the same transaction. If the insert fails (duplicate), the entire transaction rolls back and no work is done. Fast dedup layer: For high-throughput systems, add a Redis SET check before the database. SADD processed_events evt_id returns 0 if already in the set. This avoids hitting the database for obvious duplicates. Set a TTL on the Redis set entries (e.g., 48 hours) matching the webhook provider's retry window. Make operations idempotent by design: Use UPDATE ... SET status = 'paid' WHERE id = ? AND status = 'pending' instead of unconditional updates. The second execution of the same event has no effect because the WHERE clause no longer matches.

Q: Q3: What Redis cache invalidation strategy would you use for a social media feed?

Short TTL + event-based invalidation hybrid. Cache each user's feed in Redis with a 2–5 minute TTL. This ensures worst-case staleness is bounded. Write-through on new posts: When a user creates a post, push an invalidation event to a queue. Workers delete or update the cached feeds of that user's followers. For users with millions of followers (celebrities), use fanout-on-read instead — don't pre-compute their followers' feeds. Cache structure: Use a Redis Sorted Set (ZADD feed:user:123 timestamp post_id) with post IDs scored by timestamp. Trim to the latest 100 posts with ZREMRANGEBYRANK. Fetch the feed with ZREVRANGE and batch-load post content from a separate cache or database. Why not pure TTL: A 5-minute TTL means a user who just posted won't see their own post for up to 5 minutes. Event-based invalidation of the poster's own feed gives instant feedback while TTL handles everyone else efficiently.

Q: Q4: When would you choose Kafka over RabbitMQ, and vice versa?

Choose Kafka when: You need (1) very high throughput (100K+ messages/sec), (2) event replay — consumers can re-read past messages by resetting their offset, (3) event sourcing or log aggregation where you need a persistent, ordered event log, (4) multiple consumers need to independently read the same stream of events (consumer groups), or (5) real-time data pipelines feeding analytics, ML, or data warehouses. Choose RabbitMQ when: You need (1) traditional task queues where each message is processed by exactly one worker, (2) complex routing — RabbitMQ's exchange types (direct, topic, fanout, headers) offer flexible message routing, (3) lower operational complexity — RabbitMQ is simpler to set up and manage than a Kafka cluster, (4) request-reply (RPC) patterns, or (5) your throughput is moderate (under 50K msg/sec) and you don't need event replay. In short: Kafka is an event streaming platform (append-only log). RabbitMQ is a message broker (queue with routing). Kafka for "what happened" (event log). RabbitMQ for "do this thing" (task queue).

By QuickLearnPro Editorial · Editorial standards

TL;DR

Webhooks push real-time notifications to your server. Message queues (RabbitMQ, SQS, Kafka) decouple services for async processing. Caching (Redis, CDN, in-memory) stores frequently accessed data closer to the user. Together, these three patterns make backends fast, reliable, and scalable.

Explain Like I'm 12

Think of a restaurant. Webhooks are like the buzzer they give you when you order — instead of asking "is my food ready?" every 30 seconds, the buzzer just tells you when it's done (push notification). Queues are the ticket system chefs use to work through orders one by one without dropping any, even when 50 orders come in at once. Caching is keeping ketchup on the table instead of walking to the kitchen every time you want some. These three tricks let a restaurant serve hundreds of people without chaos.

The Big Picture

Webhooks, queues, and caching work together to build backends that are responsive, resilient, and fast. Webhooks bring data in, queues smooth out the processing, and caching speeds up the delivery.

Webhooks

A webhook is an HTTP callback — when something happens in an external service, it sends an HTTP POST request to a URL you've registered. Instead of your code constantly asking "did anything change?", the service tells you the moment something happens.

Push vs Poll

Approach	How It Works	Latency	Resource Usage	Complexity
Polling	Your server asks "anything new?" on a timer	Depends on interval (seconds to minutes)	High — constant requests even when nothing changed	Simple to implement
Webhooks (Push)	External service sends you data when events happen	Near real-time (milliseconds to seconds)	Low — only fires when there's actual data	Need a public endpoint + verification

How Webhooks Work

You register a callback URL with the external service (e.g., https://yourapp.com/webhooks/stripe).
When an event happens (payment completed, PR merged, message sent), the service creates a JSON payload describing the event.
The service sends an HTTP POST request to your URL with the payload.
Your server receives it, verifies the signature, and processes the event.
You return a 200 OK immediately. If you don't, the service will retry.

Webhook Payload Example

// Stripe payment webhook payload
{
  "id": "evt_1NqFbSD2eZvKYlo2C3LPaXnf",
  "type": "payment_intent.succeeded",
  "data": {
    "object": {
      "id": "pi_3NqFbSD2eZvKYlo20kL2LvZ",
      "amount": 2000,
      "currency": "usd",
      "status": "succeeded",
      "customer": "cus_9s6XKzkNRiz8i3",
      "metadata": {
        "order_id": "order_12345"
      }
    }
  },
  "created": 1694123456
}

// GitHub push event webhook payload
{
  "ref": "refs/heads/main",
  "repository": {
    "full_name": "user/repo",
    "html_url": "https://github.com/user/repo"
  },
  "pusher": {
    "name": "developer",
    "email": "[email protected]"
  },
  "commits": [
    {
      "id": "abc123",
      "message": "Fix login bug",
      "timestamp": "2026-04-04T10:30:00Z"
    }
  ]
}

Building a Webhook Receiver

// Express.js webhook receiver
const express = require('express');
const crypto = require('crypto');
const app = express();

// IMPORTANT: Use raw body for signature verification
app.post('/webhooks/stripe',
  express.raw({ type: 'application/json' }),
  (req, res) => {
    const signature = req.headers['stripe-signature'];
    const endpointSecret = process.env.STRIPE_WEBHOOK_SECRET;

    // 1. Verify the signature (HMAC-SHA256)
    let event;
    try {
      event = stripe.webhooks.constructEvent(
        req.body, signature, endpointSecret
      );
    } catch (err) {
      console.error('Signature verification failed:', err.message);
      return res.status(400).send('Invalid signature');
    }

    // 2. Return 200 immediately (process async later)
    res.status(200).json({ received: true });

    // 3. Process the event asynchronously
    processWebhookAsync(event).catch(err => {
      console.error('Webhook processing failed:', err);
    });
  }
);

async function processWebhookAsync(event) {
  switch (event.type) {
    case 'payment_intent.succeeded':
      await fulfillOrder(event.data.object.metadata.order_id);
      break;
    case 'payment_intent.payment_failed':
      await notifyCustomer(event.data.object.customer);
      break;
    default:
      console.log('Unhandled event type:', event.type);
  }
}

Verifying Webhook Signatures

Never trust an incoming webhook without verifying its signature. Anyone can send a POST request to your endpoint. Webhook providers sign payloads using HMAC-SHA256 with a shared secret.

# Manual HMAC-SHA256 verification (Python)
import hmac
import hashlib

def verify_webhook(payload_body, signature_header, secret):
    """Verify that a webhook came from the expected sender."""
    expected_signature = hmac.new(
        key=secret.encode('utf-8'),
        msg=payload_body,
        digestmod=hashlib.sha256
    ).hexdigest()

    # Use constant-time comparison to prevent timing attacks
    return hmac.compare_digest(
        f"sha256={expected_signature}",
        signature_header
    )

Idempotency: Handling Duplicate Deliveries

Webhook providers retry on failures, which means you may receive the same event multiple times. Your handler must be idempotent — processing the same event twice should have the same result as processing it once.

# Idempotency with a processed events table
async def handle_webhook(event):
    event_id = event['id']

    # Check if we already processed this event
    if await db.events.find_one({'event_id': event_id}):
        return  # Already processed, skip

    # Process the event
    await process_event(event)

    # Mark as processed
    await db.events.insert_one({
        'event_id': event_id,
        'processed_at': datetime.utcnow()
    })

Retry Logic and Failure Handling

Most webhook providers use exponential backoff for retries:

Attempt 1: Immediately
Attempt 2: After 1 minute
Attempt 3: After 5 minutes
Attempt 4: After 30 minutes
Attempt 5: After 2 hours
After all retries fail, the event is dropped or sent to a dead letter queue

Common Webhook Providers

Provider	Common Events	Signature Method
Stripe	Payment succeeded, refund issued, subscription updated	HMAC-SHA256 (Stripe-Signature header)
GitHub	Push, PR opened, issue created, release published	HMAC-SHA256 (X-Hub-Signature-256 header)
Twilio	SMS received, call completed, voicemail left	HMAC-SHA1 (X-Twilio-Signature header)
Slack	Message posted, reaction added, channel created	HMAC-SHA256 (X-Slack-Signature header)
Shopify	Order created, product updated, checkout completed	HMAC-SHA256 (X-Shopify-Hmac-SHA256 header)

Tip: Always return 200 OK immediately and process the webhook asynchronously (push it onto a queue). If your processing takes more than a few seconds, the webhook provider will time out and retry — leading to duplicate deliveries and wasted resources.

Message Queues

A message queue is a buffer that sits between producers (services that send messages) and consumers (services that process them). The producer drops a message into the queue and moves on. The consumer picks it up when it's ready. They don't need to be online at the same time — the queue holds messages until they're consumed.

Why Use Queues?

Decouple services — The API server doesn't need to know (or wait for) the email service. It just drops "send welcome email" onto the queue.
Handle traffic spikes — 10,000 orders in a flash sale? The queue absorbs the spike and workers process them at their own pace.
Retry failures — If a consumer crashes, the message goes back on the queue and another worker picks it up.
Async processing — Move slow work (PDF generation, image processing, API calls) out of the request-response cycle.

Comparison: Message Queue Technologies

Feature	RabbitMQ	Amazon SQS	Apache Kafka	Redis Pub/Sub
Type	Traditional message broker	Managed queue service	Distributed event streaming	In-memory pub/sub
Delivery	At-least-once, exactly-once possible	At-least-once (standard), exactly-once (FIFO)	At-least-once, exactly-once with transactions	At-most-once (fire and forget)
Persistence	Persists to disk	Persists (managed by AWS)	Persists to disk (retention-based)	No persistence (messages lost if no subscriber)
Throughput	~10K–50K msg/sec	~3K msg/sec (standard), scales with shards	~100K–1M msg/sec per partition	~500K msg/sec (in-memory, no persistence)
Best For	Task queues, routing, RPC	Serverless, simple queues, AWS-native apps	Event streaming, log aggregation, real-time pipelines	Real-time notifications, simple broadcasting
Complexity	Medium (self-hosted or managed)	Low (fully managed)	High (cluster management, partitions, consumer groups)	Low (if already using Redis)

Key Concepts

Concept	What It Means	Why It Matters
Acknowledgment (ACK)	Consumer tells the queue "I processed this message successfully"	Without ACK, the queue redelivers the message to another consumer
Dead Letter Queue (DLQ)	A separate queue where failed messages go after max retries	Prevents poison messages from blocking the main queue forever
At-least-once delivery	The queue guarantees delivery but may deliver duplicates	Your consumer must be idempotent (handle duplicates safely)
Exactly-once delivery	Each message is processed exactly once (harder to guarantee)	Required for financial transactions; adds overhead

Publishing and Consuming with Python

# RabbitMQ with pika (Python)
import pika
import json

# --- Publisher (sends messages) ---
connection = pika.BlockingConnection(
    pika.ConnectionParameters('localhost')
)
channel = connection.channel()

# Declare a durable queue (survives broker restart)
channel.queue_declare(queue='email_tasks', durable=True)

# Publish a message
message = {
    'to': '[email protected]',
    'subject': 'Welcome!',
    'template': 'welcome_email'
}
channel.basic_publish(
    exchange='',
    routing_key='email_tasks',
    body=json.dumps(message),
    properties=pika.BasicProperties(
        delivery_mode=2  # Make message persistent
    )
)
print("Message sent to queue")
connection.close()


# --- Consumer (processes messages) ---
connection = pika.BlockingConnection(
    pika.ConnectionParameters('localhost')
)
channel = connection.channel()
channel.queue_declare(queue='email_tasks', durable=True)

# Process only 1 message at a time (fair dispatch)
channel.basic_qos(prefetch_count=1)

def callback(ch, method, properties, body):
    task = json.loads(body)
    try:
        send_email(task['to'], task['subject'], task['template'])
        # Acknowledge: remove from queue
        ch.basic_ack(delivery_tag=method.delivery_tag)
        print(f"Email sent to {task['to']}")
    except Exception as e:
        # Reject and requeue for retry
        ch.basic_nack(
            delivery_tag=method.delivery_tag,
            requeue=True
        )
        print(f"Failed, requeued: {e}")

channel.basic_consume(
    queue='email_tasks',
    on_message_callback=callback
)
print("Waiting for messages...")
channel.start_consuming()

Common Use Cases

Email sending — Drop email tasks into a queue; a worker sends them without blocking the API response.
Image processing — User uploads a photo; a queue job resizes, crops, and generates thumbnails in the background.
Payment processing — Webhook arrives from Stripe; push it onto a queue for reliable, ordered processing.
Event sourcing — Every state change is an event on a Kafka topic; replay events to rebuild state.
Data pipelines — ETL jobs that extract, transform, and load data between systems.

Warning: Message queues add operational complexity — another service to monitor, debug, and maintain. Don't introduce a queue until you genuinely need async processing, service decoupling, or spike absorption. For a simple app with 100 users, just process everything synchronously in the request handler.

Background Jobs & Task Queues

A background job is any work that happens outside the HTTP request-response cycle. Instead of making the user wait while your server generates a PDF, resizes an image, or calls a slow third-party API, you push the work to a task queue and respond immediately.

Popular Task Queue Libraries

Library	Language	Broker	Best For
Celery	Python	Redis, RabbitMQ	Most Python web apps (Django, Flask, FastAPI)
Bull / BullMQ	Node.js	Redis	Node.js apps, job scheduling, rate limiting
Sidekiq	Ruby	Redis	Rails applications
Dramatiq	Python	Redis, RabbitMQ	Simpler Celery alternative with sane defaults

When to Move Work to the Background

Sending emails — SMTP calls take 1–5 seconds. Don't make the user wait.
Generating reports — A complex CSV or PDF export might take 30 seconds.
Image/video processing — Resizing, transcoding, and thumbnail generation are CPU-heavy.
Third-party API calls — External services can be slow or unreliable.
Data aggregation — Computing analytics, leaderboards, or recommendations.

Celery Task Example

# tasks.py - Define background tasks with Celery
from celery import Celery
from datetime import timedelta

app = Celery('tasks', broker='redis://localhost:6379/0')

# Configure retries and timeouts
app.conf.update(
    task_serializer='json',
    result_serializer='json',
    accept_content=['json'],
    task_time_limit=300,       # Hard kill after 5 minutes
    task_soft_time_limit=240,  # Raise exception after 4 minutes
)

@app.task(
    bind=True,
    max_retries=3,
    default_retry_delay=60,  # Wait 60s between retries
)
def send_welcome_email(self, user_id):
    """Send a welcome email to a new user."""
    try:
        user = get_user(user_id)
        email_service.send(
            to=user.email,
            subject="Welcome!",
            template="welcome",
            context={"name": user.name}
        )
    except EmailServiceError as exc:
        # Retry with exponential backoff
        raise self.retry(exc=exc, countdown=60 * (2 ** self.request.retries))

@app.task(bind=True, max_retries=2)
def generate_report(self, report_id):
    """Generate a PDF report in the background."""
    try:
        report = build_report(report_id)
        upload_to_s3(report)
        notify_user(report_id, status="complete")
    except Exception as exc:
        notify_user(report_id, status="failed")
        raise self.retry(exc=exc)

# views.py - Dispatch tasks from your API handler
from tasks import send_welcome_email, generate_report

def register_user(request):
    user = create_user(request.data)

    # Fire and forget: push to background queue
    send_welcome_email.delay(user.id)

    # Return immediately - email sends in background
    return {"id": user.id, "message": "Account created!"}

def request_report(request):
    report = create_report_record(request.data)

    # Dispatch background job
    generate_report.delay(report.id)

    return {"report_id": report.id, "status": "processing"}

Monitoring Background Jobs

Flower (Celery) — Web dashboard showing active workers, task history, success/failure rates.
Bull Board (BullMQ) — UI to inspect queues, retry failed jobs, view job data.
Key metrics to track: queue depth (are jobs piling up?), processing time, failure rate, retry count.

Tip: Always set task_time_limit and max_retries on every background task. A task without a timeout can hang forever, blocking a worker. A task without a retry limit can loop infinitely if it keeps failing. Fail fast, retry smart, alert loud.

Caching

Caching stores copies of frequently accessed data in a faster storage layer so you don't have to recompute or re-fetch it every time. It's the single most impactful optimization for read-heavy applications.

Why Caching Matters: Latency by Storage Layer

Storage Layer	Typical Latency	Example
In-memory (app variable)	< 0.01ms	Python dict, Node.js Map, local variable
Redis / Memcached	0.1 – 1ms	Shared cache across app instances
Database query	5 – 50ms	PostgreSQL SELECT with index
Database (no index)	100 – 5,000ms	Full table scan on millions of rows
External API call	50 – 2,000ms	Third-party REST API over the internet

The math: If your endpoint gets 1,000 requests/second and each hits the database (10ms), that's 10 seconds of DB time per second. Cache the result in Redis (0.5ms) and you reduce DB load by 95% while cutting response time by 10x. Caching turns O(n) database load into O(1).

Cache Layers

Browser cache — The user's browser stores static assets (images, CSS, JS) locally. Controlled by Cache-Control and ETag headers.
CDN (Content Delivery Network) — Servers at the network edge (Cloudflare, CloudFront) cache static and even dynamic content close to users worldwide.
Reverse proxy (Nginx/Varnish) — Sits in front of your app server and caches full HTTP responses. Great for public pages that don't change often.
Application cache (Redis/Memcached) — Your code explicitly caches data (database results, API responses, computed values) in a shared in-memory store.
Database query cache — Some databases cache recent query results automatically. Useful but limited — invalidated on every write to the table.

Caching Strategies

Strategy	How It Works	Pros	Cons	Best For
Cache-Aside (Lazy)	App checks cache first. On miss, reads from DB and writes to cache.	Simple, only caches what's actually used	First request is always a cache miss	Most read-heavy workloads
Write-Through	App writes to cache and DB simultaneously on every write.	Cache is always up-to-date	Slower writes (two writes per operation)	Data that's read immediately after writing
Write-Behind (Write-Back)	App writes to cache only. Cache asynchronously writes to DB later.	Fastest writes	Risk of data loss if cache crashes before DB write	Write-heavy workloads where slight data loss is acceptable
Read-Through	Cache itself loads from DB on a miss (app only talks to cache).	App code is simpler — only talks to cache	Requires cache provider support	Frameworks that support built-in read-through

Redis as a Cache

# Redis CLI basics
redis-cli

# SET a value with TTL (expire after 300 seconds)
SET user:123:profile '{"name":"Alice","email":"[email protected]"}' EX 300

# GET a cached value
GET user:123:profile

# Check TTL remaining
TTL user:123:profile

# Delete a cached key
DEL user:123:profile

# Set only if key doesn't exist (cache miss pattern)
SET user:123:profile '{"name":"Alice"}' NX EX 300

Caching in Python with Redis

import redis
import json

r = redis.Redis(host='localhost', port=6379, db=0)

def get_user_profile(user_id):
    """Cache-aside pattern: check cache first, fallback to DB."""
    cache_key = f"user:{user_id}:profile"

    # 1. Check cache
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)  # Cache hit!

    # 2. Cache miss: fetch from database
    profile = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # 3. Store in cache with TTL (5 minutes)
    r.setex(cache_key, 300, json.dumps(profile))

    return profile

def update_user_profile(user_id, data):
    """Update DB and invalidate cache."""
    db.execute(
        "UPDATE users SET name=%s, email=%s WHERE id=%s",
        data['name'], data['email'], user_id
    )
    # Invalidate the cached version
    r.delete(f"user:{user_id}:profile")

Caching in Node.js with Redis

const Redis = require('ioredis');
const redis = new Redis();

async function getUserProfile(userId) {
  const cacheKey = `user:${userId}:profile`;

  // 1. Check cache
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached); // Cache hit!
  }

  // 2. Cache miss: fetch from database
  const profile = await db.query(
    'SELECT * FROM users WHERE id = $1', [userId]
  );

  // 3. Store in cache with TTL (5 minutes)
  await redis.setex(cacheKey, 300, JSON.stringify(profile));

  return profile;
}

async function updateUserProfile(userId, data) {
  await db.query(
    'UPDATE users SET name=$1, email=$2 WHERE id=$3',
    [data.name, data.email, userId]
  );
  // Invalidate the cached version
  await redis.del(`user:${userId}:profile`);
}

Cache Invalidation

Cached data goes stale. The three main invalidation strategies are:

TTL-based (Time To Live) — Set an expiration time. After 5 minutes (or whatever you choose), the cache entry is automatically deleted. Simple and good enough for most cases.
Event-based — When data changes, actively delete or update the cache entry. More complex but ensures freshness. Works well with webhooks and message queues.
Manual purge — An admin or deployment script clears the cache. Used for deployments or emergency fixes.

Warning: "There are only two hard things in computer science: cache invalidation and naming things." — Phil Karlton. Stale cache is the source of countless production bugs. When in doubt, use short TTLs and invalidate on write. It's better to have a cache miss than to serve stale data for hours.

CDN Caching

A CDN caches your content on servers around the world (edge locations). Users in Tokyo get served from a Tokyo edge server instead of your origin in Virginia — cutting latency from 200ms to 20ms.

# Cache-Control headers for different content types

# Static assets (CSS, JS, images): cache for 1 year
Cache-Control: public, max-age=31536000, immutable

# HTML pages: cache briefly, revalidate
Cache-Control: public, max-age=3600, must-revalidate

# API responses: don't cache by default
Cache-Control: no-store

# Private data (user-specific): never cache on CDN
Cache-Control: private, no-cache

Static assets — Cache aggressively (1 year) with content hashing in filenames (style.abc123.css) so new deploys get new URLs.
HTML pages — Short TTL (1 hour) with must-revalidate so users get fresh content without waiting too long.
API responses — Generally no-store unless the data is public and rarely changes (e.g., product catalog).

CDN providers: Cloudflare (free tier is excellent), AWS CloudFront, Fastly, Akamai. Cloudflare Pages (what this site uses!) automatically handles CDN caching for static sites with zero configuration.

When to Use What

These patterns often work together, but here's a quick decision guide for when to reach for each one:

Scenario	Solution	Why
Need real-time notifications from an external service	Webhooks	Push-based: the service tells you when something happens instead of you polling
Need to decouple services or handle traffic spikes	Message Queue	Buffers work, retries failures, lets services scale independently
Need to speed up repeated reads	Caching	Store computed results in fast storage; avoid hitting the database on every request
Need to offload slow work from the request cycle	Background Jobs	Respond to the user immediately; process heavy work asynchronously
Need event streaming or log aggregation	Kafka	High-throughput, persistent, replayable event stream for real-time data pipelines

In practice, they combine: A Stripe webhook arrives → your server pushes it onto a RabbitMQ queue → a background worker processes the payment → the result is cached in Redis for the dashboard. All four patterns in one flow.

Test Yourself

Q1: What's the difference between a webhook and polling? When would you use each?

Polling means your server repeatedly asks an external service "anything new?" on a timer (e.g., every 30 seconds). It's simple but wastes resources when nothing has changed. Webhooks flip the model: the external service sends an HTTP POST to your server the moment something happens. Use webhooks when the external service supports them (Stripe, GitHub, Slack) for near-real-time updates with minimal resource usage. Use polling when the external service doesn't support webhooks, or when you need to pull data on your own schedule (e.g., scraping or batch imports).

Q2: When should you introduce a message queue into your architecture?

Introduce a message queue when you need: (1) async processing — slow tasks like email, PDF generation, or image processing that shouldn't block the API response, (2) service decoupling — services that shouldn't depend directly on each other, (3) spike absorption — a buffer to handle sudden traffic bursts without overloading downstream services, or (4) reliable delivery — guaranteed processing with retries and dead letter queues. Don't add a queue just because it's trendy — it adds operational complexity. For a simple app with low traffic, synchronous processing is fine.

Q3: Explain the cache-aside pattern step by step.

The cache-aside (or lazy loading) pattern works in three steps: (1) The application checks the cache (e.g., Redis) for the requested data. (2) On a cache hit, return the cached data immediately. On a cache miss, query the database for the data. (3) After fetching from the database, store the result in the cache with a TTL (e.g., 5 minutes) so the next request gets a cache hit. On writes, update the database first, then invalidate (delete) the cache entry so the next read fetches fresh data. This pattern is the most common because it only caches data that's actually requested, avoiding wasted memory on unused entries.

Q4: What is a dead letter queue and why is it important?

A dead letter queue (DLQ) is a separate queue where messages go after they've failed processing a maximum number of times. Without a DLQ, a "poison message" (one that always causes the consumer to crash) would be retried forever, blocking all other messages in the queue. With a DLQ, after 3–5 retry attempts, the message moves to the DLQ where it can be inspected, debugged, and reprocessed manually. DLQs are essential for production systems because they prevent one bad message from taking down your entire processing pipeline while preserving the failed message for later analysis.

Q5: How should you handle webhook failures and retries?

Three critical practices: (1) Return 200 immediately — acknowledge receipt before processing. If your processing takes too long, the webhook provider times out and retries, causing duplicate deliveries. (2) Process asynchronously — push the webhook payload onto a message queue and process it in a background worker. This decouples receipt from processing. (3) Implement idempotency — store processed event IDs in your database. Before processing a webhook, check if you've already handled that event ID. This prevents double-processing when retries happen. Also verify the webhook signature (HMAC-SHA256) to ensure it actually came from the claimed provider, not a malicious actor.

Interview Questions

Q1: Design a webhook system that can handle 10,000 events per second.

Architecture: Use a thin webhook receiver behind a load balancer. The receiver does only two things: (1) verify the signature and (2) push the raw payload onto a message queue (Kafka or SQS). Return 200 immediately. This keeps the receiver fast (~1ms per request).

Processing layer: A fleet of consumer workers pulls messages from the queue and processes them. Scale consumers horizontally based on queue depth. Use consumer groups (Kafka) or competing consumers (RabbitMQ) for parallel processing.

Reliability: Kafka persists messages to disk with configurable retention (e.g., 7 days). If consumers go down, messages wait in the queue. Dead letter queues catch poison messages. Every consumer is idempotent using event IDs stored in a database.

Scaling: The receiver layer scales horizontally behind the load balancer. Kafka partitions allow parallel consumption. Add more partitions and consumers as throughput grows. Redis can cache deduplication checks for recently processed event IDs.

Q2: How do you prevent processing the same webhook event twice?

Idempotency key: Every webhook event has a unique ID (e.g., Stripe's evt_1NqFbSD2eZvKYlo2C3LPaXnf). Store processed event IDs in a database table with a unique constraint.

Before processing: Attempt to INSERT the event ID. If it already exists (unique constraint violation), skip processing — you've already handled this event.

Atomic check-and-process: Use a database transaction: INSERT the event ID and process the event in the same transaction. If the insert fails (duplicate), the entire transaction rolls back and no work is done.

Fast dedup layer: For high-throughput systems, add a Redis SET check before the database. SADD processed_events evt_id returns 0 if already in the set. This avoids hitting the database for obvious duplicates. Set a TTL on the Redis set entries (e.g., 48 hours) matching the webhook provider's retry window.

Make operations idempotent by design: Use UPDATE ... SET status = 'paid' WHERE id = ? AND status = 'pending' instead of unconditional updates. The second execution of the same event has no effect because the WHERE clause no longer matches.

Q3: What Redis cache invalidation strategy would you use for a social media feed?

Short TTL + event-based invalidation hybrid. Cache each user's feed in Redis with a 2–5 minute TTL. This ensures worst-case staleness is bounded.

Write-through on new posts: When a user creates a post, push an invalidation event to a queue. Workers delete or update the cached feeds of that user's followers. For users with millions of followers (celebrities), use fanout-on-read instead — don't pre-compute their followers' feeds.

Cache structure: Use a Redis Sorted Set (ZADD feed:user:123 timestamp post_id) with post IDs scored by timestamp. Trim to the latest 100 posts with ZREMRANGEBYRANK. Fetch the feed with ZREVRANGE and batch-load post content from a separate cache or database.

Why not pure TTL: A 5-minute TTL means a user who just posted won't see their own post for up to 5 minutes. Event-based invalidation of the poster's own feed gives instant feedback while TTL handles everyone else efficiently.

Q4: When would you choose Kafka over RabbitMQ, and vice versa?

Choose Kafka when: You need (1) very high throughput (100K+ messages/sec), (2) event replay — consumers can re-read past messages by resetting their offset, (3) event sourcing or log aggregation where you need a persistent, ordered event log, (4) multiple consumers need to independently read the same stream of events (consumer groups), or (5) real-time data pipelines feeding analytics, ML, or data warehouses.

Choose RabbitMQ when: You need (1) traditional task queues where each message is processed by exactly one worker, (2) complex routing — RabbitMQ's exchange types (direct, topic, fanout, headers) offer flexible message routing, (3) lower operational complexity — RabbitMQ is simpler to set up and manage than a Kafka cluster, (4) request-reply (RPC) patterns, or (5) your throughput is moderate (under 50K msg/sec) and you don't need event replay.

In short: Kafka is an event streaming platform (append-only log). RabbitMQ is a message broker (queue with routing). Kafka for "what happened" (event log). RabbitMQ for "do this thing" (task queue).

Webhooks, Queues & Caching

The Big Picture

Webhooks

Push vs Poll

How Webhooks Work

Webhook Payload Example

Building a Webhook Receiver

Verifying Webhook Signatures

Idempotency: Handling Duplicate Deliveries

Retry Logic and Failure Handling

Common Webhook Providers

Message Queues

Why Use Queues?

Comparison: Message Queue Technologies

Key Concepts

Publishing and Consuming with Python

Common Use Cases

Background Jobs & Task Queues

Popular Task Queue Libraries

When to Move Work to the Background

Celery Task Example

Monitoring Background Jobs

Caching

Why Caching Matters: Latency by Storage Layer

Cache Layers

Caching Strategies

Redis as a Cache

Caching in Python with Redis

Caching in Node.js with Redis

Cache Invalidation

CDN Caching

When to Use What

Test Yourself

Interview Questions

Related Topics