Multi-Agent Systems

TL;DR

Multi-agent systems use multiple specialized agents working together instead of one monolithic agent. Key patterns: Supervisor (one agent delegates to workers), Debate (agents argue to find better answers), Pipeline (agents process in sequence), and Swarm (dynamic hand-offs between peers). Use multi-agent when tasks need different expertise, parallelism, or quality checks — but start with a single agent first.

Explain Like I'm 12

Imagine you're running a group project at school. You could try to do everything yourself — research, writing, design, and presenting. But it's way better to split the work. One friend does research, another writes the report, another makes the slides. That's a multi-agent system! Each "agent" is a specialist. There's usually a team leader (the supervisor agent) who decides who does what, checks the work, and puts it all together. Sometimes two agents debate a question to make sure the answer is really good — like having two friends argue about whether a fact is right before putting it in the report.

Multi-Agent Patterns

[Diagram: four multi-agent patterns — Supervisor delegates to workers, Pipeline passes work sequentially, Debate has agents argue, Swarm dynamically hands off between peers]

When to Use Multi-Agent

Multi-agent systems add complexity. Only use them when a single agent genuinely isn't enough:

| Use Multi-Agent When... | Stick With Single Agent When... |
|---|---|
| Task requires different expertise (code + design + testing) | One skillset covers the entire task |
| Subtasks can run in parallel for speed | Steps are strictly sequential |
| You need quality checks (reviewer agent) | Output quality is acceptable without review |
| Context window is too small for everything | All context fits in one conversation |
| You want separation of concerns (security) | All operations have the same trust level |
Warning: The #1 mistake in agent development is jumping to multi-agent when a well-designed single agent would work. Every additional agent adds latency, cost, and failure modes. Start simple.

Supervisor Pattern

One supervisor agent receives the task, breaks it into subtasks, delegates to specialized worker agents, collects results, and assembles the final output. This is the most common multi-agent pattern.

# Supervisor pattern: one boss, multiple workers.
# Illustrative sketch — assumes `llm` wraps an LLM client whose generate()
# returns structured output (here, a list of (worker_name, subtask) pairs).
class SupervisorAgent:
    def __init__(self, llm):
        self.llm = llm
        self.workers = {
            "researcher": ResearchAgent(),
            "coder": CodingAgent(),
            "reviewer": ReviewAgent(),
        }

    def run(self, task):
        # Plan: Decide which workers to use
        plan = self.llm.generate(
            f"Break this task into subtasks and assign each to a worker.\n"
            f"Available workers: {list(self.workers.keys())}\n"
            f"Task: {task}"
        )
        # plan: [("researcher", "find best practices"), ("coder", "implement"), ("reviewer", "review")]

        results = {}
        for worker_name, subtask in plan:
            worker = self.workers[worker_name]
            # Pass relevant context from previous results
            result = worker.run(subtask, context=results)
            results[worker_name] = result

        # Synthesize final output
        return self.llm.generate(
            f"Combine these results into a final response:\n{results}"
        )
Tip: Give each worker agent a focused system prompt and a limited set of tools. A coding agent doesn't need web search; a research agent doesn't need file write. Specialization improves reliability.

Debate Pattern

Two or more agents independently answer the same question, then critique each other's answers in rounds. A judge agent (or the same agents) converges on the best answer. This improves accuracy on complex reasoning tasks.

# Debate pattern: Multiple agents argue to find truth
def debate(question, num_rounds=3):
    # Generate initial positions
    agent_a_answer = agent_a.generate(question)
    agent_b_answer = agent_b.generate(question)

    for _ in range(num_rounds):  # loop index unused; avoids shadowing built-in round()
        # Agent A critiques Agent B
        a_critique = agent_a.generate(
            f"Question: {question}\n"
            f"Your answer: {agent_a_answer}\n"
            f"Opponent's answer: {agent_b_answer}\n"
            f"Critique their answer and defend or update yours."
        )

        # Agent B critiques Agent A
        b_critique = agent_b.generate(
            f"Question: {question}\n"
            f"Your answer: {agent_b_answer}\n"
            f"Opponent's answer: {agent_a_answer}\n"
            f"Critique their answer and defend or update yours."
        )

        agent_a_answer = a_critique
        agent_b_answer = b_critique

    # Judge picks the best answer
    return judge.generate(
        f"Pick the better answer:\nA: {agent_a_answer}\nB: {agent_b_answer}"
    )
Info: Research shows debate improves factual accuracy by 10-20% on complex questions. It works because agents catch each other's mistakes and hallucinations. The adversarial pressure forces more careful reasoning.

Pipeline Pattern

Agents process work sequentially, like an assembly line. Each agent transforms the output and passes it to the next. Great for workflows with clear stages.

# Pipeline pattern: Sequential processing
pipeline = [
    ("planner", "Break this feature request into technical tasks"),
    ("coder", "Implement each task with clean, tested code"),
    ("reviewer", "Review the code for bugs, security issues, and style"),
    ("documenter", "Write documentation for the new feature"),
]

def run_pipeline(initial_input):
    current_output = initial_input
    for agent_name, instruction in pipeline:
        agent = get_agent(agent_name)
        current_output = agent.run(
            f"{instruction}\n\nInput from previous stage:\n{current_output}"
        )
    return current_output

Swarm Pattern

In a swarm, agents dynamically hand off conversations to each other based on the current need. There's no fixed hierarchy — any agent can transfer control to another. This is ideal for customer-facing workflows where the topic shifts.

# Swarm pattern: Dynamic hand-offs between peers
agents = {
    "triage": TriageAgent(),       # Classifies the request
    "billing": BillingAgent(),     # Handles billing questions
    "technical": TechnicalAgent(), # Handles tech support
    "escalation": HumanAgent(),    # Escalates to human
}

def swarm_loop(user_message):
    current_agent = agents["triage"]
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = current_agent.run(messages)

        if response.handoff_to:
            # Agent decided to hand off to a specialist
            current_agent = agents[response.handoff_to]
            messages.append({
                "role": "system",
                "content": f"Transferred from {response.agent_name}. Context: {response.summary}",
            })
        elif response.done:
            return response.final_answer
        else:
            messages.append(response.message)
Tip: Swarm works best when each agent has a clear "I can handle this" / "this isn't my area" boundary. Make hand-off criteria explicit in the system prompt.

Agent Communication

How agents share information is as important as how they reason. Key approaches:

| Method | How It Works | Best For |
|---|---|---|
| Direct messaging | One agent passes output directly to another | Pipelines, supervisor → worker |
| Shared memory | All agents read/write to a shared state store | Collaboration on the same artifact |
| Blackboard | Central "board" where agents post findings | Research, multi-perspective analysis |
| Event-driven | Agents subscribe to events and react | Async workflows, microservice-style |
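The blackboard row above fits in a few lines of code. A minimal sketch — the agent names, topics, and findings are illustrative, not part of any real framework:

```python
# Minimal blackboard sketch: agents post findings to a shared board and
# read each other's entries. Names and findings here are illustrative.
class Blackboard:
    def __init__(self):
        self.entries = []  # list of (agent_name, topic, finding)

    def post(self, agent_name, topic, finding):
        self.entries.append((agent_name, topic, finding))

    def read(self, topic=None):
        # Return all findings, optionally filtered by topic
        return [e for e in self.entries if topic is None or e[1] == topic]

board = Blackboard()
board.post("researcher", "pricing", "Competitor charges $20/mo")
board.post("analyst", "pricing", "Our cost per user is $6/mo")
pricing_findings = board.read(topic="pricing")  # both entries
```

Compared with direct messaging, the board decouples writers from readers: a new agent can join and read everything posted so far without changing any existing agent.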

Challenges & Pitfalls

  • Cost explosion — N agents × M steps × token cost adds up fast. Budget carefully.
  • Error propagation — one agent's mistake cascades to all downstream agents. Build validation between stages.
  • Debugging difficulty — tracing a bug across 5 agents with separate conversations is painful. Invest in logging.
  • Coordination overhead — agents may duplicate work, contradict each other, or deadlock. Clear protocols prevent this.
  • Latency — sequential multi-agent pipelines multiply response time. Parallelize where possible.
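The "build validation between stages" advice can be made concrete. A minimal sketch, assuming each stage is a callable paired with a validator (both are illustrative stand-ins for real agents):

```python
# Run each stage, then validate its output before passing it downstream,
# so one agent's mistake fails fast instead of cascading silently.
def validated_pipeline(stages, initial_input):
    output = initial_input
    for run_stage, validate in stages:
        output = run_stage(output)
        ok, reason = validate(output)
        if not ok:
            raise ValueError(f"Stage output failed validation: {reason}")
    return output

# Example stage: a "coder" whose output must contain runnable code
stages = [
    (lambda text: f"print('{text}')",
     lambda out: ("print" in out, "expected runnable code")),
]
result = validated_pipeline(stages, "hello")  # "print('hello')"
```

Validators can be cheap (regex, schema checks, unit tests) even when the stages themselves are expensive LLM calls.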
Warning: Multi-agent architectures are seductive but often unnecessary. Before adding agents, ask: "Could I solve this with a better prompt and more tools on one agent?" The answer is usually yes.

Test Yourself

Name and describe 4 multi-agent patterns.

(1) Supervisor — one agent delegates subtasks to specialized workers. (2) Debate — agents independently answer a question, then critique each other in rounds to converge on the best answer. (3) Pipeline — agents process work sequentially, like an assembly line. (4) Swarm — agents dynamically hand off conversations to each other based on the current topic.

When should you use multi-agent vs. single agent?

Use multi-agent when: different expertise is needed, subtasks can run in parallel, you need quality checks, context window is too small, or you want security separation. Stick with single agent when: one skillset suffices, steps are sequential, output quality is acceptable, all context fits in one conversation.

What's the biggest risk of multi-agent systems?

Error propagation and cost explosion. One agent's mistake cascades to all downstream agents. And N agents × M steps × token cost adds up fast. Additionally, debugging across multiple agent conversations is significantly harder than debugging a single agent. Always start with a single agent and add more only when genuinely needed.

How does the debate pattern improve accuracy?

Debate forces agents to adversarially critique each other's answers. Each agent must defend its reasoning and find flaws in the opponent's. This catches hallucinations, logical errors, and weak reasoning that a single agent might miss. Research shows 10-20% accuracy improvement on complex questions.

Compare shared memory vs. direct messaging for agent communication.

Direct messaging: one agent passes its output to the next. Simple, predictable, good for pipelines and supervisor/worker patterns. Shared memory: all agents read/write to a common store. More flexible — agents can access any information at any time — but requires concurrency management and can lead to conflicts. Use direct messaging by default; shared memory when agents need to collaborate on the same artifact.

Interview Questions

Design a multi-agent system for automated code review. What agents would you include?

Use the supervisor + pipeline pattern: (1) Diff Analyzer — reads the PR diff, identifies changed files and their purpose. (2) Security Scanner — checks for vulnerabilities (injection, auth issues, secrets). (3) Style Reviewer — checks coding standards, naming, documentation. (4) Logic Reviewer — analyzes correctness, edge cases, error handling. (5) Supervisor — collects all reviews, resolves conflicts, generates a unified review with severity-ranked findings. Each specialist has focused tools and prompts for their domain.
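The supervisor step above — collecting specialist reviews into one severity-ranked report — can be sketched as follows. The agent names and severity scale are illustrative, not a fixed API:

```python
# Hypothetical supervisor step for the code-review pipeline: merge the
# specialists' findings into one severity-ranked unified review.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def merge_reviews(findings):
    # Sort so the most severe findings lead the unified review
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])

findings = [
    {"agent": "style_reviewer", "severity": "low", "msg": "rename variable x"},
    {"agent": "security_scanner", "severity": "critical", "msg": "hardcoded secret"},
    {"agent": "logic_reviewer", "severity": "medium", "msg": "missing null check"},
]
ranked = merge_reviews(findings)  # security finding first
```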

How would you handle a scenario where two agents in a multi-agent system disagree?

Options by escalation level: (1) Voting — if multiple agents weigh in, majority wins. (2) Debate — let the disagreeing agents argue with evidence for 2-3 rounds. (3) Arbiter agent — a higher-authority agent reviews both positions and decides. (4) Human escalation — present both viewpoints to a human decision-maker. The right choice depends on stakes: voting for low-stakes, human escalation for high-stakes (deployments, financial decisions).
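Option (1), voting, reduces to a few lines. A minimal sketch using stand-in agent callables; real agents would be LLM calls, and a `None` result signals escalation to the next level:

```python
from collections import Counter

# Majority voting among agents: return the answer a strict majority
# agrees on, or None to signal that escalation is needed.
def majority_vote(question, agents):
    answers = [agent(question) for agent in agents]
    winner, count = Counter(answers).most_common(1)[0]
    if count <= len(agents) // 2:
        return None  # no strict majority -> escalate (debate, arbiter, human)
    return winner

# Stand-in "agents" that each return an answer string
agents = [lambda q: "Paris", lambda q: "Paris", lambda q: "Lyon"]
winner = majority_vote("Capital of France?", agents)  # "Paris"
```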

What are the cost implications of a 5-agent pipeline vs. a single agent, and how would you optimize?

A 5-agent pipeline means ~5x the LLM calls and token usage. Optimizations: (1) Use smaller models for simple stages (Haiku for formatting, Opus for reasoning). (2) Parallelize independent stages — don't run sequentially if stages are independent. (3) Cache common results — if multiple tasks hit the same research, cache it. (4) Short-circuit — skip stages when not needed (skip security review for docs-only changes). (5) Batch processing — group similar tasks to amortize overhead.
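Optimization (1), using smaller models for simple stages, often comes down to a routing table. A minimal sketch — the model identifiers and the `call_llm` parameter are illustrative placeholders, not a real provider API:

```python
# Route each pipeline stage to a model tier by how much reasoning it needs.
# Model identifiers below are placeholders, not real model names.
STAGE_MODELS = {
    "formatter": "small-fast-model",     # cheap model for mechanical work
    "documenter": "small-fast-model",
    "planner": "large-reasoning-model",  # expensive model only where needed
    "reviewer": "large-reasoning-model",
}
DEFAULT_MODEL = "small-fast-model"

def route(stage, prompt, call_llm):
    model = STAGE_MODELS.get(stage, DEFAULT_MODEL)
    return call_llm(model=model, prompt=prompt)

# Stand-in LLM call that just records which model would be used
fake_llm = lambda model, prompt: f"[{model}] {prompt}"
reply = route("planner", "break this feature into tasks", fake_llm)
```

Defaulting unknown stages to the cheap tier keeps costs bounded when new stages are added.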