This skill should be used when the user asks to "design multi-agent system", "implement supervisor pattern", "create swarm architecture", "coordinate multiple agents", or mentions multi-agent patterns, context isolation, agent handoffs, sub-agents, or parallel agent execution.
62
53%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Advisory
Suggest reviewing before use
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/multi-agent-patterns/SKILL.mdMulti-agent architectures distribute work across multiple language model instances, each with its own context window. When designed well, this distribution enables capabilities beyond single-agent limits. When designed poorly, it introduces coordination overhead that negates benefits. The critical insight is that sub-agents exist primarily to isolate context, not to anthropomorphize role division.
Activate this skill when:
Use multi-agent patterns when a single agent's context window cannot hold all task-relevant information. Context isolation is the primary benefit — each agent operates in a clean context without accumulated noise from other subtasks, preventing the telephone game problem where information degrades through repeated summarization.
Choose among three dominant patterns based on coordination needs, not organizational metaphor:
Design every multi-agent system around explicit coordination protocols, consensus mechanisms that resist sycophancy, and failure handling that prevents error propagation cascades.
The Context Bottleneck Reach for multi-agent architectures when a single agent's context fills with accumulated history, retrieved documents, and tool outputs to the point where performance degrades. Recognize three degradation signals: the lost-in-middle effect (attention weakens for mid-context content), attention scarcity (too many competing items), and context poisoning (irrelevant content displaces useful content).
Partition work across multiple context windows so each agent operates in a clean context focused on its subtask. Aggregate results at a coordination layer without any single context bearing the full burden.
The Token Economics Reality Budget for substantially higher token costs. Production data shows multi-agent systems run at approximately 15x the token cost of a single-agent chat:
| Architecture | Token Multiplier | Use Case |
|---|---|---|
| Single agent chat | 1x baseline | Simple queries |
| Single agent with tools | ~4x baseline | Tool-using tasks |
| Multi-agent system | ~15x baseline | Complex research/coordination |
Research on the BrowseComp evaluation found that three factors explain 95% of performance variance: token usage (80% of variance), number of tool calls, and model choice. This validates distributing work across agents with separate context windows to add capacity for parallel reasoning.
Prioritize model selection alongside architecture design — upgrading to better models often provides larger performance gains than doubling token budgets. BrowseComp data shows that model quality improvements frequently outperform raw token increases. Treat model selection and multi-agent architecture as complementary strategies.
The Parallelization Argument Assign parallelizable subtasks to dedicated agents with fresh contexts rather than processing them sequentially in a single agent. A research task requiring searches across multiple independent sources, analysis of different documents, or comparison of competing approaches benefits from parallel execution. Total real-world time approaches the duration of the longest subtask rather than the sum of all subtasks.
The Specialization Argument Configure each agent with only the system prompt, tools, and context it needs for its specific subtask. A general-purpose agent must carry all possible configurations in context, diluting attention. Specialized agents carry only what they need, operating with lean context optimized for their domain. Route from a coordinator to specialized agents to achieve specialization without combinatorial explosion.
Pattern 1: Supervisor/Orchestrator Deploy a central agent that maintains global state and trajectory, decomposes user objectives into subtasks, and routes to appropriate workers.
User Query -> Supervisor -> [Specialist, Specialist, Specialist] -> Aggregation -> Final OutputChoose this pattern when: tasks have clear decomposition, coordination across domains is needed, or human oversight is important.
Expect these trade-offs: strict workflow control and easier human-in-the-loop interventions, but the supervisor context becomes a bottleneck, supervisor failures cascade to all workers, and the "telephone game" problem emerges where supervisors paraphrase sub-agent responses incorrectly.
The Telephone Game Problem and Solution Anticipate that supervisor architectures initially perform approximately 50% worse than optimized versions due to the telephone game problem (LangGraph benchmarks). Supervisors paraphrase sub-agent responses, losing fidelity with each pass.
Fix this by implementing a forward_message tool that allows sub-agents to pass responses directly to users:
def forward_message(message: str, to_user: bool = True):
"""
Forward sub-agent response directly to user without supervisor synthesis.
Use when:
- Sub-agent response is final and complete
- Supervisor synthesis would lose important details
- Response format must be preserved exactly
"""
if to_user:
return {"type": "direct_response", "content": message}
return {"type": "supervisor_input", "content": message}Prefer swarm architectures over supervisors when sub-agents can respond directly to users, as this eliminates translation errors entirely.
Pattern 2: Peer-to-Peer/Swarm Remove central control and allow agents to communicate directly based on predefined protocols. Any agent transfers control to any other through explicit handoff mechanisms.
def transfer_to_agent_b():
return agent_b # Handoff via function return
agent_a = Agent(
name="Agent A",
functions=[transfer_to_agent_b]
)Choose this pattern when: tasks require flexible exploration, rigid planning is counterproductive, or requirements emerge dynamically and defy upfront decomposition.
Expect these trade-offs: no single point of failure and effective breadth-first scaling, but coordination complexity increases with agent count, divergence risk rises without a central state keeper, and robust convergence constraints become essential.
Define explicit handoff protocols with state passing. Ensure agents communicate their context needs to receiving agents.
Pattern 3: Hierarchical Organize agents into layers of abstraction: strategy (goal definition), planning (task decomposition), and execution (atomic tasks).
Strategy Layer (Goal Definition) -> Planning Layer (Task Decomposition) -> Execution Layer (Atomic Tasks)Choose this pattern when: projects have clear hierarchical structure, workflows involve management layers, or tasks require both high-level planning and detailed execution.
Expect these trade-offs: clear separation of concerns and support for different context structures at different levels, but coordination overhead between layers, potential strategy-execution misalignment, and complex error propagation paths.
Treat context isolation as the primary purpose of multi-agent architectures. Each sub-agent should operate in a clean context window focused on its subtask without carrying accumulated context from other subtasks.
Isolation Mechanisms Select the right isolation mechanism for each subtask:
Choose based on task complexity, coordination needs, and acceptable latency. Default to instruction passing and escalate to file system memory when shared state is needed. Avoid full context delegation unless the subtask genuinely requires it.
The Voting Problem Avoid simple majority voting — it treats hallucinations from weak models as equal to reasoning from strong models. Without intervention, multi-agent discussions devolve into consensus on false premises due to inherent bias toward agreement.
Weighted Voting Weight agent votes by confidence or expertise. Agents with higher confidence or domain expertise should carry more weight in final decisions.
Debate Protocols Structure agents to critique each other's outputs over multiple rounds. Adversarial critique often yields higher accuracy on complex reasoning than collaborative consensus. Guard against sycophantic convergence where agents agree to be agreeable rather than correct.
Trigger-Based Intervention Monitor multi-agent interactions for behavioral markers. Activate stall triggers when discussions make no progress. Detect sycophancy triggers when agents mimic each other's answers without unique reasoning.
Different frameworks implement these patterns with different philosophies. LangGraph uses graph-based state machines with explicit nodes and edges. AutoGen uses conversational/event-driven patterns with GroupChat. CrewAI uses role-based process flows with hierarchical crew structures.
Failure: Supervisor Bottleneck The supervisor accumulates context from all workers, becoming susceptible to saturation and degradation.
Mitigate by constraining worker output schemas so workers return only distilled summaries. Use checkpointing to persist supervisor state without carrying full history in context.
Failure: Coordination Overhead Agent communication consumes tokens and introduces latency. Complex coordination can negate parallelization benefits.
Mitigate by minimizing communication through clear handoff protocols. Batch results where possible. Use asynchronous communication patterns. Measure whether multi-agent coordination actually saves time versus a single agent with a longer context.
Failure: Divergence Agents pursuing different goals without central coordination drift from intended objectives.
Mitigate by defining clear objective boundaries for each agent. Implement convergence checks that verify progress toward shared goals. Set time-to-live limits on agent execution to prevent unbounded exploration.
Failure: Error Propagation Errors in one agent's output propagate to downstream agents that consume that output, compounding into increasingly wrong results.
Mitigate by validating agent outputs before passing to consumers. Implement retry logic with circuit breakers. Use idempotent operations where possible. Consider adding a verification agent that cross-checks critical outputs before they enter the pipeline.
Example 1: Research Team Architecture
Supervisor
├── Researcher (web search, document retrieval)
├── Analyzer (data analysis, statistics)
├── Fact-checker (verification, validation)
└── Writer (report generation, formatting)Example 2: Handoff Protocol
def handle_customer_request(request):
if request.type == "billing":
return transfer_to(billing_agent)
elif request.type == "technical":
return transfer_to(technical_agent)
elif request.type == "sales":
return transfer_to(sales_agent)
else:
return handle_general(request)This skill builds on context-fundamentals and context-degradation. It connects to:
Internal reference:
Related skills in this collection:
External resources:
Created: 2025-12-20 Last Updated: 2026-03-17 Author: Agent Skills for Context Engineering Contributors Version: 2.0.0
3ab8c94
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.