home / skills / hexbee / hello-skills / multi-agent-systems

multi-agent-systems skill

safe

This skill helps you design and orchestrate multi-agent LLM architectures to improve task decomposition, parallelization, and verification.

npx playbooks add skill hexbee/hello-skills --skill multi-agent-systems

Review the files below or copy the command above to add this skill to your agents.

Files (5)

SKILL.md

8.3 KB

---
name: multi-agent-systems
description: "Design and implement multi-agent LLM architectures using the orchestrator-subagent pattern. Use when: (1) Deciding whether to use multi-agent vs single-agent systems, (2) Implementing context isolation for high-volume operations, (3) Parallelizing independent research tasks, (4) Creating specialized agents with focused tool sets, (5) Building verification subagents for quality assurance, or (6) Analyzing context-centric decomposition boundaries."
---

# Multi-Agent Systems

## When to Use Multi-Agent Architectures

Multi-agent systems introduce overhead. Every additional agent represents another potential point of failure, another set of prompts to maintain, and another source of unexpected behavior.

**Multi-agent systems use 3-10x more tokens than single-agent approaches** due to:
- Duplicating context across agents
- Coordination messages between agents
- Summarizing results for handoffs

### Start with a Single Agent

A well-designed single agent with appropriate tools can accomplish far more than expected. Use single agent when:
- Tasks are sequential and context-dependent
- Tool count is under 15-20
- No clear benefit from parallelization

### Three Cases Where Multi-Agent Excels

1. **Context pollution** - Subtasks generate >1000 tokens but most info is irrelevant to main task
2. **Parallelization** - Tasks can run independently and explore larger search space
3. **Specialization** - Different tasks need different tools, prompts, or domain expertise

## Decision Framework

### Context Protection Pattern

Use when subtasks generate large context but only summary is needed for main task.

**Example: Customer Support**
```python
class OrderLookupAgent:
    def lookup_order(self, order_id: str) -> dict:
        messages = [{"role": "user", "content": f"Get essential details for order {order_id}"}]
        response = client.messages.create(
            model="claude-sonnet-4-5", max_tokens=1024,
            messages=messages, tools=[get_order_details_tool]
        )
        return extract_summary(response)  # Returns 50-100 tokens, not 2000+

class SupportAgent:
    def handle_issue(self, user_message: str):
        if needs_order_info(user_message):
            order_id = extract_order_id(user_message)
            order_summary = OrderLookupAgent().lookup_order(order_id)
            context = f"Order {order_id}: {order_summary['status']}, purchased {order_summary['date']}"
        # Main agent gets clean context
```

**Best when:**
- Subtask generates >1000 tokens, most irrelevant
- Subtast is well-defined with clear extraction criteria
- Lookup/retrieval operations need filtering before use

### Parallelization Pattern

Use when exploring larger search space or independent research facets.

```python
import asyncio
from anthropic import AsyncAnthropic

client = AsyncAnthropic()

async def research_topic(query: str) -> dict:
    facets = await lead_agent.decompose_query(query)
    tasks = [research_subagent(facet) for facet in facets]
    results = await asyncio.gather(*tasks)
    return await lead_agent.synthesize(results)

async def research_subagent(facet: str) -> dict:
    messages = [{"role": "user", "content": f"Research: {facet}"}]
    response = await client.messages.create(
        model="claude-sonnet-4-5", max_tokens=4096,
        messages=messages, tools=[web_search, read_document]
    )
    return extract_findings(response)
```

**Benefit:** Thoroughness, not speed. Covers more ground at higher token cost.

### Specialization Patterns

#### Tool Set Specialization

Split by domain when agent has 20+ tools, shows domain confusion, or degraded performance.

**Signs you need specialization:**
1. Quantity: 20+ tools
2. Domain confusion: Tools span unrelated domains
3. Degraded performance: New tools hurt existing tasks

#### System Prompt Specialization

Different tasks require conflicting behavioral modes:
- Customer support: empathetic, patient
- Code review: precise, critical
- Compliance: rigid rule-following
- Brainstorming: creative flexibility

#### Domain Expertise Specialization

Deep domain context that would overwhelm a generalist:
- Legal analysis: case law, regulatory frameworks
- Medical research: clinical trial methodology

### Multi-Platform Integration Example

```python
class CRMAgent:
    system_prompt = """You are a CRM specialist. You manage contacts,
    opportunities, and account records. Always verify record ownership
    before updates and maintain data integrity across related records."""
    tools = [crm_get_contacts, crm_create_opportunity]  # 8-10 CRM tools

class MarketingAgent:
    system_prompt = """You are a marketing automation specialist. You
    manage campaigns, lead scoring, and email sequences."""
    tools = [marketing_get_campaigns, marketing_create_lead]  # 8-10 tools

class OrchestratorAgent:
    def execute(self, user_request: str):
        response = client.messages.create(
            model="claude-sonnet-4-5", max_tokens=1024,
            system="""Route to appropriate specialist:
    - CRM: Contacts, opportunities, accounts, sales pipeline
    - Marketing: Campaigns, lead nurturing, email sequences""",
            messages=[{"role": "user", "content": user_request}],
            tools=[delegate_to_crm, delegate_to_marketing]
        )
        return response
```

## Context-Centric Decomposition

**Problem-centric (counterproductive):** Split by work type (writer, tester, reviewer) - creates coordination overhead, context loss at handoffs.

**Context-centric (effective):** Agent handling a feature also handles its tests - already has necessary context.

### Effective Boundaries
- Independent research paths
- Separate components with clean API contracts
- Blackbox verification

### Problematic Boundaries
- Sequential phases of same work
- Tightly coupled components
- Work requiring shared state

## Verification Subagent Pattern

Dedicated agent for testing/validating main agent's work. Succeeds because verification requires minimal context transfer.

```python
class CodingAgent:
    def implement_feature(self, requirements: str) -> dict:
        response = client.messages.create(
            model="claude-sonnet-4-5", max_tokens=4096,
            messages=[{"role": "user", "content": f"Implement: {requirements}"}],
            tools=[read_file, write_file, list_directory]
        )
        return {"code": response.content, "files_changed": extract_files(response)}

class VerificationAgent:
    def verify_implementation(self, requirements: str, files_changed: list) -> dict:
        messages = [{"role": "user", "content": f"""
Requirements: {requirements}
Files changed: {files_changed}

Run the complete test suite and verify:
1. All existing tests pass
2. New functionality works as specified
3. No obvious errors or security issues

You MUST run: pytest --verbose
Only mark as PASSED if ALL tests pass with no failures.
"""}]
        response = client.messages.create(
            model="claude-sonnet-4-5", max_tokens=4096,
            messages=messages, tools=[run_tests, execute_code, read_file]
        )
        return {"passed": extract_pass_fail(response), "issues": extract_issues(response)}
```

### Mitigate "Early Victory Problem"

Verifier marks passing without thorough testing. Prevention:

- **Concrete criteria:** "Run full test suite" not "make sure it works"
- **Comprehensive checks:** Test multiple scenarios and edge cases
- **Negative tests:** Confirm inputs that should fail do fail
- **Explicit instructions:** "You MUST run the complete test suite"

## Moving Forward Checklist

Before adding multi-agent complexity:

1. [ ] Genuine constraints exist (context limits, parallelization, specialization need)
2. [ ] Decomposition follows context, not problem type
3. [ ] Clear verification points where subagents can validate

**Start with simplest approach that works. Add complexity only when evidence supports it.**

## References

- [Building multi-agent systems: when and how to use them](https://claude.com/blog/building-multi-agent-systems-when-and-how-to-use-them)
- [Building effective agents](https://www.anthropic.com/engineering/building-effective-agents)
- [Effective context engineering for AI agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)
- [Writing effective tools for agents](https://www.anthropic.com/engineering/writing-tools-for-agents)

Overview

This skill helps design and implement multi-agent LLM architectures using an orchestrator-subagent pattern. It guides when to choose multi-agent vs single-agent approaches, how to isolate context, parallelize research, specialize tool sets, and build verification subagents. The focus is practical patterns, trade-offs, and concrete boundaries for effective decomposition.

How this skill works

The skill provides patterns and code sketches for orchestrator-led workflows where a lead agent decomposes tasks and delegates to specialized subagents. Subagents run isolated contexts, return concise summaries, and the orchestrator synthesizes results. It covers context-protection, parallel research, tool specialization, and verification subagents with explicit instruction templates to reduce context pollution and coordination overhead.

When to use it

When subtasks generate large irrelevant context and only summaries are needed (context protection).
When independent facets can be researched in parallel to explore a larger search space.
When an agent has many tools or conflicting behavioral modes and needs specialization.
When building a verification subagent to run tests or validate outputs.
When clear, context-centric decomposition boundaries exist (clean API contracts, independent components).

Best practices

Start with a single agent; add agents only when evidence shows benefits outweigh token/coordination costs.
Decompose by context, not by generic roles or sequential phases to avoid handoff friction.
Keep subagent outputs concise (50–200 tokens) and return structured summaries for handoffs.
Define concrete verification criteria and require full test executions for validation agents.
Limit tool overlap between specialists; give each agent a focused toolset and system prompt.

Example use cases

Customer support: order lookup subagent extracts a short order summary to avoid polluting main context.
Research project: orchestrator decomposes a topic and runs multiple research subagents concurrently, then synthesizes findings.
Large toolset split: separate CRM and Marketing specialists each with 8–10 domain tools, routed by an orchestrator.
Code delivery: coding agent implements feature, verification subagent runs full test suite and reports pass/fail with issues.
Compliance workflows: specialized compliance agent enforces rigid rules while others remain flexible for brainstorming.

FAQ

How many agents should I use?

Use as few as possible. Typical deployments use 3–5 agents: an orchestrator, 1–3 specialists, and a verifier. Add more only with clear benefits.

Do multi-agent systems cost more?

Yes. Expect 3–10x token usage due to duplicated context and coordination messages; weigh gains in modularity, parallelism, or safety.