
mcp-advanced-patterns skill

/plugins/ork/skills/mcp-advanced-patterns

This skill helps you design and deploy production-grade MCP servers by composing tools into pipelines, managing resource lifecycles, scaling horizontally, and tuning auto-enable thresholds.

npx playbooks add skill yonatangross/orchestkit --skill mcp-advanced-patterns

Review the files below or copy the command above to add this skill to your agents.

Files (7)
SKILL.md
7.5 KB
---
name: mcp-advanced-patterns
description: Advanced MCP patterns for tool composition, resource management, and scaling. Build custom MCP servers, compose tools, manage resources efficiently. Use when composing MCP tools or scaling MCP servers.
version: 1.0.0
author: OrchestKit
context: fork
agent: llm-integrator
tags: [mcp, tools, resources, scaling, servers, composition, 2026]
user-invocable: false
---

# MCP Advanced Patterns

Advanced Model Context Protocol patterns for production-grade MCP implementations.

> **FastMCP 2.14.x** (Jan 2026): Enterprise auth, OpenAPI/FastAPI generation, server composition, proxying. Python 3.10-3.13.

## Overview

- Composing multiple tools into orchestrated workflows
- Managing resource lifecycle and caching efficiently
- Scaling MCP servers horizontally with load balancing
- Building custom MCP servers with middleware and transports
- Implementing auto-enable thresholds for context management

## Tool Composition Pattern

```python
from dataclasses import dataclass
from typing import Any, Callable, Awaitable

@dataclass
class ComposedTool:
    """Combine multiple tools into a single pipeline operation."""
    name: str
    tools: dict[str, Callable[..., Awaitable[Any]]]
    pipeline: list[str]

    async def execute(self, input_data: dict[str, Any]) -> dict[str, Any]:
        """Execute tool pipeline sequentially."""
        result = input_data
        for tool_name in self.pipeline:
            tool = self.tools[tool_name]
            result = await tool(result)
        return result

# Example: Search + Summarize composition
search_summarize = ComposedTool(
    name="search_and_summarize",
    tools={
        "search": search_documents,
        "summarize": summarize_content,
    },
    pipeline=["search", "summarize"]
)
```
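
A usage sketch: `search_documents` and `summarize_content` are hypothetical async stand-ins (not part of this skill), each taking and returning a dict so results flow through the pipeline.

```python
import asyncio

# Hypothetical async tools standing in for real MCP tool calls.
async def search_documents(data: dict) -> dict:
    return {**data, "results": [f"doc matching '{data['query']}'"]}

async def summarize_content(data: dict) -> dict:
    return {**data, "summary": f"{len(data['results'])} matching document(s)"}

search_summarize = ComposedTool(
    name="search_and_summarize",
    tools={"search": search_documents, "summarize": summarize_content},
    pipeline=["search", "summarize"],
)

print(asyncio.run(search_summarize.execute({"query": "MCP scaling"})))
# {'query': 'MCP scaling', 'results': [...], 'summary': '1 matching document(s)'}
```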

## FastMCP Server with Lifecycle

```python
from contextlib import asynccontextmanager
from collections.abc import AsyncIterator
from dataclasses import dataclass
from mcp.server.fastmcp import Context, FastMCP

@dataclass
class AppContext:
    """Typed application context with shared resources."""
    db: Database
    cache: CacheService
    config: dict

@asynccontextmanager
async def app_lifespan(server: FastMCP) -> AsyncIterator[AppContext]:
    """Manage server startup and shutdown lifecycle."""
    # Initialize on startup
    db = await Database.connect()
    cache = await CacheService.connect()

    try:
        yield AppContext(db=db, cache=cache, config={"timeout": 30})
    finally:
        # Cleanup on shutdown
        await cache.disconnect()
        await db.disconnect()

mcp = FastMCP("Production Server", lifespan=app_lifespan)

@mcp.tool()
async def query_data(sql: str, ctx: Context) -> str:
    """Execute a query using the shared connection from the lifespan context."""
    app_ctx = ctx.request_context.lifespan_context
    return await app_ctx.db.query(sql)
```
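
To launch the server, a minimal entry point sketch, assuming the SDK's `FastMCP.run()` accepts a transport argument (stdio for CLI clients, streamable HTTP for web, matching the Key Decisions table below):

```python
if __name__ == "__main__":
    # stdio suits CLI/desktop clients; pass transport="streamable-http"
    # to expose the server over HTTP for web deployments.
    mcp.run(transport="stdio")
```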

## Auto-Enable Thresholds (CC 2.1.9)

Configure MCP servers to auto-enable or disable based on context window usage. For example, in `.claude/settings.json`:

```json
{
  "mcp": {
    "context7": { "enabled": "auto:75" },
    "sequential-thinking": { "enabled": "auto:60" },
    "memory": { "enabled": "auto:90" },
    "playwright": { "enabled": "auto:50" }
  }
}
```

Here `memory` (knowledge graph) is preserved until compaction, `context7` (high-value docs) stays available for complex tasks, `sequential-thinking` keeps headroom for reasoning output, and `playwright` (browser-heavy) is released early.

**Threshold Guidelines:**
| Threshold | Use Case | Rationale |
|-----------|----------|-----------|
| auto:90 | Critical persistence | Keep until context nearly full |
| auto:75 | High-value reference | Preserve for complex tasks |
| auto:60 | Reasoning tools | Need headroom for output |
| auto:50 | Resource-intensive | Disable early to free context |

## Resource Management

```python
from datetime import datetime, timedelta
from typing import Any

class MCPResourceManager:
    """Manage MCP resources with caching and lifecycle."""

    def __init__(self, cache_ttl: timedelta = timedelta(minutes=15)):
        self.resources: dict[str, Any] = {}
        self.cache_ttl = cache_ttl
        self.last_access: dict[str, datetime] = {}

    def get_resource(self, uri: str) -> Any:
        """Get resource with access time tracking."""
        if uri in self.resources:
            self.last_access[uri] = datetime.now()
            return self.resources[uri]

        resource = self._load_resource(uri)
        self.resources[uri] = resource
        self.last_access[uri] = datetime.now()
        return resource

    def cleanup_stale(self) -> int:
        """Remove stale resources. Returns count of removed."""
        now = datetime.now()
        stale = [
            uri for uri, last in self.last_access.items()
            if now - last > self.cache_ttl
        ]
        for uri in stale:
            del self.resources[uri]
            del self.last_access[uri]
        return len(stale)

    def _load_resource(self, uri: str) -> Any:
        """Fetch the resource from its backing store (implementation-specific)."""
        raise NotImplementedError("Override with the actual resource loader")
```
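
A wiring sketch for long-lived processes, reusing `MCPResourceManager` from above; the 60-second interval is an assumption, tune it against the TTL:

```python
import asyncio

manager = MCPResourceManager(cache_ttl=timedelta(minutes=15))

async def cleanup_loop(interval_seconds: float = 60.0) -> None:
    """Periodically evict resources that have not been accessed within the TTL."""
    while True:
        removed = manager.cleanup_stale()
        if removed:
            print(f"evicted {removed} stale resource(s)")
        await asyncio.sleep(interval_seconds)

# Inside a server lifespan, run it alongside request handling:
# cleanup_task = asyncio.create_task(cleanup_loop())
```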

## Horizontal Scaling

```python
import asyncio

class MCPLoadBalancer:
    """Load balance across multiple MCP server instances."""

    def __init__(self, servers: list[str]):
        self.servers = servers
        self.current = 0
        self.health: dict[str, bool] = {s: True for s in servers}

    async def get_healthy_server(self) -> str:
        """Round-robin with health check."""
        for _ in range(len(self.servers)):
            server = self.servers[self.current]
            self.current = (self.current + 1) % len(self.servers)
            if self.health[server]:
                return server
        raise RuntimeError("No healthy servers available")

    async def health_check_loop(self):
        """Periodic health check for all servers."""
        while True:
            for server in self.servers:
                try:
                    self.health[server] = await self._ping(server)
                except Exception:
                    self.health[server] = False
            await asyncio.sleep(30)

    async def _ping(self, server: str) -> bool:
        """Placeholder probe; replace with a real HTTP or stdio health check."""
        return True
```
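
A routing sketch (the instance endpoints are hypothetical): start the health loop as a background task, then pick a healthy instance per request.

```python
import asyncio

async def main() -> None:
    balancer = MCPLoadBalancer([
        "http://mcp-1.internal:8080",  # hypothetical instance endpoints
        "http://mcp-2.internal:8080",
    ])
    health_task = asyncio.create_task(balancer.health_check_loop())

    try:
        server = await balancer.get_healthy_server()
        print(f"routing request to {server}")
        # ...forward the MCP request to `server` here...
    finally:
        health_task.cancel()

asyncio.run(main())
```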

## Key Decisions

| Decision | Recommendation |
|----------|----------------|
| Transport | Streamable HTTP for web, stdio for CLI |
| Lifecycle | Always use lifespan for resource management |
| Composition | Chain tools via pipeline pattern |
| Scaling | Health-checked round-robin for redundancy |
| Auto-enable | Use auto:N thresholds per server criticality |

## Common Mistakes

- No lifecycle management (resource leaks)
- Missing health checks in load balancing
- Hardcoded server endpoints
- No graceful degradation on server failure (see the sketch after this list)
- Ignoring context window thresholds
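
For the graceful-degradation point, a fallback sketch building on `MCPLoadBalancer` above; `forward_request` is a hypothetical transport call, not part of this skill:

```python
async def route_with_fallback(balancer: MCPLoadBalancer, request: dict) -> dict:
    """Return a clearly labeled degraded response instead of failing hard."""
    try:
        server = await balancer.get_healthy_server()
    except RuntimeError:
        # No healthy instance available: degrade instead of propagating the error.
        return {"status": "degraded", "detail": "no healthy MCP servers", "data": None}
    return await forward_request(server, request)  # hypothetical transport call
```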

## Related Skills

- `function-calling` - LLM tool integration patterns
- `resilience-patterns` - Circuit breakers and retries
- `connection-pooling` - Database connection management
- `streaming-api-patterns` - Real-time streaming

## Capability Details

### tool-composition
**Keywords:** tool composition, pipeline, orchestration, chain tools
**Solves:**
- Combine multiple tools into workflows
- Sequential tool execution
- Tool result passing

### resource-management
**Keywords:** resource, cache, lifecycle, cleanup, ttl
**Solves:**
- Manage resource lifecycle
- Implement resource caching
- Clean up stale resources

### scaling-strategies
**Keywords:** scale, load balance, horizontal, health check, redundancy
**Solves:**
- Scale MCP servers horizontally
- Implement health-checked load balancing
- Handle server failures gracefully

### server-building
**Keywords:** server, fastmcp, lifespan, middleware, transport
**Solves:**
- Build production MCP servers
- Manage server lifecycle
- Configure transports and middleware

### auto-enable-thresholds
**Keywords:** auto-enable, context window, threshold, auto:N
**Solves:**
- Configure MCP auto-enable/disable
- Manage context window usage
- Optimize MCP server availability

Overview

This skill provides advanced Model Context Protocol (MCP) patterns for composing tools, managing resources, and scaling production MCP servers. It bundles production-ready patterns for building custom MCP servers, orchestrating tool pipelines, managing resource lifecycles, and scaling horizontally with health-checked load balancing. The patterns target the Python (FastMCP) and Claude Code ecosystems and integrate with common transports such as stdio and streamable HTTP.

How this skill works

The skill describes modular patterns: a pipeline composition pattern for chaining asynchronous tools, a lifespan-based server context that initializes and cleans up shared resources, a resource manager that caches with a TTL and evicts stale entries, and a load balancer that round-robins requests across healthy MCP instances with periodic health checks. It also defines auto-enable threshold guidelines that enable or disable individual MCP servers as the context window fills. Code sketches demonstrate the runtime shapes and lifecycle hooks to adopt.

When to use it

  • Composing multiple tools into an orchestrated workflow where outputs feed subsequent steps.
  • Building a production MCP server that needs deterministic startup/shutdown resource management.
  • Managing cached resources and automatic cleanup for long-lived MCP processes.
  • Scaling MCP capacity horizontally with health-aware routing and redundancy.
  • Tuning context availability via auto-enable thresholds to protect critical state.

Best practices

  • Always use a lifespan or equivalent lifecycle hook to initialize and tear down shared resources to avoid leaks.
  • Compose tools as explicit pipelines so inputs/outputs are well-typed and testable.
  • Track last access times and enforce TTLs for cached resources; run periodic cleanup tasks.
  • Employ health checks and graceful degradation in the load balancer instead of hard-failing on unreachable instances.
  • Avoid hardcoded endpoints; discover or configure instances and support reconcilers for changing topology.
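
For the endpoint point above, a configuration sketch that builds on the MCPLoadBalancer shown earlier; the MCP_SERVERS variable is illustrative, not a standard setting:

```python
import os

def load_mcp_endpoints() -> list[str]:
    """Read MCP instance endpoints from configuration instead of hardcoding them."""
    # e.g. MCP_SERVERS="http://mcp-1.internal:8080,http://mcp-2.internal:8080"
    raw = os.environ.get("MCP_SERVERS", "")
    return [endpoint.strip() for endpoint in raw.split(",") if endpoint.strip()]

balancer = MCPLoadBalancer(load_mcp_endpoints())
```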

Example use cases

  • Create a search-then-summarize composed tool that runs a document search and then summarizes results in one pipeline.
  • Launch a FastMCP-style server that connects DB and cache on startup and reliably disconnects on shutdown.
  • Implement an MCPResourceManager to cache expensive external assets and evict stale entries after inactivity.
  • Deploy multiple MCP instances behind an MCPLoadBalancer that performs periodic health probes and routes traffic round-robin.
  • Configure auto-enable thresholds to keep high-value knowledge available while freeing heavy browser or transient contexts early.

FAQ

How do auto-enable thresholds affect context preservation?

Auto-enable thresholds keep an MCP server enabled until the specified percentage of the context window is in use; higher values keep the server available longer, while lower values free space earlier.

What health check frequency is recommended for load balancers?

A short loop (for example, 15–60 seconds) balances responsiveness and cost; choose based on failure impact and probe overhead.