This skill helps you design and deploy production-grade MCP servers: composing tools, managing resources, scaling horizontally, and configuring auto-enable thresholds.
Install: `npx playbooks add skill yonatangross/orchestkit --skill mcp-advanced-patterns`
---
name: mcp-advanced-patterns
description: Advanced MCP patterns for tool composition, resource management, and scaling. Build custom MCP servers, compose tools, manage resources efficiently. Use when composing MCP tools or scaling MCP servers.
version: 1.0.0
author: OrchestKit
context: fork
agent: llm-integrator
tags: [mcp, tools, resources, scaling, servers, composition, 2026]
user-invocable: false
---
# MCP Advanced Patterns
Advanced Model Context Protocol patterns for production-grade MCP implementations.
> **FastMCP 2.14.x** (Jan 2026): Enterprise auth, OpenAPI/FastAPI generation, server composition, proxying. Python 3.10-3.13.
## Overview
- Composing multiple tools into orchestrated workflows
- Managing resource lifecycle and caching efficiently
- Scaling MCP servers horizontally with load balancing
- Building custom MCP servers with middleware and transports
- Implementing auto-enable thresholds for context management
## Tool Composition Pattern
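```python
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from typing import Any

# A tool takes the previous step's output dict and returns a new dict.
ToolFn = Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]


@dataclass
class ComposedTool:
    """Combine multiple tools into a single pipeline operation."""

    name: str
    tools: dict[str, ToolFn]
    pipeline: list[str]

    async def execute(self, input_data: dict[str, Any]) -> dict[str, Any]:
        """Execute the tool pipeline sequentially, feeding each output into the next tool."""
        result = input_data
        for tool_name in self.pipeline:
            tool = self.tools[tool_name]  # KeyError if the pipeline names an unregistered tool
            result = await tool(result)
        return result


# Placeholder tools for illustration; replace with real MCP tool calls.
async def search_documents(data: dict[str, Any]) -> dict[str, Any]:
    return {**data, "documents": [f"doc about {data['query']}"]}


async def summarize_content(data: dict[str, Any]) -> dict[str, Any]:
    return {**data, "summary": f"{len(data['documents'])} document(s) found"}


# Example: Search + Summarize composition
search_summarize = ComposedTool(
    name="search_and_summarize",
    tools={
        "search": search_documents,
        "summarize": summarize_content,
    },
    pipeline=["search", "summarize"],
)
```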
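With `ComposedTool`, a pipeline that names an unregistered tool fails with `KeyError` only when that step is reached. The following is a minimal variant under the same dict-in/dict-out tool contract. It validates step names when the pipeline is constructed and records which steps ran. `CheckedPipeline`, `fake_search`, and `fake_summarize` are hypothetical names used only for illustration.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from typing import Any

ToolFn = Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]


@dataclass
class CheckedPipeline:
    """Pipeline composition that validates step names up front and records a trace."""

    name: str
    tools: dict[str, ToolFn]
    pipeline: list[str]

    def __post_init__(self) -> None:
        # Fail at construction time instead of halfway through a run.
        missing = [step for step in self.pipeline if step not in self.tools]
        if missing:
            raise ValueError(f"{self.name}: unknown tools {missing}")

    async def execute(self, input_data: dict[str, Any]) -> dict[str, Any]:
        result = dict(input_data)
        trace: list[str] = []
        for step in self.pipeline:
            result = await self.tools[step](result)  # output of one step feeds the next
            trace.append(step)
        return {**result, "_trace": trace}


# Hypothetical tools used only for this demo.
async def fake_search(data: dict[str, Any]) -> dict[str, Any]:
    return {**data, "documents": [f"{data['query']} guide", f"{data['query']} faq"]}


async def fake_summarize(data: dict[str, Any]) -> dict[str, Any]:
    return {"summary": f"{len(data['documents'])} documents about {data['query']}"}


pipeline = CheckedPipeline(
    name="search_and_summarize",
    tools={"search": fake_search, "summarize": fake_summarize},
    pipeline=["search", "summarize"],
)

if __name__ == "__main__":
    print(asyncio.run(pipeline.execute({"query": "mcp"})))
```

The `_trace` key is one way to expose which steps ran. A production version might log it instead of returning it to the caller.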
## FastMCP Server with Lifecycle
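```python
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager
from dataclasses import dataclass
from typing import Any

from mcp.server.fastmcp import Context, FastMCP


class _AsyncClient:
    """Placeholder async client; replace with your real database/cache drivers."""

    @classmethod
    async def connect(cls):
        return cls()

    async def disconnect(self) -> None: ...


class Database(_AsyncClient):
    async def query(self, sql: str) -> str:
        return f"result of {sql!r}"  # placeholder: run the query with your driver


class CacheService(_AsyncClient):
    pass


@dataclass
class AppContext:
    """Typed application context with shared resources."""
    db: Database
    cache: CacheService
    config: dict[str, Any]


@asynccontextmanager
async def app_lifespan(server: FastMCP) -> AsyncIterator[AppContext]:
    """Manage server startup and shutdown lifecycle."""
    # Initialize on startup
    db = await Database.connect()
    cache = await CacheService.connect()
    try:
        yield AppContext(db=db, cache=cache, config={"timeout": 30})
    finally:
        # Cleanup on shutdown, in reverse order of acquisition
        await cache.disconnect()
        await db.disconnect()


mcp = FastMCP("Production Server", lifespan=app_lifespan)


@mcp.tool()
async def query_data(sql: str, ctx: Context) -> str:
    """Execute query using shared connection."""
    app_ctx: AppContext = ctx.request_context.lifespan_context
    return await app_ctx.db.query(sql)  # validate/restrict model-supplied SQL in production
```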
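The server example depends on the `mcp` package, so it cannot be exercised on its own. The following stdlib-only sketch uses the same `try`/`finally` lifespan shape, with a hypothetical `FakeResource` standing in for `Database` and `CacheService`. It shows that shutdown cleanup runs in reverse order of acquisition even when the code running inside the lifespan raises.

```python
import asyncio
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager
from dataclasses import dataclass

events: list[str] = []


class FakeResource:
    """Hypothetical stand-in for a database or cache client."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def disconnect(self) -> None:
        events.append(f"close {self.name}")


@dataclass
class AppContext:
    db: FakeResource
    cache: FakeResource


@asynccontextmanager
async def app_lifespan() -> AsyncIterator[AppContext]:
    db = FakeResource("db")
    events.append("open db")
    cache = FakeResource("cache")
    events.append("open cache")
    try:
        yield AppContext(db=db, cache=cache)
    finally:
        # Runs on normal shutdown and also when the body raises.
        await cache.disconnect()
        await db.disconnect()


async def simulate_crash() -> None:
    async with app_lifespan() as ctx:
        events.append(f"serve with {ctx.db.name}")
        raise RuntimeError("request handler crashed")


if __name__ == "__main__":
    try:
        asyncio.run(simulate_crash())
    except RuntimeError:
        pass
    print(events)
```

The exception still propagates after cleanup, because the generator does not suppress it. A crashing server therefore releases its connections and still reports the failure.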
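## Auto-Enable Thresholds (Claude Code 2.1.9)
Configure MCP servers to auto-enable/disable based on context window usage. With `auto:N`, a server stays enabled until context usage reaches N% of the window, so a higher N keeps it available longer: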
```yaml
# Per-server thresholds for .claude/settings.json (shown as YAML for readability; the file itself is JSON)
mcp:
context7:
enabled: auto:75 # High-value docs, keep available longer
sequential-thinking:
enabled: auto:60 # Complex reasoning needs room
memory:
enabled: auto:90 # Knowledge graph - preserve until compaction
playwright:
enabled: auto:50 # Browser-heavy, disable early
```
**Threshold Guidelines:**
| Threshold | Use Case | Rationale |
|-----------|----------|-----------|
| auto:90 | Critical persistence | Keep until context nearly full |
| auto:75 | High-value reference | Preserve for complex tasks |
| auto:60 | Reasoning tools | Need headroom for output |
| auto:50 | Resource-intensive | Disable early to free context |
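The table implies a simple rule. A server with `auto:N` stays enabled while context usage is below N% and is switched off once usage reaches N%. The following sketch is a hypothetical model of that rule for reasoning about a configuration. It is not Claude Code's implementation. `parse_threshold` and `enabled_servers` are illustrative names, and treating exactly N% as "disabled" is an assumption.

```python
import re

AUTO_PATTERN = re.compile(r"auto:(\d{1,3})")


def parse_threshold(value: str) -> int:
    """Parse an 'auto:N' string into an integer percentage (0-100)."""
    match = AUTO_PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"expected 'auto:N', got {value!r}")
    percent = int(match.group(1))
    if not 0 <= percent <= 100:
        raise ValueError(f"threshold out of range: {percent}")
    return percent


def enabled_servers(config: dict[str, str], context_used_pct: float) -> list[str]:
    """Return the servers that stay enabled at the given context usage.

    Assumed rule: a server with auto:N is enabled while usage < N.
    """
    return sorted(
        name
        for name, value in config.items()
        if context_used_pct < parse_threshold(value)
    )


# Same thresholds as the settings example above.
CONFIG = {
    "context7": "auto:75",
    "sequential-thinking": "auto:60",
    "memory": "auto:90",
    "playwright": "auto:50",
}

if __name__ == "__main__":
    for usage in (40, 55, 70, 85, 95):
        print(usage, enabled_servers(CONFIG, usage))
```

Walking usage upward shows the intended ordering. `playwright` is disabled first, then `sequential-thinking`, then `context7`, and `memory` is disabled last.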
## Resource Management
```python
from datetime import datetime, timedelta
from typing import Any


class MCPResourceManager:
    """Manage MCP resources with caching and lifecycle."""

    def __init__(self, cache_ttl: timedelta = timedelta(minutes=15)):
        self.resources: dict[str, Any] = {}
        self.cache_ttl = cache_ttl
        self.last_access: dict[str, datetime] = {}

    def _load_resource(self, uri: str) -> Any:
        """Fetch a resource from its source; override in a subclass."""
        raise NotImplementedError

    def get_resource(self, uri: str) -> Any:
        """Get a resource with access time tracking, reloading it if the cached copy expired."""
        now = datetime.now()
        last = self.last_access.get(uri)
        if uri in self.resources and last is not None and now - last <= self.cache_ttl:
            self.last_access[uri] = now
            return self.resources[uri]
        resource = self._load_resource(uri)
        self.resources[uri] = resource
        self.last_access[uri] = now
        return resource

    def cleanup_stale(self) -> int:
        """Remove stale resources. Returns the number removed."""
        now = datetime.now()
        stale = [
            uri for uri, last in self.last_access.items()
            if now - last > self.cache_ttl
        ]
        for uri in stale:
            del self.resources[uri]
            del self.last_access[uri]
        return len(stale)
```
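The following is a variant of the same sliding-TTL idea. In this sketch, the loader is passed in as a callable and the clock is injectable, so expiry can be tested deterministically without sleeping. `TTLResourceCache` and the `docs://` URIs are hypothetical, and using `time.monotonic` instead of `datetime.now()` is a design choice that makes the cache immune to wall-clock jumps.

```python
import time
from collections.abc import Callable
from typing import Any


class TTLResourceCache:
    """Sliding-TTL cache for MCP resources with an injectable loader and clock."""

    def __init__(
        self,
        loader: Callable[[str], Any],
        ttl_seconds: float = 900.0,  # 15 minutes, matching MCPResourceManager's default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        # uri -> (cached value, time of last access)
        self._entries: dict[str, tuple[Any, float]] = {}

    def get(self, uri: str) -> Any:
        """Return the cached value, reloading on a miss or after the TTL has elapsed."""
        now = self._clock()
        entry = self._entries.get(uri)
        if entry is not None and now - entry[1] <= self._ttl:
            value = entry[0]
        else:
            value = self._loader(uri)
        self._entries[uri] = (value, now)  # every access refreshes the TTL
        return value

    def cleanup_stale(self) -> int:
        """Drop entries not accessed within the TTL; return how many were removed."""
        now = self._clock()
        stale = [uri for uri, (_, last) in self._entries.items() if now - last > self._ttl]
        for uri in stale:
            del self._entries[uri]
        return len(stale)


if __name__ == "__main__":
    fake_now = [0.0]
    loads: list[str] = []

    def load(uri: str) -> str:
        loads.append(uri)
        return f"contents of {uri}"

    cache = TTLResourceCache(load, ttl_seconds=10, clock=lambda: fake_now[0])
    cache.get("docs://a")  # miss: loads
    fake_now[0] = 5.0
    cache.get("docs://a")  # hit: within TTL
    fake_now[0] = 16.0
    cache.get("docs://a")  # 11 s since last access: reloads
    print(loads)
```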
## Horizontal Scaling
```python
import asyncio


class MCPLoadBalancer:
    """Load balance across multiple MCP server instances."""

    def __init__(self, servers: list[str], check_interval: float = 30.0):
        self.servers = servers
        self.current = 0
        self.check_interval = check_interval
        self.health: dict[str, bool] = {s: True for s in servers}

    async def _ping(self, server: str) -> bool:
        """Probe one server; implement with your transport (e.g. an HTTP health endpoint)."""
        raise NotImplementedError

    async def get_healthy_server(self) -> str:
        """Round-robin with health check."""
        for _ in range(len(self.servers)):
            server = self.servers[self.current]
            self.current = (self.current + 1) % len(self.servers)
            if self.health[server]:
                return server
        raise RuntimeError("No healthy servers available")

    async def _check(self, server: str) -> None:
        try:
            self.health[server] = await self._ping(server)
        except Exception:
            # Unimplemented or failing probes mark the server unhealthy.
            self.health[server] = False

    async def health_check_loop(self) -> None:
        """Periodic health check for all servers, probed concurrently."""
        while True:
            await asyncio.gather(*(self._check(s) for s in self.servers))
            await asyncio.sleep(self.check_interval)
```
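The following self-contained sketch shows the skip-unhealthy behavior without real servers. The probe is injected as a callable so it can be simulated, and each probe gets a timeout so that one hung server cannot stall a health-check round. `RoundRobinBalancer`, `fake_ping`, the 2-second timeout, and the `http://mcp-N:8000` endpoints are all hypothetical.

```python
import asyncio
from collections.abc import Awaitable, Callable


class RoundRobinBalancer:
    """Health-aware round-robin with an injected async probe function."""

    def __init__(self, servers: list[str], ping: Callable[[str], Awaitable[bool]]) -> None:
        self.servers = servers
        self.ping = ping
        self.current = 0
        self.health = {s: True for s in servers}

    async def refresh(self, timeout: float = 2.0) -> None:
        """Run one round of health checks concurrently; errors and timeouts count as down."""
        async def check(server: str) -> None:
            try:
                self.health[server] = await asyncio.wait_for(self.ping(server), timeout)
            except Exception:
                self.health[server] = False

        await asyncio.gather(*(check(s) for s in self.servers))

    def next_server(self) -> str:
        """Return the next healthy server in rotation."""
        for _ in range(len(self.servers)):
            server = self.servers[self.current]
            self.current = (self.current + 1) % len(self.servers)
            if self.health[server]:
                return server
        raise RuntimeError("No healthy servers available")


async def demo() -> list[str]:
    down = {"http://mcp-2:8000"}  # simulate one failed instance

    async def fake_ping(server: str) -> bool:
        return server not in down

    lb = RoundRobinBalancer(
        ["http://mcp-1:8000", "http://mcp-2:8000", "http://mcp-3:8000"], fake_ping
    )
    await lb.refresh()
    return [lb.next_server() for _ in range(4)]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because `next_server` contains no `await`, it cannot be interleaved with other coroutines. Updating `current` is therefore safe without a lock in a single event loop.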
## Key Decisions
| Decision | Recommendation |
|----------|----------------|
| Transport | Streamable HTTP for web, stdio for CLI |
| Lifecycle | Always use lifespan for resource management |
| Composition | Chain tools via pipeline pattern |
| Scaling | Health-checked round-robin for redundancy |
| Auto-enable | Use auto:N thresholds per server criticality |
## Common Mistakes
- No lifecycle management (resource leaks)
- Missing health checks in load balancing
- Hardcoded server endpoints
- No graceful degradation on server failure
- Ignoring context window thresholds
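To address the graceful-degradation item, one minimal pattern is to wrap a server call with a timeout and a fallback source, and to report when the degraded path was taken. `call_with_fallback`, `flaky_search`, and `cached_search` are hypothetical, and serving cached results is only one possible fallback.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def call_with_fallback(
    primary: Callable[[], Awaitable[T]],
    fallback: Callable[[], Awaitable[T]],
    timeout: float = 5.0,
) -> tuple[T, bool]:
    """Try the primary call; on any error or timeout, use the fallback.

    Returns (result, degraded) so callers can tell a reduced answer was served.
    """
    try:
        result = await asyncio.wait_for(primary(), timeout=timeout)
        return result, False
    except Exception:
        return await fallback(), True


async def flaky_search() -> str:
    raise ConnectionError("mcp server unreachable")  # simulated outage


async def cached_search() -> str:
    return "stale cached results"


if __name__ == "__main__":
    print(asyncio.run(call_with_fallback(flaky_search, cached_search)))
```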
## Related Skills
- `function-calling` - LLM tool integration patterns
- `resilience-patterns` - Circuit breakers and retries
- `connection-pooling` - Database connection management
- `streaming-api-patterns` - Real-time streaming
## Capability Details
### tool-composition
**Keywords:** tool composition, pipeline, orchestration, chain tools
**Solves:**
- Combine multiple tools into workflows
- Sequential tool execution
- Tool result passing
### resource-management
**Keywords:** resource, cache, lifecycle, cleanup, ttl
**Solves:**
- Manage resource lifecycle
- Implement resource caching
- Clean up stale resources
### scaling-strategies
**Keywords:** scale, load balance, horizontal, health check, redundancy
**Solves:**
- Scale MCP servers horizontally
- Implement health-checked load balancing
- Handle server failures gracefully
### server-building
**Keywords:** server, fastmcp, lifespan, middleware, transport
**Solves:**
- Build production MCP servers
- Manage server lifecycle
- Configure transports and middleware
### auto-enable-thresholds
**Keywords:** auto-enable, context window, threshold, auto:N
**Solves:**
- Configure MCP auto-enable/disable
- Manage context window usage
- Optimize MCP server availability
This skill provides advanced Model Context Protocol (MCP) patterns for composing tools, managing resources, and scaling production MCP servers. It covers building custom MCP servers, orchestrating tool pipelines, lifecycle-aware resource management, and horizontal scaling with health-checked load balancing. The code examples are in Python (FastMCP), the auto-enable thresholds target Claude Code, and the transport guidance covers Streamable HTTP and stdio.
The skill describes four modular patterns: a pipeline composition pattern that chains asynchronous tools, a lifespan-based server context that initializes and cleans up shared resources, a resource manager that caches with a TTL and removes stale entries, and a load balancer that round-robins across healthy MCP instances with periodic health checks. It also gives auto-enable threshold guidelines for automatically disabling MCP servers as the context window fills. The code sketches show the runtime shapes and lifecycle hooks to adopt.
**How do auto-enable thresholds affect context preservation?**
An `auto:N` threshold keeps an MCP server enabled until N% of the context window is used. Higher values keep the server available longer, and lower values free context earlier.
**What health check frequency is recommended for load balancers?**
The example loop checks every 30 seconds. An interval of roughly 15–60 seconds balances how quickly failures are detected against probe overhead; use the shorter end when routing to a failed server is costly.