---
name: llm-gateway-routing
description: |
  LLM gateway and routing configuration using OpenRouter and LiteLLM.
  Invoke when:
  - Setting up multi-model access (OpenRouter, LiteLLM)
  - Configuring model fallbacks and reliability
  - Implementing cost-based or latency-based routing
  - A/B testing different models
  - Self-hosting an LLM proxy
  Keywords: openrouter, litellm, llm gateway, model routing, fallback, A/B testing
effort: high
---
# LLM Gateway & Routing
Configure multi-model access, fallbacks, cost optimization, and A/B testing.
## Why Use a Gateway?
**Without gateway:**
- Vendor lock-in (one provider)
- No fallbacks (provider down = app down)
- Hard to A/B test models
- Scattered API keys and configs
**With gateway:**
- Single API for 400+ models
- Automatic fallbacks
- Easy model switching
- Unified cost tracking
## Quick Decision
| Need | Solution |
|------|----------|
| Fastest setup, multi-model | **OpenRouter** |
| Full control, self-hosted | **LiteLLM** |
| Observability + routing | **Helicone** |
| Enterprise, guardrails | **Portkey** |
## OpenRouter (Recommended)
### Why OpenRouter
- **400+ models**: OpenAI, Anthropic, Google, Meta, Mistral, and more
- **Single API**: One key for all providers
- **Automatic fallbacks**: Built-in reliability
- **A/B testing**: Easy model comparison
- **Cost tracking**: Unified billing dashboard
- **Free credits**: $1 free to start
### Setup
```bash
# 1. Sign up at openrouter.ai
# 2. Get API key from dashboard
# 3. Add to .env:
OPENROUTER_API_KEY=sk-or-v1-...
```
### Basic Usage
```typescript
// Using fetch
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'anthropic/claude-3-5-sonnet',
messages: [{ role: 'user', content: 'Hello!' }],
}),
});
```
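The response follows the OpenAI-style chat completions shape, so reading the result looks like this (a minimal sketch):
```typescript
// Read the completion text and token usage from the OpenAI-compatible response
const data = await response.json();
console.log(data.choices[0].message.content);
console.log(data.usage); // prompt/completion token counts, useful for cost tracking
```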
### With Vercel AI SDK (Recommended)
```typescript
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";
const openrouter = createOpenAI({
baseURL: "https://openrouter.ai/api/v1",
apiKey: process.env.OPENROUTER_API_KEY,
});
const { text } = await generateText({
model: openrouter("anthropic/claude-3-5-sonnet"),
prompt: "Explain quantum computing",
});
```
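For streaming, the AI SDK's `streamText` works the same way with the OpenRouter provider (a minimal sketch; the exact result shape depends on your AI SDK version):
```typescript
import { streamText } from "ai";

// Stream tokens as they arrive instead of waiting for the full response
const result = await streamText({
  model: openrouter("anthropic/claude-3-5-sonnet"),
  prompt: "Explain quantum computing",
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```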
### Model IDs
```typescript
// Format: provider/model-name
const models = {
// Anthropic
claude35Sonnet: "anthropic/claude-3-5-sonnet",
claudeHaiku: "anthropic/claude-3-5-haiku",
// OpenAI
gpt4o: "openai/gpt-4o",
gpt4oMini: "openai/gpt-4o-mini",
// Google
geminiPro: "google/gemini-pro-1.5",
geminiFlash: "google/gemini-flash-1.5",
// Meta
llama3: "meta-llama/llama-3.1-70b-instruct",
// Auto (OpenRouter picks best)
auto: "openrouter/auto",
};
```
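Model IDs change over time. OpenRouter exposes a public models endpoint you can query to confirm current IDs (a sketch; the `{ data: [...] }` response shape is assumed from its OpenAI-style listing format):
```typescript
// List currently available model IDs (helps catch renamed or retired models)
const res = await fetch("https://openrouter.ai/api/v1/models");
const { data } = await res.json();
console.log(data.map((m: { id: string }) => m.id));
```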
### Fallback Chains
```typescript
import { generateText, type CoreMessage } from "ai";

// Define the fallback order (reuses the `openrouter` provider from the AI SDK setup above)
const modelChain = [
  "anthropic/claude-3-5-sonnet", // Primary
  "openai/gpt-4o",               // Fallback 1
  "google/gemini-pro-1.5",       // Fallback 2
];

async function callWithFallback(messages: CoreMessage[]) {
  for (const model of modelChain) {
    try {
      return await generateText({ model: openrouter(model), messages });
    } catch (error) {
      console.warn(`${model} failed, trying next...`);
    }
  }
  throw new Error("All models failed");
}
```
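OpenRouter can also run the fallback chain server-side: its chat completions endpoint accepts a `models` list and tries each entry in order. The sketch below assumes OpenRouter's documented model-routing behavior; verify the field name against the current API docs before relying on it.
```typescript
// Server-side fallback: OpenRouter tries each model in the list in order
// (field name assumed from OpenRouter's model-routing docs)
const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    models: modelChain, // falls through the chain on provider errors
    messages: [{ role: "user", content: "Hello!" }],
  }),
});
```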
### Cost Routing
```typescript
// Route based on query complexity
function selectModel(query: string): string {
const complexity = analyzeComplexity(query);
if (complexity === "simple") {
// Simple queries → cheap model
return "openai/gpt-4o-mini"; // ~$0.15/1M tokens
} else if (complexity === "medium") {
// Medium → balanced
return "google/gemini-flash-1.5"; // ~$0.075/1M tokens
} else {
// Complex → best quality
return "anthropic/claude-3-5-sonnet"; // ~$3/1M tokens
}
}
function analyzeComplexity(query: string): "simple" | "medium" | "complex" {
// Simple heuristics
if (query.length < 50) return "simple";
if (query.includes("explain") || query.includes("analyze")) return "complex";
return "medium";
}
```
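Wired into the OpenRouter provider defined earlier, routing then happens per request (a sketch reusing `selectModel` and the AI SDK's `generateText`):
```typescript
// Pick a model by query complexity, then call it through the openrouter provider
async function answer(userQuery: string) {
  const model = selectModel(userQuery);
  const { text } = await generateText({
    model: openrouter(model),
    prompt: userQuery,
  });
  return { model, text };
}
```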
### A/B Testing
```typescript
// Deterministic 50/50 split: hash the user ID so each user always gets the same model
function getModel(userId: string): string {
  let hash = 0;
  for (const char of userId) hash = (hash * 31 + char.charCodeAt(0)) % 100;
  if (hash < 50) {
    return "anthropic/claude-3-5-sonnet"; // 50%
  } else {
    return "openai/gpt-4o"; // 50%
  }
}
// Track which model was used
const model = getModel(userId);
const { text } = await generateText({ model: openrouter(model), messages });
await analytics.track("llm_call", { model, userId, latency, cost });
```
## LiteLLM (Self-Hosted)
### Why LiteLLM
- **Self-hosted**: Full control over data
- **100+ providers**: Broad coverage across the major model providers
- **Load balancing**: Distribute across providers
- **Cost tracking**: Built-in spend management
- **Caching**: Redis or in-memory
- **Rate limiting**: Per-user limits
### Setup
```bash
# Install
pip install 'litellm[proxy]'
# Run proxy
litellm --config config.yaml
# Use as OpenAI-compatible endpoint
export OPENAI_API_BASE=http://localhost:4000
```
### Configuration
```yaml
# config.yaml
model_list:
  # Claude models
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-latest
      api_key: sk-ant-...

  # OpenAI models
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-...

  # Load balanced: two deployments share one model_name,
  # so requests are distributed across both
  - model_name: balanced
    litellm_params:
      model: anthropic/claude-3-5-sonnet-latest
      api_key: sk-ant-...
  - model_name: balanced
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-...

# General settings
general_settings:
  master_key: sk-master-...
  database_url: postgresql://...

# Routing
router_settings:
  routing_strategy: simple-shuffle # or latency-based-routing
  num_retries: 3
  timeout: 30

# Budget
litellm_settings:
  max_budget: 100      # $100 per budget window
  budget_duration: 30d # reset roughly monthly
```
### Fallbacks in LiteLLM
```yaml
model_list:
  - model_name: primary
    litellm_params:
      model: anthropic/claude-3-5-sonnet-latest
      api_key: sk-ant-...
  - model_name: fallback-1
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-...
  - model_name: fallback-2
    litellm_params:
      model: gemini/gemini-1.5-pro
      api_key: ...

# Fallbacks are declared by model_name, tried in order
# (see the LiteLLM docs for the full set of fallback options)
router_settings:
  fallbacks:
    - primary: ["fallback-1", "fallback-2"]
```
### Usage
```typescript
// Use like OpenAI SDK
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "http://localhost:4000",
apiKey: "sk-master-...",
});
const response = await client.chat.completions.create({
model: "claude-sonnet", // Maps to configured model
messages: [{ role: "user", content: "Hello!" }],
});
```
## Routing Strategies
### 1. Cost-Based Routing
```typescript
const costTiers = {
cheap: ["openai/gpt-4o-mini", "google/gemini-flash-1.5"],
balanced: ["anthropic/claude-3-5-haiku", "openai/gpt-4o"],
premium: ["anthropic/claude-3-5-sonnet", "openai/o1-preview"],
};
function routeByCost(budget: "cheap" | "balanced" | "premium"): string {
const models = costTiers[budget];
return models[Math.floor(Math.random() * models.length)];
}
```
### 2. Latency-Based Routing
```typescript
// Track latency per model
const latencyStats: Record<string, number[]> = {};
function routeByLatency(defaultModel = "openai/gpt-4o-mini"): string {
  const avgLatencies = Object.entries(latencyStats)
    .map(([model, times]) => ({
      model,
      avg: times.reduce((a, b) => a + b, 0) / times.length,
    }))
    .sort((a, b) => a.avg - b.avg);
  // Fall back to a default until latency samples exist
  return avgLatencies[0]?.model ?? defaultModel;
}
// Update after each call
function recordLatency(model: string, latencyMs: number) {
if (!latencyStats[model]) latencyStats[model] = [];
latencyStats[model].push(latencyMs);
// Keep last 100 samples
if (latencyStats[model].length > 100) {
latencyStats[model].shift();
}
}
```
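A sketch of wiring the two helpers together, reusing the `openrouter` provider and `generateText` from the OpenRouter section:
```typescript
import { generateText, type CoreMessage } from "ai";

// Route by observed latency, then record this call's latency for future routing
async function timedCall(messages: CoreMessage[]) {
  const model = routeByLatency();
  const start = Date.now();
  const result = await generateText({ model: openrouter(model), messages });
  recordLatency(model, Date.now() - start);
  return result;
}
```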
### 3. Task-Based Routing
```typescript
const taskModels = {
coding: "anthropic/claude-3-5-sonnet", // Best for code
reasoning: "openai/o1-preview", // Best for logic
creative: "anthropic/claude-3-5-sonnet", // Best for writing
simple: "openai/gpt-4o-mini", // Cheap and fast
multimodal: "google/gemini-pro-1.5", // Vision + text
};
function routeByTask(task: keyof typeof taskModels): string {
return taskModels[task];
}
```
### 4. Hybrid Routing
```typescript
// Assumes a model catalog like:
//   const models: { id: string; cost: number; avgLatency: number }[] = [...]
// and an app-specific quality score getTaskScore(modelId, task): number
interface RoutingConfig {
  task: string;
  maxCost: number;    // max $ per 1M tokens
  maxLatency: number; // max average latency in ms
}
function hybridRoute(config: RoutingConfig): string {
  // Filter by cost
  const affordable = models.filter(m => m.cost <= config.maxCost);
  // Filter by latency
  const fast = affordable.filter(m => m.avgLatency <= config.maxLatency);
  // Score the remaining candidates for the task and pick the best
  const taskScores = fast.map(m => ({
    model: m.id,
    score: getTaskScore(m.id, config.task),
  }));
  return taskScores.sort((a, b) => b.score - a.score)[0].model;
}
```
## Best Practices
### 1. Always Have Fallbacks
```typescript
// Bad: Single point of failure
const response = await openai.chat({ model: "gpt-4o", messages });

// Good: Fallback chain
const models = ["gpt-4o", "claude-3-5-sonnet", "gemini-pro"];
async function chatWithFallback(messages: Message[]) {
  for (const model of models) {
    try {
      return await gateway.chat({ model, messages });
    } catch (e) {
      continue; // Try the next model
    }
  }
  throw new Error("All models failed");
}
```
### 2. Pin Model Versions
```typescript
// Bad: Model can change
const model = "gpt-4";
// Good: Pinned version
const model = "openai/gpt-4-0125-preview";
```
### 3. Track Costs
```typescript
// Log every call
async function trackedCall(model: string, messages: Message[]) {
const start = Date.now();
const response = await gateway.chat({ model, messages });
const latency = Date.now() - start;
await analytics.track("llm_call", {
model,
inputTokens: response.usage.prompt_tokens,
outputTokens: response.usage.completion_tokens,
cost: calculateCost(model, response.usage),
latency,
});
return response;
}
```
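`calculateCost` above is left undefined; a minimal sketch using the illustrative input-token rates quoted in the cost-routing section (real pricing differs for output tokens, so fill the table from each provider's pricing page):
```typescript
// Illustrative $ per 1M input tokens (from the cost-routing section above);
// output tokens are priced differently in practice.
const pricePerMillionTokens: Record<string, number> = {
  "openai/gpt-4o-mini": 0.15,
  "google/gemini-flash-1.5": 0.075,
  "anthropic/claude-3-5-sonnet": 3,
};

function calculateCost(
  model: string,
  usage: { prompt_tokens: number; completion_tokens: number }
): number {
  const rate = pricePerMillionTokens[model] ?? 0;
  // Rough estimate: apply one rate to all tokens; use separate
  // input/output rates for real accounting.
  return ((usage.prompt_tokens + usage.completion_tokens) / 1_000_000) * rate;
}
```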
### 4. Set Token Limits
```typescript
// Prevent runaway costs
const response = await gateway.chat({
model,
messages,
max_tokens: 500, // Limit output length
});
```
### 5. Use Caching
```yaml
# LiteLLM caching (config.yaml)
litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379
    ttl: 3600 # 1 hour
```
## References
- `references/openrouter-guide.md` - OpenRouter deep dive
- `references/litellm-guide.md` - LiteLLM self-hosting
- `references/routing-strategies.md` - Advanced routing patterns
- `references/alternatives.md` - Helicone, Portkey, etc.
## Templates
- `templates/openrouter-config.ts` - TypeScript OpenRouter setup
- `templates/litellm-config.yaml` - LiteLLM proxy config
- `templates/fallback-chain.ts` - Fallback implementation
## FAQ

**When should I pick OpenRouter vs LiteLLM?**
Choose OpenRouter for the fastest multi-provider setup and centralized billing. Choose LiteLLM when you need self-hosting, full data control, or custom deployment and routing logic.

**How do I ensure calls stay within budget?**
Implement cost-based routing, set token limits, track per-call costs in analytics, and enforce monthly budgets or quotas at the gateway level.
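A minimal in-app sketch of that last point, assuming the `gateway` client and `calculateCost` helper used in the examples above (for multi-instance deployments, prefer gateway-level enforcement such as LiteLLM's `max_budget`):
```typescript
// Track cumulative spend in-process and refuse calls once the cap is reached
let monthlySpend = 0;
const MONTHLY_BUDGET_USD = 100; // illustrative cap

async function guardedCall(model: string, messages: Message[]) {
  if (monthlySpend >= MONTHLY_BUDGET_USD) {
    throw new Error("Monthly LLM budget exhausted");
  }
  const response = await gateway.chat({ model, messages });
  monthlySpend += calculateCost(model, response.usage);
  return response;
}
```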