This skill helps AI agents interact with OpenRouter's unified API to choose models, stream responses, call tools, and handle errors efficiently.
Add this skill to your agents with `npx playbooks add skill dimitrigilbert/ai-skills --skill openrouter`.
---
name: openrouter
description: Expert OpenRouter API assistant for AI agents. Use when making API calls to OpenRouter's unified API for 400+ AI models. Covers chat completions, streaming, tool calling, structured outputs, web search, embeddings, multimodal inputs, model selection, routing, and error handling.
---
# OpenRouter API for AI Agents
Expert guidance for AI agents integrating with the OpenRouter API: unified access to 400+ models from 90+ providers.
**When to use this skill:**
- Making chat completions via OpenRouter API
- Selecting appropriate models and variants
- Implementing streaming responses
- Using tool/function calling
- Enforcing structured outputs
- Integrating web search
- Handling multimodal inputs (images, audio, video, PDFs)
- Managing model routing and fallbacks
- Handling errors and retries
- Optimizing cost and performance
---
## API Basics
### Making a Request
**Endpoint**: `POST https://openrouter.ai/api/v1/chat/completions`
**Headers** (required):
```typescript
{
  'Authorization': `Bearer ${apiKey}`,
  'Content-Type': 'application/json',
  // Optional: for app attribution
  'HTTP-Referer': 'https://your-app.com',
  'X-Title': 'Your App Name'
}
```
**Minimal request structure**:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      { role: 'user', content: 'Your prompt here' }
    ]
  })
});
```
### Response Structure
**Non-streaming response**:
```json
{
  "id": "gen-abc123",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Response text here"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  },
  "model": "anthropic/claude-3.5-sonnet"
}
```
**Key fields**:
- `choices[0].message.content` - The assistant's response
- `choices[0].finish_reason` - Why generation stopped (stop, length, tool_calls, etc.)
- `usage` - Token counts and cost information
- `model` - Actual model used (may differ from requested)
### When to Use Streaming vs Non-Streaming
**Use streaming (`stream: true`)** when:
- Real-time responses needed (chat interfaces, interactive tools)
- Latency matters (user-facing applications)
- Large responses expected (long-form content)
- Want to show progressive output
**Use non-streaming** when:
- Processing in background (batch jobs, async tasks)
- Need complete response before processing
- Building API endpoints that return complete responses
- Response is short (few tokens)
**Streaming basics**:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: { /* ... */ },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: '...' }],
    stream: true
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // SSE events can be split across network chunks, so buffer partial lines
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // keep the incomplete trailing line for the next chunk
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6); // Remove 'data: '
    if (data === '[DONE]') continue; // end-of-stream sentinel
    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      // Accumulate or display content
    }
  }
}
```
---
## Model Selection
### Model Identifier Format
**Format**: `provider/model-name[:variant]`
Examples:
- `anthropic/claude-3.5-sonnet` - Specific model
- `openai/gpt-4o:online` - With web search enabled
- `google/gemini-2.0-flash:free` - Free tier variant
### Model Variants and When to Use Them
| Variant | Use When | Tradeoffs |
|---------|----------|-----------|
| `:free` | Cost is primary concern, testing, prototyping | Rate limits, lower quality models |
| `:online` | Need current information, real-time data | Higher cost, web search latency |
| `:extended` | Large context window needed | May be slower, higher cost |
| `:thinking` | Complex reasoning, multi-step problems | Higher token usage, slower |
| `:nitro` | Speed is critical | May have quality tradeoffs |
| `:exacto` | Tool-calling accuracy is critical | Smaller provider pool, may be less available |
### Default Model Choices by Task
**General purpose**: `anthropic/claude-3.5-sonnet` or `openai/gpt-4o`
- Balanced quality, speed, cost
- Good for most tasks
**Coding**: `anthropic/claude-3.5-sonnet` or `openai/gpt-4o`
- Strong code generation and understanding
- Good reasoning
**Complex reasoning**: `anthropic/claude-opus-4:thinking` or `openai/o3`
- Deep reasoning capabilities
- Higher cost, slower
**Fast responses**: `openai/gpt-4o-mini:nitro` or `google/gemini-2.0-flash`
- Minimal latency
- Good for real-time applications
**Cost-sensitive**: `google/gemini-2.0-flash:free` or `meta-llama/llama-3.1-70b:free`
- No cost with limits
- Good for high-volume, lower-complexity tasks
**Current information**: `anthropic/claude-3.5-sonnet:online` or `google/gemini-2.5-pro:online`
- Web search built-in
- Real-time data
**Large context**: `anthropic/claude-3.5-sonnet:extended` or `google/gemini-2.5-pro:extended`
- 200K+ context windows
- Document analysis, codebase understanding
### Provider Routing Preferences
**Default behavior**: OpenRouter automatically selects the best available provider
**Explicit provider order**:
```typescript
{
  provider: {
    order: ['anthropic', 'openai', 'google'],
    allow_fallbacks: true,
    sort: 'price' // 'price', 'latency', or 'throughput'
  }
}
```
**When to set provider order**:
- Have preferred provider arrangements
- Need to optimize for specific metric (cost, speed)
- Want to exclude certain providers
- Have BYOK (Bring Your Own Key) for specific providers
### Model Fallbacks
**Automatic fallback** - try multiple models in order:
```typescript
{
  models: [
    'anthropic/claude-3.5-sonnet',
    'openai/gpt-4o',
    'google/gemini-2.0-flash'
  ]
}
```
**When to use fallbacks**:
- High reliability required
- Multiple providers acceptable
- Want graceful degradation
- Avoid single point of failure
**Fallback behavior**:
- Tries the first model
- Falls back to the next on error (5xx, 429, timeout)
- Uses whichever succeeds first
- Reports the model actually used in the response's `model` field
---
## Parameters You Need
### Core Parameters
**model** (string, optional)
- Which model to use
- Default: user's default model
- **Always specify for consistency**
**messages** (Message[], required)
- Conversation history
- Structure: `{ role: 'user'|'assistant'|'system', content: string | ContentPart[] }`
- For multimodal: content can be array of text and image_url parts
**stream** (boolean, default: false)
- Enable Server-Sent Events streaming
- Use for real-time responses
**temperature** (float, 0.0-2.0, default: 1.0)
- Controls randomness
- **0.0-0.3**: Deterministic, factual responses (code, precise answers)
- **0.4-0.7**: Balanced (general use)
- **0.8-1.2**: Creative (brainstorming, creative writing)
- **1.3-2.0**: Highly creative, unpredictable (experimental)
**max_tokens** (integer, optional)
- Maximum tokens to generate
- **Always set** to control cost and prevent runaway responses
- Typical: 100-500 for short, 1000-2000 for long responses
- Model limit: context_length - prompt_length
**top_p** (float, 0.0-1.0, default: 1.0)
- Nucleus sampling - limits to top probability mass
- **Use instead of temperature** when you want predictable diversity
- **0.9-0.95**: Common settings for quality
**top_k** (integer, 0+, default: 0/disabled)
- Limit to K most likely tokens
- **1**: Always most likely (deterministic)
- **40-50**: Balanced
- Not available for OpenAI models
### Sampling Strategy Guidelines
**For code generation**: `temperature: 0.1-0.3, top_p: 0.95`
**For factual responses**: `temperature: 0.0-0.2`
**For creative writing**: `temperature: 0.8-1.2`
**For brainstorming**: `temperature: 1.0-1.5`
**For chat**: `temperature: 0.6-0.8`
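Applied to a concrete request, the code-generation guideline might look like the following sketch (the prompt text is illustrative):
```typescript
const body = JSON.stringify({
  model: 'anthropic/claude-3.5-sonnet',
  messages: [
    { role: 'system', content: 'You are a careful TypeScript engineer.' }, // illustrative prompt
    { role: 'user', content: 'Write a debounce utility function.' }
  ],
  temperature: 0.2, // near-deterministic for code
  top_p: 0.95,      // small diversity window
  max_tokens: 1000  // cap cost and runaway output
});
```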
### Tool Calling Parameters
**tools** (Tool[], default: [])
- Available functions for model to call
- Structure:
```typescript
{
  type: 'function',
  function: {
    name: 'function_name',
    description: 'What it does',
    parameters: { /* JSON Schema */ }
  }
}
```
**tool_choice** (string | object, default: 'auto')
- Control when tools are called
- `'auto'`: Model decides (default)
- `'none'`: Never call tools
- `'required'`: Must call a tool
- `{ type: 'function', function: { name: 'specific_tool' } }`: Force specific tool
**parallel_tool_calls** (boolean, default: true)
- Allow multiple tools simultaneously
- Set `false` for sequential execution
**When to use tools**:
- Need to query external APIs (weather, search, database)
- Need to perform calculations or data processing
- Building agentic systems
- Need structured data extraction
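For example, a minimal weather-lookup tool might be declared like this (the `get_weather` function and its schema are hypothetical):
```typescript
const tools = [{
  type: 'function',
  function: {
    name: 'get_weather', // hypothetical tool name
    description: 'Get the current weather for a city',
    parameters: {
      type: 'object',
      properties: {
        city: { type: 'string', description: 'City name' },
        unit: { type: 'string', enum: ['celsius', 'fahrenheit'] }
      },
      required: ['city']
    }
  }
}];

const body = JSON.stringify({
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'What is the weather in Paris?' }],
  tools,
  tool_choice: 'auto' // let the model decide when to call it
});
```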
### Structured Output Parameters
**response_format** (object, optional)
- Enforce specific output format
**JSON object mode**:
```typescript
{ type: 'json_object' }
```
- Model returns valid JSON
- Must also instruct model in system message
**JSON Schema mode** (strict):
```typescript
{
  type: 'json_schema',
  json_schema: {
    name: 'schema_name',
    strict: true,
    schema: { /* JSON Schema */ }
  }
}
```
- Model returns JSON matching exact schema
- **Use when structure is critical** (APIs, data processing)
**When to use structured outputs**:
- Need predictable response format
- Integrating with systems (APIs, databases)
- Data extraction
- Form filling
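A minimal sketch of strict schema mode for data extraction (the `contact` schema is a made-up example):
```typescript
const body = JSON.stringify({
  model: 'openai/gpt-4o',
  messages: [{ role: 'user', content: 'Extract: "Call Jane Doe at 555-0100"' }],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'contact', // hypothetical schema name
      strict: true,
      schema: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          phone: { type: 'string' }
        },
        required: ['name', 'phone'],
        additionalProperties: false // strict mode expects closed objects
      }
    }
  }
});
```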
### Web Search Parameters
**Enable via model variant** (simplest):
```typescript
{ model: 'anthropic/claude-3.5-sonnet:online' }
```
**Enable via plugin**:
```typescript
{
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5
  }]
}
```
**When to use web search**:
- Need current information (news, prices, events)
- User asks about recent developments
- Need factual verification
- Topic requires real-time data
### Other Important Parameters
**user** (string, optional)
- Stable identifier for end-user
- **Set when you have user IDs**
- Helps with abuse detection and caching
**session_id** (string, optional)
- Group related requests
- **Set for conversation tracking**
- Improves caching and observability
**metadata** (Record<string, string>, optional)
- Custom metadata (max 16 key-value pairs)
- **Use for analytics and tracking**
- Keys: max 64 chars, Values: max 512 chars
**stop** (string | string[], optional)
- Stop sequences to halt generation
- Common: `['\n\n', '###', 'END']`
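A request combining these parameters might look like the following sketch (all identifiers are placeholders):
```typescript
const body = JSON.stringify({
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  user: 'user_1234',                   // placeholder end-user ID
  session_id: 'session_abc',           // placeholder conversation ID
  metadata: { feature: 'summarizer' }, // example analytics tag
  stop: ['###']
});
```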
---
## Handling Responses
### Non-Streaming Responses
Extract content:
```typescript
const response = await fetch(/* ... */);
const data = await response.json();
const content = data.choices[0].message.content;
const finishReason = data.choices[0].finish_reason;
const usage = data.usage;
```
Check for tool calls:
```typescript
const toolCalls = data.choices[0].message.tool_calls;
if (toolCalls) {
  // Model wants to call tools
  for (const toolCall of toolCalls) {
    const { name, arguments: args } = toolCall.function;
    const parsedArgs = JSON.parse(args); // arguments arrive as a JSON string
    // Execute tool...
  }
}
```
### Streaming Responses
Process SSE stream:
```typescript
let fullContent = '';
const response = await fetch(/* ... */);
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Buffer partial lines: SSE events may span chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? '';
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6);
    if (data === '[DONE]') continue;
    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      fullContent += content;
      // Process incrementally...
    }
    // Usage arrives in the final chunk
    if (parsed.usage) {
      console.log('Usage:', parsed.usage);
    }
  }
}
```
Handle streaming tool calls:
```typescript
// Tool calls stream across multiple chunks: the name arrives first,
// then the arguments accumulate as string fragments.
// `chunks` stands for the parsed SSE events from the loop above.
let currentToolCall = null;
let toolArgs = '';

for (const parsed of chunks) {
  const toolCallChunk = parsed.choices?.[0]?.delta?.tool_calls?.[0];
  if (toolCallChunk?.function?.name) {
    currentToolCall = { id: toolCallChunk.id, ...toolCallChunk.function };
  }
  if (toolCallChunk?.function?.arguments) {
    toolArgs += toolCallChunk.function.arguments;
  }
  if (parsed.choices?.[0]?.finish_reason === 'tool_calls' && currentToolCall) {
    // Complete tool call: arguments are now a full JSON string
    currentToolCall.arguments = toolArgs;
    // Execute tool...
  }
}
```
### Usage and Cost Tracking
```typescript
const { usage } = data;
console.log(`Prompt: ${usage.prompt_tokens}`);
console.log(`Completion: ${usage.completion_tokens}`);
console.log(`Total: ${usage.total_tokens}`);

// Cost (returned when the request sets `usage: { include: true }`)
if (usage.cost) {
  console.log(`Cost: $${usage.cost.toFixed(6)}`);
}

// Detailed breakdown (when provided)
console.log(usage.prompt_tokens_details);
console.log(usage.completion_tokens_details);
```
---
## Error Handling
### Common HTTP Status Codes
**400 Bad Request**
- Invalid request format
- Missing required fields
- Parameter out of range
- **Fix**: Validate request structure and parameters
**401 Unauthorized**
- Missing or invalid API key
- **Fix**: Check API key format and permissions
**402 Payment Required**
- Insufficient credits
- **Fix**: Add credits to account
**403 Forbidden**
- Insufficient permissions
- Model not allowed
- **Fix**: Check guardrails, model access, API key permissions
**408 Request Timeout**
- Request took too long
- **Fix**: Reduce prompt length, use streaming, try simpler model
**429 Rate Limited**
- Too many requests
- **Fix**: Implement exponential backoff, reduce request rate
**502 Bad Gateway**
- Provider error
- **Fix**: Use model fallbacks, retry with different model
**503 Service Unavailable**
- Service overloaded
- **Fix**: Retry with backoff, use fallbacks
### Retry Strategy
**Exponential backoff**:
```typescript
async function requestWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    let response;
    try {
      response = await fetch(url, options);
    } catch (error) {
      // Network failure: retry with backoff
      if (attempt === maxRetries - 1) throw error;
      await backoff(attempt);
      continue;
    }
    if (response.ok) {
      return await response.json();
    }
    // Retry on rate limit or server errors
    if ((response.status === 429 || response.status >= 500) && attempt < maxRetries - 1) {
      await backoff(attempt);
      continue;
    }
    // Don't retry other errors (or retries are exhausted)
    throw new Error(`OpenRouter request failed: ${response.status}`);
  }
}

function backoff(attempt) {
  const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
  return new Promise(resolve => setTimeout(resolve, delay));
}
```
**Retryable status codes**: 408, 429, 502, 503
**Do not retry**: 400, 401, 402, 403
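Example usage of the helper above:
```typescript
const data = await requestWithRetry('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: '...' }]
  })
});
console.log(data.choices[0].message.content);
```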
### Graceful Degradation
**Use model fallbacks**:
```typescript
{
  models: [
    'anthropic/claude-3.5-sonnet', // Primary
    'openai/gpt-4o',               // Fallback 1
    'google/gemini-2.0-flash'      // Fallback 2
  ]
}
```
**Handle partial failures**:
- Log errors but continue
- Fall back to simpler features
- Use cached responses when available
- Provide degraded experience rather than failing completely
---
## Advanced Features
### When to Use Tool Calling
**Good use cases**:
- Querying external APIs (weather, stock prices, databases)
- Performing calculations or data processing
- Extracting structured data from unstructured text
- Building agentic systems with multiple steps
- When decisions require external information
**Implementation pattern**:
1. Define tools with clear descriptions and parameters
2. Send request with `tools` array
3. Check if `tool_calls` present in response
4. Execute tools with parsed arguments
5. Send tool results back in a new request
6. Repeat until model provides final answer
**See**: `references/ADVANCED_PATTERNS.md` for complete agentic loop implementation
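As a compressed sketch of steps 1-6 (reusing the hypothetical `tools` array from earlier and `requestWithRetry` from Error Handling above; `executeTool` is a hypothetical dispatcher for your own functions):
```typescript
const messages = [{ role: 'user', content: 'What is the weather in Paris?' }];

while (true) {
  const data = await requestWithRetry('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'anthropic/claude-3.5-sonnet', messages, tools })
  });
  const message = data.choices[0].message;
  messages.push(message); // keep the assistant turn in the history

  if (!message.tool_calls) break; // no tool calls left: final answer reached

  for (const call of message.tool_calls) {
    const result = await executeTool(call.function.name, JSON.parse(call.function.arguments));
    messages.push({
      role: 'tool',
      tool_call_id: call.id, // ties the result to the call
      content: JSON.stringify(result)
    });
  }
}
```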
### When to Use Structured Outputs
**Good use cases**:
- API responses (need specific schema)
- Data extraction (forms, documents)
- Configuration files (JSON, YAML)
- Database operations (structured queries)
- When downstream processing requires specific format
**Implementation pattern**:
1. Define JSON Schema for desired output
2. Set `response_format: { type: 'json_schema', json_schema: { ... } }`
3. Instruct model to produce JSON (system or user message)
4. Validate response against schema
5. Handle parsing errors gracefully
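A minimal sketch of steps 4-5 (hand-rolled validation; a schema validator library could replace the manual checks, and the field names are illustrative):
```typescript
let parsed;
try {
  parsed = JSON.parse(data.choices[0].message.content);
} catch {
  // Malformed JSON: retry the request, attempt repair, or surface the error
  throw new Error('Model returned invalid JSON');
}
// Spot-check required fields from your schema (names here are illustrative)
if (typeof parsed.name !== 'string' || typeof parsed.phone !== 'string') {
  throw new Error('Response does not match the expected schema');
}
```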
**Add response healing** for robustness:
```typescript
{
  response_format: { /* ... */ },
  plugins: [{ id: 'response-healing' }]
}
```
### When to Use Web Search
**Good use cases**:
- User asks about recent events, news, or current data
- Need verification of facts
- Questions with time-sensitive information
- Topic requires up-to-date information
- User explicitly requests current information
**Simple implementation** (variant):
```typescript
{
  model: 'anthropic/claude-3.5-sonnet:online'
}
```
**Advanced implementation** (plugin):
```typescript
{
  model: 'openrouter/auto',
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5,
    engine: 'exa' // or 'native'
  }]
}
```
### When to Use Multimodal Inputs
**Images** (vision):
- OCR, image understanding, visual analysis
- Models: `openai/gpt-4o`, `anthropic/claude-3.5-sonnet`, `google/gemini-2.5-pro`
**Audio**:
- Speech-to-text, audio analysis
- Models with audio support
**Video**:
- Video understanding, frame analysis
- Models with video support
**PDFs**:
- Document parsing, content extraction
- Requires `file-parser` plugin
**Implementation**: See `references/ADVANCED_PATTERNS.md` for multimodal patterns
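For images specifically, the OpenAI-compatible content-parts format applies; a minimal sketch (the image URL is a placeholder):
```typescript
const body = JSON.stringify({
  model: 'openai/gpt-4o',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Describe this image.' },
      { type: 'image_url', image_url: { url: 'https://example.com/photo.jpg' } } // placeholder URL
    ]
  }]
});
```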
---
## Best Practices for AI
### Default Model Selection
**Start with**: `anthropic/claude-3.5-sonnet` or `openai/gpt-4o`
- Good balance of quality, speed, cost
- Strong at most tasks
- Wide compatibility
**Switch based on needs**:
- Need speed → `openai/gpt-4o-mini:nitro` or `google/gemini-2.0-flash`
- Complex reasoning → `anthropic/claude-opus-4:thinking`
- Need web search → `:online` variant
- Large context → `:extended` variant
- Cost-sensitive → `:free` variant
### Default Parameters
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [...],
  temperature: 0.6, // Balanced creativity
  max_tokens: 1000, // Reasonable length
  top_p: 0.95       // Common for quality
}
```
**Adjust based on task**:
- Code: `temperature: 0.2`
- Creative: `temperature: 1.0`
- Factual: `temperature: 0.0-0.3`
### When to Prefer Streaming
**Always prefer streaming when**:
- User-facing (chat, interactive tools)
- Response length unknown
- Want progressive feedback
- Latency matters
**Use non-streaming when**:
- Batch processing
- Need complete response before acting
- Building API endpoints
- Very short responses (< 50 tokens)
### When to Enable Specific Features
**Tools**: Enable when you need external data or actions
**Structured outputs**: Enable when response format matters
**Web search**: Enable when current information needed
**Streaming**: Enable for user-facing, real-time responses
**Model fallbacks**: Enable when reliability critical
**Provider routing**: Enable when you have preferences or constraints
### Cost Optimization Patterns
**Use free models for**:
- Testing and prototyping
- Low-complexity tasks
- High-volume, low-value operations
**Use routing to optimize**:
```typescript
{
  provider: {
    order: ['openai', 'anthropic'],
    sort: 'price', // Optimize for cost
    allow_fallbacks: true
  }
}
```
**Set max_tokens** to prevent runaway responses
**Use caching** via `user` and `session_id` parameters
**Enable prompt caching** when supported
### Performance Optimization
**Reduce latency**:
- Use `:nitro` variants for speed
- Use streaming for perceived speed
- Set `user` ID for caching benefits
- Choose faster models (mini, flash) when quality allows
**Increase throughput**:
- Use provider routing with `sort: 'throughput'`
- Parallelize independent requests
- Use streaming to reduce wait time
**Optimize for specific metrics**:
```typescript
{
  provider: {
    sort: 'latency' // or 'price' or 'throughput'
  }
}
```
---
## Progressive Disclosure
For detailed reference information, consult:
### Parameters Reference
**File**: `references/PARAMETERS.md`
- Complete parameter reference (50+ parameters)
- Types, ranges, defaults
- Parameter support by model
- Usage examples
### Error Codes Reference
**File**: `references/ERROR_CODES.md`
- All HTTP status codes
- Error response structure
- Error metadata types
- Native finish reasons
- Retry strategies
### Model Selection Guide
**File**: `references/MODEL_SELECTION.md`
- Model families and capabilities
- Model variants explained
- Selection criteria by use case
- Model capability matrix
- Provider routing preferences
### Routing Strategies
**File**: `references/ROUTING_STRATEGIES.md`
- Model fallbacks configuration
- Provider selection patterns
- Auto router setup
- Routing by use case (cost, latency, quality)
### Advanced Patterns
**File**: `references/ADVANCED_PATTERNS.md`
- Tool calling with agentic loops
- Structured outputs implementation
- Web search integration
- Multimodal handling
- Streaming patterns
- Framework integrations
### Working Examples
**File**: `references/EXAMPLES.md`
- TypeScript patterns for common tasks
- Python examples
- cURL examples
- Advanced patterns
- Framework integration examples
### Ready-to-Use Templates
**Directory**: `templates/`
- `basic-request.ts` - Minimal working request
- `streaming-request.ts` - SSE streaming with cancellation
- `tool-calling.ts` - Complete agentic loop with tools
- `structured-output.ts` - JSON Schema enforcement
- `error-handling.ts` - Robust retry logic
---
## Quick Reference
### Minimal Request
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Your prompt' }]
}
```
### With Streaming
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  stream: true
}
```
### With Tools
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  tools: [{ type: 'function', function: { name, description, parameters } }],
  tool_choice: 'auto'
}
```
### With Structured Output
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'system', content: 'Output JSON only...' }],
  response_format: { type: 'json_object' }
}
```
### With Web Search
```typescript
{
  model: 'anthropic/claude-3.5-sonnet:online',
  messages: [{ role: 'user', content: '...' }]
}
```
### With Model Fallbacks
```typescript
{
  models: ['anthropic/claude-3.5-sonnet', 'openai/gpt-4o'],
  messages: [{ role: 'user', content: '...' }]
}
```
---
**Remember**: OpenRouter is OpenAI-compatible. Use the OpenAI SDK with `baseURL: 'https://openrouter.ai/api/v1'` for a familiar experience.
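A minimal sketch using the official `openai` npm package:
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1', // point the SDK at OpenRouter
  apiKey: process.env.OPENROUTER_API_KEY,
});

const completion = await client.chat.completions.create({
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(completion.choices[0].message.content);
```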