openrouter skill

This skill helps AI agents interact with OpenRouter's unified API to choose models, stream responses, call tools, and handle errors efficiently.

```bash
npx playbooks add skill dimitrigilbert/ai-skills --skill openrouter
```

---
name: openrouter
description: Expert OpenRouter API assistant for AI agents. Use when making API calls to OpenRouter's unified API for 400+ AI models. Covers chat completions, streaming, tool calling, structured outputs, web search, embeddings, multimodal inputs, model selection, routing, and error handling.
---

# OpenRouter API for AI Agents

Expert guidance for AI agents integrating with the OpenRouter API, which provides unified access to 400+ models from 90+ providers.

**When to use this skill:**
- Making chat completions via OpenRouter API
- Selecting appropriate models and variants
- Implementing streaming responses
- Using tool/function calling
- Enforcing structured outputs
- Integrating web search
- Handling multimodal inputs (images, audio, video, PDFs)
- Managing model routing and fallbacks
- Handling errors and retries
- Optimizing cost and performance

---

## API Basics

### Making a Request

**Endpoint**: `POST https://openrouter.ai/api/v1/chat/completions`

**Headers** (required):
```typescript
{
  'Authorization': `Bearer ${apiKey}`,
  'Content-Type': 'application/json',
  // Optional: for app attribution
  'HTTP-Referer': 'https://your-app.com',
  'X-Title': 'Your App Name'
}
```

**Minimal request structure**:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      { role: 'user', content: 'Your prompt here' }
    ]
  })
});
```

### Response Structure

**Non-streaming response**:
```json
{
  "id": "gen-abc123",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Response text here"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  },
  "model": "anthropic/claude-3.5-sonnet"
}
```

**Key fields**:
- `choices[0].message.content` - The assistant's response
- `choices[0].finish_reason` - Why generation stopped (stop, length, tool_calls, etc.)
- `usage` - Token counts and cost information
- `model` - Actual model used (may differ from requested)
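
A small sketch of reading these fields defensively, e.g. to detect truncation (field names as in the response above):

```typescript
const data = await response.json();
const choice = data.choices[0];

if (choice.finish_reason === 'length') {
  // Hit max_tokens before finishing; consider retrying with a higher limit
  console.warn(`Truncated response from ${data.model}`);
}

const content = choice.message.content;
```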

### When to Use Streaming vs Non-Streaming

**Use streaming (`stream: true`)** when:
- Real-time responses needed (chat interfaces, interactive tools)
- Latency matters (user-facing applications)
- Large responses expected (long-form content)
- Want to show progressive output

**Use non-streaming** when:
- Processing in background (batch jobs, async tasks)
- Need complete response before processing
- Building an API endpoint that returns complete responses
- Response is short (few tokens)

**Streaming basics**:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: { /* ... */ },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: '...' }],
    stream: true
  })
});

const decoder = new TextDecoder();
let buffer = '';

// Async iteration over response.body works in Node 18+;
// in browsers, use response.body.getReader() (see Handling Responses below)
for await (const chunk of response.body) {
  buffer += decoder.decode(chunk, { stream: true });

  // SSE lines can be split across chunks; keep the incomplete remainder buffered
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? '';

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;

    const data = line.slice(6); // Remove 'data: '
    if (data === '[DONE]') continue; // End-of-stream marker

    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      // Accumulate or display content
    }
  }
}
```

---

## Model Selection

### Model Identifier Format

**Format**: `provider/model-name[:variant]`

Examples:
- `anthropic/claude-3.5-sonnet` - Specific model
- `openai/gpt-4o:online` - With web search enabled
- `google/gemini-2.0-flash:free` - Free tier variant

### Model Variants and When to Use Them

| Variant | Use When | Tradeoffs |
|---------|----------|-----------|
| `:free` | Cost is primary concern, testing, prototyping | Rate limits, lower quality models |
| `:online` | Need current information, real-time data | Higher cost, web search latency |
| `:extended` | Large context window needed | May be slower, higher cost |
| `:thinking` | Complex reasoning, multi-step problems | Higher token usage, slower |
| `:nitro` | Speed is critical | May have quality tradeoffs |
| `:exacto` | Tool-calling accuracy is critical | Smaller provider pool, may be less available |

### Default Model Choices by Task

**General purpose**: `anthropic/claude-3.5-sonnet` or `openai/gpt-4o`
- Balanced quality, speed, cost
- Good for most tasks

**Coding**: `anthropic/claude-3.5-sonnet` or `openai/gpt-4o`
- Strong code generation and understanding
- Good reasoning

**Complex reasoning**: `anthropic/claude-opus-4:thinking` or `openai/o3`
- Deep reasoning capabilities
- Higher cost, slower

**Fast responses**: `openai/gpt-4o-mini:nitro` or `google/gemini-2.0-flash`
- Minimal latency
- Good for real-time applications

**Cost-sensitive**: `google/gemini-2.0-flash:free` or `meta-llama/llama-3.1-70b:free`
- No cost with limits
- Good for high-volume, lower-complexity tasks

**Current information**: `anthropic/claude-3.5-sonnet:online` or `google/gemini-2.5-pro:online`
- Web search built-in
- Real-time data

**Large context**: `anthropic/claude-3.5-sonnet:extended` or `google/gemini-2.5-pro:extended`
- 200K+ context windows
- Document analysis, codebase understanding

### Provider Routing Preferences

**Default behavior**: OpenRouter automatically selects a provider, load-balancing across them by price and uptime

**Explicit provider order**:
```typescript
{
  provider: {
    order: ['anthropic', 'openai', 'google'],
    allow_fallbacks: true,
    sort: 'price' // 'price', 'latency', or 'throughput'
  }
}
```

**When to set provider order**:
- Have preferred provider arrangements
- Need to optimize for specific metric (cost, speed)
- Want to exclude certain providers
- Have BYOK (Bring Your Own Key) for specific providers

### Model Fallbacks

**Automatic fallback** - try multiple models in order:
```typescript
{
  models: [
    'anthropic/claude-3.5-sonnet',
    'openai/gpt-4o',
    'google/gemini-2.0-flash'
  ]
}
```

**When to use fallbacks**:
- High reliability required
- Multiple providers acceptable
- Want graceful degradation
- Avoid single point of failure

**Fallback behavior** (see the sketch below):
- Tries the first model
- Falls back to the next model on error (5xx, 429, timeout)
- Uses whichever model succeeds first
- Reports the model actually used in the response's `model` field
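
A minimal sketch of a fallback request, reading back which model actually served it (endpoint and fields as documented above):

```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    models: [
      'anthropic/claude-3.5-sonnet',
      'openai/gpt-4o',
      'google/gemini-2.0-flash'
    ],
    messages: [{ role: 'user', content: 'Your prompt here' }]
  })
});

const data = await response.json();
// `model` reports whichever model actually handled the request
console.log(`Served by: ${data.model}`);
```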

---

## Parameters You Need

### Core Parameters

**model** (string, optional)
- Which model to use
- Default: user's default model
- **Always specify for consistency**

**messages** (Message[], required)
- Conversation history
- Structure: `{ role: 'user'|'assistant'|'system', content: string | ContentPart[] }`
- For multimodal: content can be array of text and image_url parts

**stream** (boolean, default: false)
- Enable Server-Sent Events streaming
- Use for real-time responses

**temperature** (float, 0.0-2.0, default: 1.0)
- Controls randomness
- **0.0-0.3**: Deterministic, factual responses (code, precise answers)
- **0.4-0.7**: Balanced (general use)
- **0.8-1.2**: Creative (brainstorming, creative writing)
- **1.3-2.0**: Highly creative, unpredictable (experimental)

**max_tokens** (integer, optional)
- Maximum tokens to generate
- **Always set** to control cost and prevent runaway responses
- Typical: 100-500 for short, 1000-2000 for long responses
- Model limit: context_length - prompt_length

**top_p** (float, 0.0-1.0, default: 1.0)
- Nucleus sampling - limits to top probability mass
- **Use instead of temperature** when you want predictable diversity
- **0.9-0.95**: Common settings for quality

**top_k** (integer, 0+, default: 0/disabled)
- Limit to K most likely tokens
- **1**: Always most likely (deterministic)
- **40-50**: Balanced
- Not available for OpenAI models

### Sampling Strategy Guidelines

**For code generation**: `temperature: 0.1-0.3, top_p: 0.95`
**For factual responses**: `temperature: 0.0-0.2`
**For creative writing**: `temperature: 0.8-1.2`
**For brainstorming**: `temperature: 1.0-1.5`
**For chat**: `temperature: 0.6-0.8`
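
One way to encode these guidelines in code, as a sketch (the task names and chosen values are illustrative midpoints of the ranges above):

```typescript
type Task = 'code' | 'factual' | 'creative' | 'brainstorm' | 'chat';

// Illustrative midpoints of the recommended ranges above
function samplingFor(task: Task): { temperature: number; top_p?: number } {
  switch (task) {
    case 'code':       return { temperature: 0.2, top_p: 0.95 };
    case 'factual':    return { temperature: 0.1 };
    case 'creative':   return { temperature: 1.0 };
    case 'brainstorm': return { temperature: 1.2 };
    case 'chat':       return { temperature: 0.7 };
  }
}
```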

### Tool Calling Parameters

**tools** (Tool[], default: [])
- Available functions for model to call
- Structure:
```typescript
{
  type: 'function',
  function: {
    name: 'function_name',
    description: 'What it does',
    parameters: { /* JSON Schema */ }
  }
}
```

**tool_choice** (string | object, default: 'auto')
- Control when tools are called
- `'auto'`: Model decides (default)
- `'none'`: Never call tools
- `'required'`: Must call a tool
- `{ type: 'function', function: { name: 'specific_tool' } }`: Force specific tool
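
For example, forcing a call to a hypothetical `get_weather` tool:

```typescript
{
  tools: [/* get_weather definition */],
  tool_choice: { type: 'function', function: { name: 'get_weather' } }
}
```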

**parallel_tool_calls** (boolean, default: true)
- Allow multiple tools simultaneously
- Set `false` for sequential execution

**When to use tools**:
- Need to query external APIs (weather, search, database)
- Need to perform calculations or data processing
- Building agentic systems
- Need structured data extraction
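
A sketch of a full tool-calling request, using a hypothetical `get_weather` function (the tool name, parameters, and prompt are illustrative):

```typescript
const body = {
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'What is the weather in Paris?' }],
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather',  // hypothetical tool
      description: 'Get current weather for a city',
      parameters: {
        type: 'object',
        properties: {
          city: { type: 'string', description: 'City name' }
        },
        required: ['city']
      }
    }
  }],
  tool_choice: 'auto'
};
```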

### Structured Output Parameters

**response_format** (object, optional)
- Enforce specific output format

**JSON object mode**:
```typescript
{ type: 'json_object' }
```
- Model returns valid JSON
- Must also instruct model in system message

**JSON Schema mode** (strict):
```typescript
{
  type: 'json_schema',
  json_schema: {
    name: 'schema_name',
    strict: true,
    schema: { /* JSON Schema */ }
  }
}
```
- Model returns JSON matching exact schema
- **Use when structure is critical** (APIs, data processing)

**When to use structured outputs**:
- Need predictable response format
- Integrating with systems (APIs, databases)
- Data extraction
- Form filling

### Web Search Parameters

**Enable via model variant** (simplest):
```typescript
{ model: 'anthropic/claude-3.5-sonnet:online' }
```

**Enable via plugin**:
```typescript
{
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5
  }]
}
```

**When to use web search**:
- Need current information (news, prices, events)
- User asks about recent developments
- Need factual verification
- Topic requires real-time data

### Other Important Parameters

**user** (string, optional)
- Stable identifier for end-user
- **Set when you have user IDs**
- Helps with abuse detection and caching

**session_id** (string, optional)
- Group related requests
- **Set for conversation tracking**
- Improves caching and observability

**metadata** (Record<string, string>, optional)
- Custom metadata (max 16 key-value pairs)
- **Use for analytics and tracking**
- Keys: max 64 chars, Values: max 512 chars

**stop** (string | string[], optional)
- Stop sequences to halt generation
- Common: `['\n\n', '###', 'END']`

---

## Handling Responses

### Non-Streaming Responses

Extract content:
```typescript
const response = await fetch(/* ... */);
const data = await response.json();

const content = data.choices[0].message.content;
const finishReason = data.choices[0].finish_reason;
const usage = data.usage;
```

Check for tool calls:
```typescript
const toolCalls = data.choices[0].message.tool_calls;
if (toolCalls) {
  // Model wants to call tools
  for (const toolCall of toolCalls) {
    const { name, arguments: args } = toolCall.function;
    const parsedArgs = JSON.parse(args);
    // Execute tool...
  }
}
```

### Streaming Responses

Process SSE stream:
```typescript
let fullContent = '';
const response = await fetch(/* ... */);

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });

  // SSE lines can be split across chunks; keep the incomplete remainder buffered
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? '';

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;

    const data = line.slice(6);
    if (data === '[DONE]') continue;

    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      fullContent += content;
      // Process incrementally...
    }

    // The final chunk carries usage information
    if (parsed.usage) {
      console.log('Usage:', parsed.usage);
    }
  }
}
```

Handle streaming tool calls:
```typescript
// Tool call names and arguments stream across multiple chunks,
// so accumulate them until finish_reason === 'tool_calls'
let currentToolCall = null;
let toolArgs = '';

// `chunks` stands for the parsed SSE events from the loop above
for (const parsed of chunks) {
  const toolCallChunk = parsed.choices?.[0]?.delta?.tool_calls?.[0];

  if (toolCallChunk?.function?.name) {
    currentToolCall = { id: toolCallChunk.id, ...toolCallChunk.function };
  }

  if (toolCallChunk?.function?.arguments) {
    toolArgs += toolCallChunk.function.arguments;
  }

  if (parsed.choices?.[0]?.finish_reason === 'tool_calls' && currentToolCall) {
    // Complete tool call
    currentToolCall.arguments = toolArgs;
    // Execute tool...
  }
}
```

### Usage and Cost Tracking

```typescript
const { usage } = data;
console.log(`Prompt: ${usage.prompt_tokens}`);
console.log(`Completion: ${usage.completion_tokens}`);
console.log(`Total: ${usage.total_tokens}`);

// Cost (if available)
if (usage.cost) {
  console.log(`Cost: $${usage.cost.toFixed(6)}`);
}

// Detailed breakdown
console.log(usage.prompt_tokens_details);
console.log(usage.completion_tokens_details);
```

---

## Error Handling

### Common HTTP Status Codes

**400 Bad Request**
- Invalid request format
- Missing required fields
- Parameter out of range
- **Fix**: Validate request structure and parameters

**401 Unauthorized**
- Missing or invalid API key
- **Fix**: Check API key format and permissions

**402 Payment Required**
- Insufficient credits
- **Fix**: Add credits to account

**403 Forbidden**
- Insufficient permissions
- Model not allowed
- **Fix**: Check guardrails, model access, API key permissions

**408 Request Timeout**
- Request took too long
- **Fix**: Reduce prompt length, use streaming, try simpler model

**429 Rate Limited**
- Too many requests
- **Fix**: Implement exponential backoff, reduce request rate

**502 Bad Gateway**
- Provider error
- **Fix**: Use model fallbacks, retry with different model

**503 Service Unavailable**
- Service overloaded
- **Fix**: Retry with backoff, use fallbacks

### Retry Strategy

**Exponential backoff**:
```typescript
async function requestWithRetry(url: string, options: RequestInit, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    let response: Response;
    try {
      response = await fetch(url, options);
    } catch (error) {
      // Network error: retry with backoff
      if (attempt === maxRetries - 1) throw error;
      await backoff(attempt);
      continue;
    }

    if (response.ok) {
      return await response.json();
    }

    // Retry on rate limit or server errors
    if (response.status === 429 || response.status >= 500) {
      if (attempt === maxRetries - 1) break;
      await backoff(attempt);
      continue;
    }

    // Don't retry client errors (400, 401, 402, 403)
    throw new Error(`Request failed with status ${response.status}`);
  }
  throw new Error('Max retries exceeded');
}

function backoff(attempt: number) {
  const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
  return new Promise(resolve => setTimeout(resolve, delay));
}
```

**Retryable status codes**: 408, 429, 502, 503
**Do not retry**: 400, 401, 402, 403

### Graceful Degradation

**Use model fallbacks**:
```typescript
{
  models: [
    'anthropic/claude-3.5-sonnet',  // Primary
    'openai/gpt-4o',                // Fallback 1
    'google/gemini-2.0-flash'        // Fallback 2
  ]
}
```

**Handle partial failures**:
- Log errors but continue
- Fall back to simpler features
- Use cached responses when available
- Provide degraded experience rather than failing completely
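
A sketch of falling back to a cached answer when a request ultimately fails, reusing `requestWithRetry` from above (the in-memory cache is a hypothetical stand-in for Redis, disk, etc.):

```typescript
// Hypothetical in-memory cache; swap in whatever store you use
const cache = new Map<string, string>();
const url = 'https://openrouter.ai/api/v1/chat/completions';

async function askWithDegradation(prompt: string): Promise<string> {
  try {
    // requestWithRetry is defined in the Retry Strategy section above
    const data = await requestWithRetry(url, {
      method: 'POST',
      headers: { 'Authorization': `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'anthropic/claude-3.5-sonnet',
        messages: [{ role: 'user', content: prompt }],
      }),
    });
    const answer = data.choices[0].message.content;
    cache.set(prompt, answer);
    return answer;
  } catch (error) {
    console.error('Request failed, degrading:', error);
    // Serve a stale answer rather than failing completely
    const cached = cache.get(prompt);
    if (cached) return cached;
    throw error;
  }
}
```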

---

## Advanced Features

### When to Use Tool Calling

**Good use cases**:
- Querying external APIs (weather, stock prices, databases)
- Performing calculations or data processing
- Extracting structured data from unstructured text
- Building agentic systems with multiple steps
- When decisions require external information

**Implementation pattern**:
1. Define tools with clear descriptions and parameters
2. Send request with `tools` array
3. Check if `tool_calls` present in response
4. Execute tools with parsed arguments
5. Send tool results back in a new request
6. Repeat until model provides final answer
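
A compressed sketch of this loop, assuming a `tools` array as defined earlier and a hypothetical `executeTool` dispatcher over your tool implementations:

```typescript
const messages: any[] = [{ role: 'user', content: 'What is the weather in Paris?' }];

while (true) {
  const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'anthropic/claude-3.5-sonnet', messages, tools }),
  });
  const data = await response.json();
  const message = data.choices[0].message;
  messages.push(message); // Keep the assistant turn in the history

  if (!message.tool_calls) break; // No tool calls: final answer reached

  for (const toolCall of message.tool_calls) {
    // executeTool is a hypothetical dispatcher over your tool implementations
    const result = await executeTool(
      toolCall.function.name,
      JSON.parse(toolCall.function.arguments)
    );
    // Tool results go back as role 'tool', linked by tool_call_id
    messages.push({
      role: 'tool',
      tool_call_id: toolCall.id,
      content: JSON.stringify(result),
    });
  }
}
```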

**See**: `references/ADVANCED_PATTERNS.md` for complete agentic loop implementation

### When to Use Structured Outputs

**Good use cases**:
- API responses (need specific schema)
- Data extraction (forms, documents)
- Configuration files (JSON, YAML)
- Database operations (structured queries)
- When downstream processing requires specific format

**Implementation pattern**:
1. Define JSON Schema for desired output
2. Set `response_format: { type: 'json_schema', json_schema: { ... } }`
3. Instruct model to produce JSON (system or user message)
4. Validate response against schema
5. Handle parsing errors gracefully
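
A sketch of this pattern, extracting a contact record with an illustrative schema:

```typescript
const body = {
  model: 'anthropic/claude-3.5-sonnet',
  messages: [
    { role: 'system', content: 'Extract the contact details as JSON.' },
    { role: 'user', content: 'You can reach Jane Doe at jane@example.com.' }
  ],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'contact',  // illustrative schema
      strict: true,
      schema: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          email: { type: 'string' }
        },
        required: ['name', 'email'],
        additionalProperties: false
      }
    }
  }
};

// Steps 4-5: after sending the request and reading `data = await response.json()`,
// parsing can still fail, so handle it gracefully
let contact;
try {
  contact = JSON.parse(data.choices[0].message.content);
} catch {
  // Retry, fall back, or enable response healing (see below)
}
```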

**Add response healing** for robustness:
```typescript
{
  response_format: { /* ... */ },
  plugins: [{ id: 'response-healing' }]
}
```

### When to Use Web Search

**Good use cases**:
- User asks about recent events, news, or current data
- Need verification of facts
- Questions with time-sensitive information
- Topic requires up-to-date information
- User explicitly requests current information

**Simple implementation** (variant):
```typescript
{
  model: 'anthropic/claude-3.5-sonnet:online'
}
```

**Advanced implementation** (plugin):
```typescript
{
  model: 'openrouter/auto',
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5,
    engine: 'exa' // or 'native'
  }]
}
```

### When to Use Multimodal Inputs

**Images** (vision):
- OCR, image understanding, visual analysis
- Models: `openai/gpt-4o`, `anthropic/claude-3.5-sonnet`, `google/gemini-2.5-pro`

**Audio**:
- Speech-to-text, audio analysis
- Models with audio support

**Video**:
- Video understanding, frame analysis
- Models with video support

**PDFs**:
- Document parsing, content extraction
- Requires `file-parser` plugin

**Implementation**: See `references/ADVANCED_PATTERNS.md` for multimodal patterns
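
A sketch of an image input using the OpenAI-compatible content-parts format mentioned under `messages` above (the URL is a placeholder; a base64 data URL is commonly accepted as well):

```typescript
const body = {
  model: 'openai/gpt-4o',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Describe this image.' },
      // Placeholder URL; base64 data URLs also work here
      { type: 'image_url', image_url: { url: 'https://example.com/photo.jpg' } }
    ]
  }]
};
```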

---

## Best Practices for AI

### Default Model Selection

**Start with**: `anthropic/claude-3.5-sonnet` or `openai/gpt-4o`
- Good balance of quality, speed, cost
- Strong at most tasks
- Wide compatibility

**Switch based on needs**:
- Need speed → `openai/gpt-4o-mini:nitro` or `google/gemini-2.0-flash`
- Complex reasoning → `anthropic/claude-opus-4:thinking`
- Need web search → `:online` variant
- Large context → `:extended` variant
- Cost-sensitive → `:free` variant

### Default Parameters

```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [...],
  temperature: 0.6,  // Balanced creativity
  max_tokens: 1000,   // Reasonable length
  top_p: 0.95        // Common for quality
}
```

**Adjust based on task**:
- Code: `temperature: 0.2`
- Creative: `temperature: 1.0`
- Factual: `temperature: 0.0-0.3`

### When to Prefer Streaming

**Always prefer streaming when**:
- User-facing (chat, interactive tools)
- Response length unknown
- Want progressive feedback
- Latency matters

**Use non-streaming when**:
- Batch processing
- Need complete response before acting
- Building API endpoints
- Very short responses (< 50 tokens)

### When to Enable Specific Features

**Tools**: Enable when you need external data or actions
**Structured outputs**: Enable when response format matters
**Web search**: Enable when current information needed
**Streaming**: Enable for user-facing, real-time responses
**Model fallbacks**: Enable when reliability critical
**Provider routing**: Enable when you have preferences or constraints

### Cost Optimization Patterns

**Use free models for**:
- Testing and prototyping
- Low-complexity tasks
- High-volume, low-value operations

**Use routing to optimize**:
```typescript
{
  provider: {
    order: ['openai', 'anthropic'],
    sort: 'price',  // Optimize for cost
    allow_fallbacks: true
  }
}
```

**Set max_tokens** to prevent runaway responses
**Use caching** via `user` and `session_id` parameters
**Enable prompt caching** when supported
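
Putting these together, a cost-conscious request sketch (the values are illustrative defaults):

```typescript
const body = {
  model: 'google/gemini-2.0-flash',  // cheap, fast default
  messages: [{ role: 'user', content: '...' }],
  max_tokens: 500,                   // cap spend per response
  user: 'user-123',                  // stable end-user ID aids caching
  provider: { sort: 'price', allow_fallbacks: true }
};
```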

### Performance Optimization

**Reduce latency**:
- Use `:nitro` variants for speed
- Use streaming for perceived speed
- Set `user` ID for caching benefits
- Choose faster models (mini, flash) when quality allows

**Increase throughput**:
- Use provider routing with `sort: 'throughput'`
- Parallelize independent requests
- Use streaming to reduce wait time

**Optimize for specific metrics**:
```typescript
{
  provider: {
    sort: 'latency'  // or 'price' or 'throughput'
  }
}
```

---

## Progressive Disclosure

For detailed reference information, consult:

### Parameters Reference
**File**: `references/PARAMETERS.md`
- Complete parameter reference (50+ parameters)
- Types, ranges, defaults
- Parameter support by model
- Usage examples

### Error Codes Reference
**File**: `references/ERROR_CODES.md`
- All HTTP status codes
- Error response structure
- Error metadata types
- Native finish reasons
- Retry strategies

### Model Selection Guide
**File**: `references/MODEL_SELECTION.md`
- Model families and capabilities
- Model variants explained
- Selection criteria by use case
- Model capability matrix
- Provider routing preferences

### Routing Strategies
**File**: `references/ROUTING_STRATEGIES.md`
- Model fallbacks configuration
- Provider selection patterns
- Auto router setup
- Routing by use case (cost, latency, quality)

### Advanced Patterns
**File**: `references/ADVANCED_PATTERNS.md`
- Tool calling with agentic loops
- Structured outputs implementation
- Web search integration
- Multimodal handling
- Streaming patterns
- Framework integrations

### Working Examples
**File**: `references/EXAMPLES.md`
- TypeScript patterns for common tasks
- Python examples
- cURL examples
- Advanced patterns
- Framework integration examples

### Ready-to-Use Templates
**Directory**: `templates/`
- `basic-request.ts` - Minimal working request
- `streaming-request.ts` - SSE streaming with cancellation
- `tool-calling.ts` - Complete agentic loop with tools
- `structured-output.ts` - JSON Schema enforcement
- `error-handling.ts` - Robust retry logic

---

## Quick Reference

### Minimal Request
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Your prompt' }]
}
```

### With Streaming
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  stream: true
}
```

### With Tools
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  tools: [{ type: 'function', function: { name, description, parameters } }],
  tool_choice: 'auto'
}
```

### With Structured Output
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'system', content: 'Output JSON only...' }],
  response_format: { type: 'json_object' }
}
```

### With Web Search
```typescript
{
  model: 'anthropic/claude-3.5-sonnet:online',
  messages: [{ role: 'user', content: '...' }]
}
```

### With Model Fallbacks
```typescript
{
  models: ['anthropic/claude-3.5-sonnet', 'openai/gpt-4o'],
  messages: [{ role: 'user', content: '...' }]
}
```

---

**Remember**: OpenRouter is OpenAI-compatible. Use the OpenAI SDK with `baseURL: 'https://openrouter.ai/api/v1'` for a familiar experience.
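
A minimal sketch with the official `openai` SDK (assuming it is installed; the attribution headers are optional):

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
  defaultHeaders: {
    'HTTP-Referer': 'https://your-app.com', // optional attribution
    'X-Title': 'Your App Name',
  },
});

const completion = await client.chat.completions.create({
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Your prompt here' }],
});

console.log(completion.choices[0].message.content);
```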

## Overview

This skill is an expert OpenRouter API assistant for AI agents, providing practical guidance for integrating with OpenRouter's unified access to 400+ models. It focuses on chat completions, streaming, tool calling, structured outputs, multimodal inputs, model selection, routing, and robust error handling. Use it to build reliable, cost-effective agent integrations that require provider flexibility and advanced features.

## How this skill works

The skill describes the core request and response patterns for OpenRouter's chat/completions endpoint, including required headers and minimal request structure. It explains streaming (SSE) handling, non-streaming parsing, tool-calling mechanics, structured output modes (JSON and JSON Schema), model selection and variant usage, provider routing, and error/retry strategies. Practical code snippets and parameter recommendations guide implementation choices and parameter tuning.

## When to use it

- Making chat completions or agent-driven conversations across many providers
- Implementing streaming real-time responses for chat UIs or terminals
- Enabling tool/function calling and executing model-driven workflows
- Enforcing structured outputs for APIs, databases, or automated pipelines
- Routing across providers and implementing model fallbacks for reliability
- Handling multimodal inputs (images, audio, video, PDFs) and embeddings

## Best practices

- Always specify `model` and set `max_tokens` to control cost and avoid runaway output
- Use streaming for low-latency user-facing UIs and non-streaming for batch or synchronous processing
- Set `temperature`/`top_p`/`top_k` according to task (low for code and factual work, higher for creative tasks)
- Provide an explicit provider order and fallbacks when reliability or cost guarantees matter
- Wrap requests with exponential backoff and graceful handling for 429/5xx errors
- Use `response_format` with a JSON Schema for strict, machine-parseable outputs when integrating downstream systems

## Example use cases

- A chat application that streams assistant tokens to the client with progressive UI updates
- An agent that calls external tools (search, DB, APIs) via model-invoked tool calls and executes returned arguments
- Batch document analysis using large-context variants to process and summarize multi-file uploads
- Cost-optimized routing that prefers free or low-cost variants with automatic fallbacks to paid providers on failure
- A data extraction pipeline that requires strict JSON Schema output for downstream ingestion

## FAQ

**When should I use streaming versus non-streaming?**

Use streaming for interactive, low-latency experiences and long outputs; use non-streaming for background jobs, short responses, or when you need the complete output before processing.

**How do I force a specific tool or prevent tool calls?**

Set `tool_choice` to a specific function object to force a tool, `'none'` to disable tools, `'required'` to mandate a tool call, or keep the default `'auto'` to let the model decide.