openrouter skill

This skill helps AI agents interact with OpenRouter's unified API to choose models, stream responses, call tools, and handle errors efficiently.

```bash
npx playbooks add skill dimitrigilbert/ai-skills --skill openrouter
```

---
name: openrouter
description: Expert OpenRouter API assistant for AI agents. Use when making API calls to OpenRouter's unified API for 400+ AI models. Covers chat completions, streaming, tool calling, structured outputs, web search, embeddings, multimodal inputs, model selection, routing, and error handling.
---

# OpenRouter API for AI Agents

Expert guidance for AI agents integrating with the OpenRouter API, which provides unified access to 400+ models from 90+ providers.

**When to use this skill:**
- Making chat completions via OpenRouter API
- Selecting appropriate models and variants
- Implementing streaming responses
- Using tool/function calling
- Enforcing structured outputs
- Integrating web search
- Handling multimodal inputs (images, audio, video, PDFs)
- Managing model routing and fallbacks
- Handling errors and retries
- Optimizing cost and performance

---

## API Basics

### Making a Request

**Endpoint**: `POST https://openrouter.ai/api/v1/chat/completions`

**Headers** (required):
```typescript
{
  'Authorization': `Bearer ${apiKey}`,
  'Content-Type': 'application/json',
  // Optional: for app attribution
  'HTTP-Referer': 'https://your-app.com',
  'X-Title': 'Your App Name'
}
```

**Minimal request structure**:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      { role: 'user', content: 'Your prompt here' }
    ]
  })
});
```

### Response Structure

**Non-streaming response**:
```json
{
  "id": "gen-abc123",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Response text here"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  },
  "model": "anthropic/claude-3.5-sonnet"
}
```

**Key fields**:
- `choices[0].message.content` - The assistant's response
- `choices[0].finish_reason` - Why generation stopped (stop, length, tool_calls, etc.)
- `usage` - Token counts and cost information
- `model` - Actual model used (may differ from requested)
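
A small sketch of reading these fields defensively, e.g. to detect truncation (field names as in the response above):

```typescript
const data = await response.json();
const choice = data.choices[0];

if (choice.finish_reason === 'length') {
  // Hit max_tokens before finishing; consider retrying with a higher limit
  console.warn(`Truncated response from ${data.model}`);
}

const content = choice.message.content;
```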

### When to Use Streaming vs Non-Streaming

**Use streaming (`stream: true`)** when:
- Real-time responses needed (chat interfaces, interactive tools)
- Latency matters (user-facing applications)
- Large responses expected (long-form content)
- Want to show progressive output

**Use non-streaming** when:
- Processing in background (batch jobs, async tasks)
- Need complete response before processing
- Building an API endpoint that returns complete responses
- Response is short (few tokens)

**Streaming basics**:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: { /* ... */ },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: '...' }],
    stream: true
  })
});

const decoder = new TextDecoder();
let buffer = '';

// Async iteration over response.body works in Node 18+;
// in browsers, use response.body.getReader() (see Handling Responses below)
for await (const chunk of response.body) {
  buffer += decoder.decode(chunk, { stream: true });

  // SSE lines can be split across chunks; keep the incomplete remainder buffered
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? '';

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;

    const data = line.slice(6); // Remove 'data: '
    if (data === '[DONE]') continue; // End-of-stream marker

    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      // Accumulate or display content
    }
  }
}
```

---

## Model Selection

### Model Identifier Format

**Format**: `provider/model-name[:variant]`

Examples:
- `anthropic/claude-3.5-sonnet` - Specific model
- `openai/gpt-4o:online` - With web search enabled
- `google/gemini-2.0-flash:free` - Free tier variant

### Model Variants and When to Use Them

| Variant | Use When | Tradeoffs |
|---------|----------|-----------|
| `:free` | Cost is primary concern, testing, prototyping | Rate limits, lower quality models |
| `:online` | Need current information, real-time data | Higher cost, web search latency |
| `:extended` | Large context window needed | May be slower, higher cost |
| `:thinking` | Complex reasoning, multi-step problems | Higher token usage, slower |
| `:nitro` | Speed is critical | May have quality tradeoffs |
| `:exacto` | Tool-calling accuracy is critical | Smaller provider pool, may be less available |

### Default Model Choices by Task

**General purpose**: `anthropic/claude-3.5-sonnet` or `openai/gpt-4o`
- Balanced quality, speed, cost
- Good for most tasks

**Coding**: `anthropic/claude-3.5-sonnet` or `openai/gpt-4o`
- Strong code generation and understanding
- Good reasoning

**Complex reasoning**: `anthropic/claude-opus-4:thinking` or `openai/o3`
- Deep reasoning capabilities
- Higher cost, slower

**Fast responses**: `openai/gpt-4o-mini:nitro` or `google/gemini-2.0-flash`
- Minimal latency
- Good for real-time applications

**Cost-sensitive**: `google/gemini-2.0-flash:free` or `meta-llama/llama-3.1-70b:free`
- No cost with limits
- Good for high-volume, lower-complexity tasks

**Current information**: `anthropic/claude-3.5-sonnet:online` or `google/gemini-2.5-pro:online`
- Web search built-in
- Real-time data

**Large context**: `anthropic/claude-3.5-sonnet:extended` or `google/gemini-2.5-pro:extended`
- 200K+ context windows
- Document analysis, codebase understanding

### Provider Routing Preferences

**Default behavior**: OpenRouter automatically selects a provider, load-balancing across them by price and uptime

**Explicit provider order**:
```typescript
{
  provider: {
    order: ['anthropic', 'openai', 'google'],
    allow_fallbacks: true,
    sort: 'price' // 'price', 'latency', or 'throughput'
  }
}
```

**When to set provider order**:
- Have preferred provider arrangements
- Need to optimize for specific metric (cost, speed)
- Want to exclude certain providers
- Have BYOK (Bring Your Own Key) for specific providers

### Model Fallbacks

**Automatic fallback** - try multiple models in order:
```typescript
{
  models: [
    'anthropic/claude-3.5-sonnet',
    'openai/gpt-4o',
    'google/gemini-2.0-flash'
  ]
}
```

**When to use fallbacks**:
- High reliability required
- Multiple providers acceptable
- Want graceful degradation
- Avoid single point of failure

**Fallback behavior** (see the sketch below):
- Tries the first model
- Falls back to the next model on error (5xx, 429, timeout)
- Uses whichever model succeeds first
- Reports the model actually used in the response's `model` field
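
A minimal sketch of a fallback request, reading back which model actually served it (endpoint and fields as documented above):

```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    models: [
      'anthropic/claude-3.5-sonnet',
      'openai/gpt-4o',
      'google/gemini-2.0-flash'
    ],
    messages: [{ role: 'user', content: 'Your prompt here' }]
  })
});

const data = await response.json();
// `model` reports whichever model actually handled the request
console.log(`Served by: ${data.model}`);
```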

---

## Parameters You Need

### Core Parameters

**model** (string, optional)
- Which model to use
- Default: user's default model
- **Always specify for consistency**

**messages** (Message[], required)
- Conversation history
- Structure: `{ role: 'user'|'assistant'|'system', content: string | ContentPart[] }`
- For multimodal: content can be array of text and image_url parts

**stream** (boolean, default: false)
- Enable Server-Sent Events streaming
- Use for real-time responses

**temperature** (float, 0.0-2.0, default: 1.0)
- Controls randomness
- **0.0-0.3**: Deterministic, factual responses (code, precise answers)
- **0.4-0.7**: Balanced (general use)
- **0.8-1.2**: Creative (brainstorming, creative writing)
- **1.3-2.0**: Highly creative, unpredictable (experimental)

**max_tokens** (integer, optional)
- Maximum tokens to generate
- **Always set** to control cost and prevent runaway responses
- Typical: 100-500 for short, 1000-2000 for long responses
- Model limit: context_length - prompt_length

**top_p** (float, 0.0-1.0, default: 1.0)
- Nucleus sampling - limits to top probability mass
- **Use instead of temperature** when you want predictable diversity
- **0.9-0.95**: Common settings for quality

**top_k** (integer, 0+, default: 0/disabled)
- Limit to K most likely tokens
- **1**: Always most likely (deterministic)
- **40-50**: Balanced
- Not available for OpenAI models

### Sampling Strategy Guidelines

**For code generation**: `temperature: 0.1-0.3, top_p: 0.95`
**For factual responses**: `temperature: 0.0-0.2`
**For creative writing**: `temperature: 0.8-1.2`
**For brainstorming**: `temperature: 1.0-1.5`
**For chat**: `temperature: 0.6-0.8`
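
One way to encode these guidelines in code, as a sketch (the task names and chosen values are illustrative midpoints of the ranges above):

```typescript
type Task = 'code' | 'factual' | 'creative' | 'brainstorm' | 'chat';

// Illustrative midpoints of the recommended ranges above
function samplingFor(task: Task): { temperature: number; top_p?: number } {
  switch (task) {
    case 'code':       return { temperature: 0.2, top_p: 0.95 };
    case 'factual':    return { temperature: 0.1 };
    case 'creative':   return { temperature: 1.0 };
    case 'brainstorm': return { temperature: 1.2 };
    case 'chat':       return { temperature: 0.7 };
  }
}
```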

### Tool Calling Parameters

**tools** (Tool[], default: [])
- Available functions for model to call
- Structure:
```typescript
{
  type: 'function',
  function: {
    name: 'function_name',
    description: 'What it does',
    parameters: { /* JSON Schema */ }
  }
}
```

**tool_choice** (string | object, default: 'auto')
- Control when tools are called
- `'auto'`: Model decides (default)
- `'none'`: Never call tools
- `'required'`: Must call a tool
- `{ type: 'function', function: { name: 'specific_tool' } }`: Force specific tool
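
For example, forcing a call to a hypothetical `get_weather` tool:

```typescript
{
  tools: [/* get_weather definition */],
  tool_choice: { type: 'function', function: { name: 'get_weather' } }
}
```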

**parallel_tool_calls** (boolean, default: true)
- Allow multiple tools simultaneously
- Set `false` for sequential execution

**When to use tools**:
- Need to query external APIs (weather, search, database)
- Need to perform calculations or data processing
- Building agentic systems
- Need structured data extraction
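
A sketch of a full tool-calling request, using a hypothetical `get_weather` function (the tool name, parameters, and prompt are illustrative):

```typescript
const body = {
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'What is the weather in Paris?' }],
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather',  // hypothetical tool
      description: 'Get current weather for a city',
      parameters: {
        type: 'object',
        properties: {
          city: { type: 'string', description: 'City name' }
        },
        required: ['city']
      }
    }
  }],
  tool_choice: 'auto'
};
```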

### Structured Output Parameters

**response_format** (object, optional)
- Enforce specific output format

**JSON object mode**:
```typescript
{ type: 'json_object' }
```
- Model returns valid JSON
- Must also instruct model in system message

**JSON Schema mode** (strict):
```typescript
{
  type: 'json_schema',
  json_schema: {
    name: 'schema_name',
    strict: true,
    schema: { /* JSON Schema */ }
  }
}
```
- Model returns JSON matching exact schema
- **Use when structure is critical** (APIs, data processing)

**When to use structured outputs**:
- Need predictable response format
- Integrating with systems (APIs, databases)
- Data extraction
- Form filling

### Web Search Parameters

**Enable via model variant** (simplest):
```typescript
{ model: 'anthropic/claude-3.5-sonnet:online' }
```

**Enable via plugin**:
```typescript
{
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5
  }]
}
```

**When to use web search**:
- Need current information (news, prices, events)
- User asks about recent developments
- Need factual verification
- Topic requires real-time data

### Other Important Parameters

**user** (string, optional)
- Stable identifier for end-user
- **Set when you have user IDs**
- Helps with abuse detection and caching

**session_id** (string, optional)
- Group related requests
- **Set for conversation tracking**
- Improves caching and observability

**metadata** (Record<string, string>, optional)
- Custom metadata (max 16 key-value pairs)
- **Use for analytics and tracking**
- Keys: max 64 chars, Values: max 512 chars

**stop** (string | string[], optional)
- Stop sequences to halt generation
- Common: `['\n\n', '###', 'END']`

---

## Handling Responses

### Non-Streaming Responses

Extract content:
```typescript
const response = await fetch(/* ... */);
const data = await response.json();

const content = data.choices[0].message.content;
const finishReason = data.choices[0].finish_reason;
const usage = data.usage;
```

Check for tool calls:
```typescript
const toolCalls = data.choices[0].message.tool_calls;
if (toolCalls) {
  // Model wants to call tools
  for (const toolCall of toolCalls) {
    const { name, arguments: args } = toolCall.function;
    const parsedArgs = JSON.parse(args);
    // Execute tool...
  }
}
```

### Streaming Responses

Process SSE stream:
```typescript
let fullContent = '';
const response = await fetch(/* ... */);

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });

  // SSE lines can be split across chunks; keep the incomplete remainder buffered
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? '';

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;

    const data = line.slice(6);
    if (data === '[DONE]') continue;

    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      fullContent += content;
      // Process incrementally...
    }

    // The final chunk carries usage information
    if (parsed.usage) {
      console.log('Usage:', parsed.usage);
    }
  }
}
```

Handle streaming tool calls:
```typescript
// Tool call names and arguments stream across multiple chunks,
// so accumulate them until finish_reason === 'tool_calls'
let currentToolCall = null;
let toolArgs = '';

// `chunks` stands for the parsed SSE events from the loop above
for (const parsed of chunks) {
  const toolCallChunk = parsed.choices?.[0]?.delta?.tool_calls?.[0];

  if (toolCallChunk?.function?.name) {
    currentToolCall = { id: toolCallChunk.id, ...toolCallChunk.function };
  }

  if (toolCallChunk?.function?.arguments) {
    toolArgs += toolCallChunk.function.arguments;
  }

  if (parsed.choices?.[0]?.finish_reason === 'tool_calls' && currentToolCall) {
    // Complete tool call
    currentToolCall.arguments = toolArgs;
    // Execute tool...
  }
}
```

### Usage and Cost Tracking

```typescript
const { usage } = data;
console.log(`Prompt: ${usage.prompt_tokens}`);
console.log(`Completion: ${usage.completion_tokens}`);
console.log(`Total: ${usage.total_tokens}`);

// Cost (if available)
if (usage.cost) {
  console.log(`Cost: $${usage.cost.toFixed(6)}`);
}

// Detailed breakdown
console.log(usage.prompt_tokens_details);
console.log(usage.completion_tokens_details);
```

---

## Error Handling

### Common HTTP Status Codes

**400 Bad Request**
- Invalid request format
- Missing required fields
- Parameter out of range
- **Fix**: Validate request structure and parameters

**401 Unauthorized**
- Missing or invalid API key
- **Fix**: Check API key format and permissions

**402 Payment Required**
- Insufficient credits
- **Fix**: Add credits to account

**403 Forbidden**
- Insufficient permissions
- Model not allowed
- **Fix**: Check guardrails, model access, API key permissions

**408 Request Timeout**
- Request took too long
- **Fix**: Reduce prompt length, use streaming, try simpler model

**429 Rate Limited**
- Too many requests
- **Fix**: Implement exponential backoff, reduce request rate

**502 Bad Gateway**
- Provider error
- **Fix**: Use model fallbacks, retry with different model

**503 Service Unavailable**
- Service overloaded
- **Fix**: Retry with backoff, use fallbacks

### Retry Strategy

**Exponential backoff**:
```typescript
async function requestWithRetry(url: string, options: RequestInit, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    let response: Response;
    try {
      response = await fetch(url, options);
    } catch (error) {
      // Network error: retry with backoff
      if (attempt === maxRetries - 1) throw error;
      await backoff(attempt);
      continue;
    }

    if (response.ok) {
      return await response.json();
    }

    // Retry on rate limit or server errors
    if (response.status === 429 || response.status >= 500) {
      if (attempt === maxRetries - 1) break;
      await backoff(attempt);
      continue;
    }

    // Don't retry client errors (400, 401, 402, 403)
    throw new Error(`Request failed with status ${response.status}`);
  }
  throw new Error('Max retries exceeded');
}

function backoff(attempt: number) {
  const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
  return new Promise(resolve => setTimeout(resolve, delay));
}
```

**Retryable status codes**: 408, 429, 502, 503
**Do not retry**: 400, 401, 402, 403

### Graceful Degradation

**Use model fallbacks**:
```typescript
{
  models: [
    'anthropic/claude-3.5-sonnet',  // Primary
    'openai/gpt-4o',                // Fallback 1
    'google/gemini-2.0-flash'        // Fallback 2
  ]
}
```

**Handle partial failures**:
- Log errors but continue
- Fall back to simpler features
- Use cached responses when available
- Provide degraded experience rather than failing completely
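
A sketch of falling back to a cached answer when a request ultimately fails, reusing `requestWithRetry` from above (the in-memory cache is a hypothetical stand-in for Redis, disk, etc.):

```typescript
// Hypothetical in-memory cache; swap in whatever store you use
const cache = new Map<string, string>();
const url = 'https://openrouter.ai/api/v1/chat/completions';

async function askWithDegradation(prompt: string): Promise<string> {
  try {
    // requestWithRetry is defined in the Retry Strategy section above
    const data = await requestWithRetry(url, {
      method: 'POST',
      headers: { 'Authorization': `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'anthropic/claude-3.5-sonnet',
        messages: [{ role: 'user', content: prompt }],
      }),
    });
    const answer = data.choices[0].message.content;
    cache.set(prompt, answer);
    return answer;
  } catch (error) {
    console.error('Request failed, degrading:', error);
    // Serve a stale answer rather than failing completely
    const cached = cache.get(prompt);
    if (cached) return cached;
    throw error;
  }
}
```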

---

## Advanced Features

### When to Use Tool Calling

**Good use cases**:
- Querying external APIs (weather, stock prices, databases)
- Performing calculations or data processing
- Extracting structured data from unstructured text
- Building agentic systems with multiple steps
- When decisions require external information

**Implementation pattern**:
1. Define tools with clear descriptions and parameters
2. Send request with `tools` array
3. Check if `tool_calls` present in response
4. Execute tools with parsed arguments
5. Send tool results back in a new request
6. Repeat until model provides final answer
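
A compressed sketch of this loop, assuming a `tools` array as defined earlier and a hypothetical `executeTool` dispatcher over your tool implementations:

```typescript
const messages: any[] = [{ role: 'user', content: 'What is the weather in Paris?' }];

while (true) {
  const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'anthropic/claude-3.5-sonnet', messages, tools }),
  });
  const data = await response.json();
  const message = data.choices[0].message;
  messages.push(message); // Keep the assistant turn in the history

  if (!message.tool_calls) break; // No tool calls: final answer reached

  for (const toolCall of message.tool_calls) {
    // executeTool is a hypothetical dispatcher over your tool implementations
    const result = await executeTool(
      toolCall.function.name,
      JSON.parse(toolCall.function.arguments)
    );
    // Tool results go back as role 'tool', linked by tool_call_id
    messages.push({
      role: 'tool',
      tool_call_id: toolCall.id,
      content: JSON.stringify(result),
    });
  }
}
```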

**See**: `references/ADVANCED_PATTERNS.md` for complete agentic loop implementation

### When to Use Structured Outputs

**Good use cases**:
- API responses (need specific schema)
- Data extraction (forms, documents)
- Configuration files (JSON, YAML)
- Database operations (structured queries)
- When downstream processing requires specific format

**Implementation pattern**:
1. Define JSON Schema for desired output
2. Set `response_format: { type: 'json_schema', json_schema: { ... } }`
3. Instruct model to produce JSON (system or user message)
4. Validate response against schema
5. Handle parsing errors gracefully
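
A sketch of this pattern, extracting a contact record with an illustrative schema:

```typescript
const body = {
  model: 'anthropic/claude-3.5-sonnet',
  messages: [
    { role: 'system', content: 'Extract the contact details as JSON.' },
    { role: 'user', content: 'You can reach Jane Doe at jane@example.com.' }
  ],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'contact',  // illustrative schema
      strict: true,
      schema: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          email: { type: 'string' }
        },
        required: ['name', 'email'],
        additionalProperties: false
      }
    }
  }
};

// Steps 4-5: after sending the request and reading `data = await response.json()`,
// parsing can still fail, so handle it gracefully
let contact;
try {
  contact = JSON.parse(data.choices[0].message.content);
} catch {
  // Retry, fall back, or enable response healing (see below)
}
```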

**Add response healing** for robustness:
```typescript
{
  response_format: { /* ... */ },
  plugins: [{ id: 'response-healing' }]
}
```

### When to Use Web Search

**Good use cases**:
- User asks about recent events, news, or current data
- Need verification of facts
- Questions with time-sensitive information
- Topic requires up-to-date information
- User explicitly requests current information

**Simple implementation** (variant):
```typescript
{
  model: 'anthropic/claude-3.5-sonnet:online'
}
```

**Advanced implementation** (plugin):
```typescript
{
  model: 'openrouter/auto',
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5,
    engine: 'exa' // or 'native'
  }]
}
```

### When to Use Multimodal Inputs

**Images** (vision):
- OCR, image understanding, visual analysis
- Models: `openai/gpt-4o`, `anthropic/claude-3.5-sonnet`, `google/gemini-2.5-pro`

**Audio**:
- Speech-to-text, audio analysis
- Models with audio support

**Video**:
- Video understanding, frame analysis
- Models with video support

**PDFs**:
- Document parsing, content extraction
- Requires `file-parser` plugin

**Implementation**: See `references/ADVANCED_PATTERNS.md` for multimodal patterns
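
A sketch of an image input using the OpenAI-compatible content-parts format mentioned under `messages` above (the URL is a placeholder; a base64 data URL is commonly accepted as well):

```typescript
const body = {
  model: 'openai/gpt-4o',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Describe this image.' },
      // Placeholder URL; base64 data URLs also work here
      { type: 'image_url', image_url: { url: 'https://example.com/photo.jpg' } }
    ]
  }]
};
```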

---

## Best Practices for AI

### Default Model Selection

**Start with**: `anthropic/claude-3.5-sonnet` or `openai/gpt-4o`
- Good balance of quality, speed, cost
- Strong at most tasks
- Wide compatibility

**Switch based on needs**:
- Need speed → `openai/gpt-4o-mini:nitro` or `google/gemini-2.0-flash`
- Complex reasoning → `anthropic/claude-opus-4:thinking`
- Need web search → `:online` variant
- Large context → `:extended` variant
- Cost-sensitive → `:free` variant

### Default Parameters

```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [...],
  temperature: 0.6,  // Balanced creativity
  max_tokens: 1000,   // Reasonable length
  top_p: 0.95        // Common for quality
}
```

**Adjust based on task**:
- Code: `temperature: 0.2`
- Creative: `temperature: 1.0`
- Factual: `temperature: 0.0-0.3`

### When to Prefer Streaming

**Always prefer streaming when**:
- User-facing (chat, interactive tools)
- Response length unknown
- Want progressive feedback
- Latency matters

**Use non-streaming when**:
- Batch processing
- Need complete response before acting
- Building API endpoints
- Very short responses (< 50 tokens)

### When to Enable Specific Features

**Tools**: Enable when you need external data or actions
**Structured outputs**: Enable when response format matters
**Web search**: Enable when current information needed
**Streaming**: Enable for user-facing, real-time responses
**Model fallbacks**: Enable when reliability critical
**Provider routing**: Enable when you have preferences or constraints

### Cost Optimization Patterns

**Use free models for**:
- Testing and prototyping
- Low-complexity tasks
- High-volume, low-value operations

**Use routing to optimize**:
```typescript
{
  provider: {
    order: ['openai', 'anthropic'],
    sort: 'price',  // Optimize for cost
    allow_fallbacks: true
  }
}
```

**Set max_tokens** to prevent runaway responses
**Use caching** via `user` and `session_id` parameters
**Enable prompt caching** when supported
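
Putting these together, a cost-conscious request sketch (the values are illustrative defaults):

```typescript
const body = {
  model: 'google/gemini-2.0-flash',  // cheap, fast default
  messages: [{ role: 'user', content: '...' }],
  max_tokens: 500,                   // cap spend per response
  user: 'user-123',                  // stable end-user ID aids caching
  provider: { sort: 'price', allow_fallbacks: true }
};
```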

### Performance Optimization

**Reduce latency**:
- Use `:nitro` variants for speed
- Use streaming for perceived speed
- Set `user` ID for caching benefits
- Choose faster models (mini, flash) when quality allows

**Increase throughput**:
- Use provider routing with `sort: 'throughput'`
- Parallelize independent requests
- Use streaming to reduce wait time

**Optimize for specific metrics**:
```typescript
{
  provider: {
    sort: 'latency'  // or 'price' or 'throughput'
  }
}
```

---

## Progressive Disclosure

For detailed reference information, consult:

### Parameters Reference
**File**: `references/PARAMETERS.md`
- Complete parameter reference (50+ parameters)
- Types, ranges, defaults
- Parameter support by model
- Usage examples

### Error Codes Reference
**File**: `references/ERROR_CODES.md`
- All HTTP status codes
- Error response structure
- Error metadata types
- Native finish reasons
- Retry strategies

### Model Selection Guide
**File**: `references/MODEL_SELECTION.md`
- Model families and capabilities
- Model variants explained
- Selection criteria by use case
- Model capability matrix
- Provider routing preferences

### Routing Strategies
**File**: `references/ROUTING_STRATEGIES.md`
- Model fallbacks configuration
- Provider selection patterns
- Auto router setup
- Routing by use case (cost, latency, quality)

### Advanced Patterns
**File**: `references/ADVANCED_PATTERNS.md`
- Tool calling with agentic loops
- Structured outputs implementation
- Web search integration
- Multimodal handling
- Streaming patterns
- Framework integrations

### Working Examples
**File**: `references/EXAMPLES.md`
- TypeScript patterns for common tasks
- Python examples
- cURL examples
- Advanced patterns
- Framework integration examples

### Ready-to-Use Templates
**Directory**: `templates/`
- `basic-request.ts` - Minimal working request
- `streaming-request.ts` - SSE streaming with cancellation
- `tool-calling.ts` - Complete agentic loop with tools
- `structured-output.ts` - JSON Schema enforcement
- `error-handling.ts` - Robust retry logic

---

## Quick Reference

### Minimal Request
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Your prompt' }]
}
```

### With Streaming
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  stream: true
}
```

### With Tools
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  tools: [{ type: 'function', function: { name, description, parameters } }],
  tool_choice: 'auto'
}
```

### With Structured Output
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'system', content: 'Output JSON only...' }],
  response_format: { type: 'json_object' }
}
```

### With Web Search
```typescript
{
  model: 'anthropic/claude-3.5-sonnet:online',
  messages: [{ role: 'user', content: '...' }]
}
```

### With Model Fallbacks
```typescript
{
  models: ['anthropic/claude-3.5-sonnet', 'openai/gpt-4o'],
  messages: [{ role: 'user', content: '...' }]
}
```

---

**Remember**: OpenRouter is OpenAI-compatible. Use the OpenAI SDK with `baseURL: 'https://openrouter.ai/api/v1'` for a familiar experience.
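
A minimal sketch with the official `openai` SDK (assuming it is installed; the attribution headers are optional):

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
  defaultHeaders: {
    'HTTP-Referer': 'https://your-app.com', // optional attribution
    'X-Title': 'Your App Name',
  },
});

const completion = await client.chat.completions.create({
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Your prompt here' }],
});

console.log(completion.choices[0].message.content);
```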

## Overview

This skill is an expert OpenRouter API assistant for AI agents, providing practical guidance for integrating with OpenRouter's unified access to 400+ models. It focuses on chat completions, streaming, tool calling, structured outputs, multimodal inputs, model selection, routing, and robust error handling. Use it to build reliable, cost-effective agent integrations that require provider flexibility and advanced features.

## How this skill works

The skill describes the core request and response patterns for OpenRouter's chat/completions endpoint, including required headers and minimal request structure. It explains streaming (SSE) handling, non-streaming parsing, tool-calling mechanics, structured output modes (JSON and JSON Schema), model selection and variant usage, provider routing, and error/retry strategies. Practical code snippets and parameter recommendations guide implementation choices and parameter tuning.

## When to use it

- Making chat completions or agent-driven conversations across many providers
- Implementing streaming real-time responses for chat UIs or terminals
- Enabling tool/function calling and executing model-driven workflows
- Enforcing structured outputs for APIs, databases, or automated pipelines
- Routing across providers and implementing model fallbacks for reliability
- Handling multimodal inputs (images, audio, video, PDFs) and embeddings

## Best practices

- Always specify `model` and set `max_tokens` to control cost and avoid runaway output
- Use streaming for low-latency user-facing UIs and non-streaming for batch or synchronous processing
- Set `temperature`/`top_p`/`top_k` according to task (low for code and factual work, higher for creative tasks)
- Provide an explicit provider order and fallbacks when reliability or cost guarantees matter
- Wrap requests with exponential backoff and graceful handling for 429/5xx errors
- Use `response_format` with a JSON Schema for strict, machine-parseable outputs when integrating downstream systems

## Example use cases

- A chat application that streams assistant tokens to the client with progressive UI updates
- An agent that calls external tools (search, DB, APIs) via model-invoked tool calls and executes returned arguments
- Batch document analysis using large-context variants to process and summarize multi-file uploads
- Cost-optimized routing that prefers free or low-cost variants with automatic fallbacks to paid providers on failure
- A data extraction pipeline that requires strict JSON Schema output for downstream ingestion

## FAQ

**When should I use streaming versus non-streaming?**

Use streaming for interactive, low-latency experiences and long outputs; use non-streaming for background jobs, short responses, or when you need the complete output before processing.

**How do I force a specific tool or prevent tool calls?**

Set `tool_choice` to a specific function object to force a tool, `'none'` to disable tools, `'required'` to mandate a tool call, or keep the default `'auto'` to let the model decide.