---
name: openai-responses
description: |
  Build agentic AI with OpenAI Responses API - stateful conversations with preserved reasoning, built-in tools (Code Interpreter, File Search, Web Search), and MCP integration. Prevents 11 documented errors.
  Use when: building agents with persistent reasoning, using server-side tools, or migrating from Chat Completions/Assistants for better multi-turn performance.
user-invocable: true
---
# OpenAI Responses API
**Status**: Production Ready
**Last Updated**: 2026-01-21
**API Launch**: March 2025
**Dependencies**: openai@6.16.0 (Node.js) or fetch API (Cloudflare Workers)
---
## What Is the Responses API?
OpenAI's unified interface for agentic applications, launched **March 2025**. Provides **stateful conversations** with **preserved reasoning state** across turns.
**Key Innovation:** Unlike Chat Completions (reasoning discarded between turns), Responses **preserves the model's reasoning notebook**, improving performance by **5% on TAUBench** and enabling better multi-turn interactions.
**vs Chat Completions:**
| Feature | Chat Completions | Responses API |
|---------|-----------------|---------------|
| State | Manual history tracking | Automatic (conversation IDs) |
| Reasoning | Dropped between turns | Preserved across turns (+5% TAUBench) |
| Tools | Client-side round trips | Server-side hosted |
| Output | Single message | Polymorphic (8 types) |
| Cache | Baseline | **40-80% better utilization** |
| MCP | Manual | Built-in |
---
## Quick Start
```bash
npm install openai@6.16.0
```
```typescript
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const response = await openai.responses.create({
model: 'gpt-5',
input: 'What are the 5 Ds of dodgeball?',
});
console.log(response.output_text);
```
**Key differences from Chat Completions:**
- Endpoint: `/v1/responses` (not `/v1/chat/completions`)
- Parameter: `input` (not `messages`)
- Role: `developer` (not `system`)
- Output: `response.output_text` (not `choices[0].message.content`)
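For environments without the Node SDK (the Cloudflare Workers case noted under Dependencies), the same request can be sketched with plain `fetch`. The payload builder below simply restates the differences above; treat the exact field names as assumptions to verify against the API reference:

```typescript
// Sketch: calling /v1/responses directly, e.g. from a Cloudflare Worker.
// Payload shape mirrors the differences listed above (input, developer role).
function buildResponsesPayload(instructions: string, userInput: string) {
  return {
    model: 'gpt-5',
    input: [
      { role: 'developer', content: instructions }, // not 'system'
      { role: 'user', content: userInput },
    ],
  };
}

async function callResponses(apiKey: string, payload: unknown) {
  const res = await fetch('https://api.openai.com/v1/responses', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(payload),
  });
  // Read output_text from the parsed body, not choices[0].message.content
  return res.json();
}
```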
---
## When to Use Responses vs Chat Completions
**Use Responses:**
- Agentic applications (reasoning + actions)
- Multi-turn conversations (preserved reasoning = +5% TAUBench)
- Built-in tools (Code Interpreter, File Search, Web Search, MCP)
- Background processing (60s standard, 10min extended timeout)
**Use Chat Completions:**
- Simple one-off generation
- Fully stateless interactions
- Legacy integrations
---
## Stateful Conversations
**Automatic State Management** using conversation IDs:
```typescript
// Create conversation
const conv = await openai.conversations.create({
metadata: { user_id: 'user_123' },
});
// First turn
const response1 = await openai.responses.create({
model: 'gpt-5',
conversation: conv.id,
input: 'What are the 5 Ds of dodgeball?',
});
// Second turn - model remembers context + reasoning
const response2 = await openai.responses.create({
model: 'gpt-5',
conversation: conv.id,
input: 'Tell me more about the first one',
});
```
**Benefits:** No manual history tracking, reasoning preserved, 40-80% better cache utilization
**Conversation Limits:** 90-day expiration
---
## Built-in Tools (Server-Side)
**Server-side hosted tools** eliminate backend round trips:
| Tool | Purpose | Notes |
|------|---------|-------|
| `code_interpreter` | Execute Python code | Sandboxed, 30s timeout (use `background: true` for longer) |
| `file_search` | Managed RAG over OpenAI-hosted vector stores (no external vector DB) | Max 512MB per file, supports PDF/Word/Markdown/HTML/code |
| `web_search` | Real-time web information | Automatic source citations |
| `image_generation` | Generate images inline | Backed by `gpt-image-1` |
| `mcp` | Connect external tools | OAuth supported, tokens NOT stored |
**Usage:**
```typescript
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Calculate mean of: 10, 20, 30, 40, 50',
tools: [{ type: 'code_interpreter' }],
});
```
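Results come back as `code_interpreter_call` items in the polymorphic `output` array. A sketch of extracting the executed code and its result; the `code` and `output` field names on these items are assumptions here, so verify them against the SDK types you have installed:

```typescript
// Minimal shape of an output item, covering only the fields used below.
interface OutputItem {
  type: string;
  code?: string;               // assumed on code_interpreter_call
  output?: string;             // assumed on code_interpreter_call
  content?: { text: string }[]; // on message items
}

// Collect every code execution (code + result) from a response's output.
function extractCodeRuns(output: OutputItem[]) {
  return output
    .filter((item) => item.type === 'code_interpreter_call')
    .map((item) => ({ code: item.code ?? '', result: item.output ?? '' }));
}
```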
### Web Search TypeScript Note
**TypeScript Limitation**: The `web_search` tool's `external_web_access` option is missing from SDK types (as of v6.16.0).
**Workaround**:
```typescript
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Search for recent news',
tools: [{
type: 'web_search',
external_web_access: true,
} as any], // ✅ Type assertion to suppress error
});
```
**Source**: [GitHub Issue #1716](https://github.com/openai/openai-node/issues/1716)
---
## MCP Server Integration
Built-in support for **Model Context Protocol (MCP)** servers to connect external tools (Stripe, databases, custom APIs).
### User Approval Requirement
**By default, explicit user approval is required** before any data is shared with a remote MCP server (security feature).
**Handling Approval**:
```typescript
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Get my Stripe balance',
tools: [{
type: 'mcp',
server_label: 'stripe',
server_url: 'https://mcp.stripe.com',
authorization: process.env.STRIPE_TOKEN,
}],
});
// Approval requests arrive as output items, not as a top-level status
const approvalRequest = response.output.find(
  (item) => item.type === 'mcp_approval_request',
);
if (approvalRequest) {
  // Show user: "This action requires sharing data with Stripe. Approve?"
  // After the user approves, continue by sending an mcp_approval_response
  // input item that references approvalRequest.id
}
```
**Alternative**: Mark trusted servers with `require_approval: 'never'` in the tool config to skip the prompt (or pre-approve them in the OpenAI dashboard)
**Source**: [Official MCP Guide](https://platform.openai.com/docs/guides/tools-connectors-mcp)
### Basic MCP Usage
```typescript
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Roll 2d6 dice',
tools: [{
type: 'mcp',
server_label: 'dice',
server_url: 'https://example.com/mcp',
authorization: process.env.TOKEN, // ⚠️ NOT stored, required each request
}],
});
```
**MCP Output Types:**
- `mcp_list_tools` - Tools discovered on server
- `mcp_call` - Tool invocation + result
- `message` - Final response
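The three item types above can be handled with a simple walk over `response.output`. The field names used here (`tools`, `name`, `output`) are assumptions to check against the SDK's MCP item types:

```typescript
// Minimal shapes for the MCP-related output items described above.
interface McpItem {
  type: string;
  tools?: { name: string }[];   // assumed on mcp_list_tools
  name?: string;                // assumed on mcp_call
  output?: string;              // assumed on mcp_call
  content?: { text: string }[]; // on message items
}

// Produce a human-readable line per MCP output item.
function describeMcpTurn(output: McpItem[]): string[] {
  const lines: string[] = [];
  for (const item of output) {
    if (item.type === 'mcp_list_tools') {
      lines.push(`discovered: ${(item.tools ?? []).map((t) => t.name).join(', ')}`);
    } else if (item.type === 'mcp_call') {
      lines.push(`called ${item.name}: ${item.output}`);
    } else if (item.type === 'message') {
      lines.push(item.content?.[0]?.text ?? '');
    }
  }
  return lines;
}
```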
---
## Reasoning Preservation
**Key Innovation:** Model's internal reasoning state survives across turns (unlike Chat Completions which discards it).
**Visual Analogy:**
- Chat Completions: Model tears out scratchpad page before responding
- Responses API: Scratchpad stays open for next turn
**Performance:** +5% on TAUBench (GPT-5) purely from preserved reasoning
**Reasoning Summaries** (free):
```typescript
response.output.forEach(item => {
if (item.type === 'reasoning') console.log(item.summary[0].text);
if (item.type === 'message') console.log(item.content[0].text);
});
```
### Important: Reasoning Traces Privacy
**What You Get**: Reasoning summaries (not full internal traces)
**What OpenAI Keeps**: Full chain-of-thought reasoning (proprietary, for security/privacy)
For GPT-5-Thinking models:
- OpenAI preserves reasoning **internally** in their backend
- This preserved reasoning improves multi-turn performance (+5% TAUBench)
- But developers only receive **summaries**, not the actual chain-of-thought
- Full reasoning traces are not exposed (OpenAI's IP protection)
**Source**: [Sean Goedecke Analysis](https://www.seangoedecke.com/responses-api/)
---
## Background Mode
For long-running tasks, use `background: true`:
```typescript
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Analyze 500-page document',
background: true,
  tools: [{ type: 'file_search', vector_store_ids: [vectorStoreId] }],
});
// Poll for completion (check every 5s)
let result = await openai.responses.retrieve(response.id);
while (result.status === 'queued' || result.status === 'in_progress') {
  await new Promise((resolve) => setTimeout(resolve, 5000));
  result = await openai.responses.retrieve(response.id);
}
if (result.status === 'completed') console.log(result.output_text);
```
**Timeout Limits:**
- Standard: 60 seconds
- Background: 10 minutes
### Performance Considerations
**Time-to-First-Token (TTFT) Latency:**
Background mode currently has higher TTFT compared to synchronous responses. OpenAI is working to reduce this gap.
**Recommendation:**
- For user-facing real-time responses, use sync mode (lower latency)
- For long-running async tasks, use background mode (latency acceptable)
**Source**: [OpenAI Background Mode Docs](https://platform.openai.com/docs/guides/background)
---
## Data Retention and Privacy
**Default Retention**: 30 days when `store: true` (default)
**Zero Data Retention (ZDR)**: Organizations with ZDR automatically enforce `store: false`
**Background Mode**: NOT ZDR compatible (stores data ~10 minutes for polling)
**Timeline**:
- September 26, 2025: OpenAI court-ordered retention ended
- Current: 30-day default retention with `store: true`
**Control Storage**:
```typescript
// Disable storage (no retention)
const response = await openai.responses.create({
  model: 'gpt-5',
  input: 'Hello!',
  store: false, // ✅ No retention
});

// ZDR organizations: store is always treated as false
const zdrResponse = await openai.responses.create({
  model: 'gpt-5',
  input: 'Hello!',
  store: true, // ⚠️ Ignored by OpenAI for ZDR orgs, treated as false
});
```
**ZDR Compliance**:
- Avoid background mode (requires temporary storage)
- Explicitly set `store: false` for clarity
- Note: 60s timeout applies in sync mode
**Source**: [OpenAI Data Controls](https://platform.openai.com/docs/guides/your-data)
---
## Polymorphic Outputs
Returns **8 output types** instead of single message:
| Type | Example |
|------|---------|
| `message` | Final answer, explanation |
| `reasoning` | Step-by-step thought process (free!) |
| `code_interpreter_call` | Python code + results |
| `mcp_call` | Tool name, args, output |
| `mcp_list_tools` | Tool definitions from MCP server |
| `file_search_call` | Matched chunks, citations |
| `web_search_call` | URLs, snippets |
| `image_generation_call` | Image URL |
**Processing:**
```typescript
response.output.forEach(item => {
if (item.type === 'reasoning') console.log(item.summary[0].text);
if (item.type === 'web_search_call') console.log(item.results);
if (item.type === 'message') console.log(item.content[0].text);
});
// Or use helper for text-only
console.log(response.output_text);
```
---
## Migration from Chat Completions
**Breaking Changes:**
| Feature | Chat Completions | Responses API |
|---------|-----------------|---------------|
| Endpoint | `/v1/chat/completions` | `/v1/responses` |
| Parameter | `messages` | `input` |
| Role | `system` | `developer` |
| Output | `choices[0].message.content` | `output_text` |
| State | Manual array | Automatic (conversation ID) |
| Streaming | `data: {"choices":[...]}` | SSE with 8 item types |
**Example:**
```typescript
// Before
const response = await openai.chat.completions.create({
model: 'gpt-5',
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Hello!' },
],
});
console.log(response.choices[0].message.content);
// After
const response = await openai.responses.create({
model: 'gpt-5',
input: [
{ role: 'developer', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Hello!' },
],
});
console.log(response.output_text);
```
---
## Migration from Assistants API
**CRITICAL: Assistants API Sunset Timeline**
- **August 26, 2025**: Assistants API officially deprecated
- **2025-2026**: OpenAI providing migration utilities
- **August 26, 2026**: Assistants API sunset (stops working)
**Migrate before August 26, 2026** to avoid breaking changes.
**Source**: [Assistants API Sunset Announcement](https://community.openai.com/t/assistants-api-beta-deprecation-august-26-2026-sunset/1354666)
**Key Breaking Changes:**
| Assistants API | Responses API |
|----------------|---------------|
| Assistants (created via API) | Prompts (created in dashboard) |
| Threads | Conversations (store items, not just messages) |
| Runs (server-side lifecycle) | Responses (stateless calls) |
| Run-Steps | Items (polymorphic outputs) |
**Migration Example:**
```typescript
// Before (Assistants API - deprecated)
const assistant = await openai.beta.assistants.create({
model: 'gpt-4',
instructions: 'You are helpful.',
});
const thread = await openai.beta.threads.create();
const run = await openai.beta.threads.runs.create(thread.id, {
assistant_id: assistant.id,
});
// After (Responses API - current)
const conversation = await openai.conversations.create({
metadata: { purpose: 'customer_support' },
});
const response = await openai.responses.create({
model: 'gpt-5',
conversation: conversation.id,
input: [
{ role: 'developer', content: 'You are helpful.' },
{ role: 'user', content: 'Hello!' },
],
});
```
**Migration Guide**: [Official Assistants Migration Docs](https://platform.openai.com/docs/guides/migrate-to-responses)
---
## Known Issues Prevention
This skill prevents **11** documented errors:
**1. Session State Not Persisting**
- Cause: Not using conversation IDs or using different IDs per turn
- Fix: Create conversation once (`const conv = await openai.conversations.create()`), reuse `conv.id` for all turns
**2. MCP Server Connection Failed** (`mcp_connection_error`)
- Causes: Invalid URL, missing/expired auth token, server down
- Fix: Verify URL is correct, test manually with `fetch()`, check token expiration
**3. Code Interpreter Timeout** (`code_interpreter_timeout`)
- Cause: Code runs longer than 30 seconds
- Fix: Use `background: true` for extended timeout (up to 10 min)
**4. Image Generation Rate Limit** (`rate_limit_error`)
- Cause: Too many DALL-E requests
- Fix: Implement exponential backoff retry (1s, 2s, 4s delays)
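A minimal backoff sketch for this fix. The `rate_limit_error` code check and the injected base delay are illustrative assumptions; adapt them to the error shape your SDK version actually throws:

```typescript
// Retry a call with exponential backoff, but only on rate-limit errors.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // Give up on non-rate-limit errors or once retries are exhausted
      if (attempt >= maxRetries || err?.code !== 'rate_limit_error') throw err;
      const delayMs = baseDelayMs * 2 ** attempt; // 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```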
**5. File Search Relevance Issues**
- Cause: Vague queries return irrelevant results
- Fix: Use specific queries ("pricing in Q4 2024" not "find pricing"), filter by `chunk.score > 0.7`
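A sketch of the score filter, assuming each matched chunk exposes a numeric `score` field as described in the fix above:

```typescript
// Keep only confidently matched file-search chunks.
interface SearchChunk {
  text: string;
  score: number; // assumed relevance score in [0, 1]
}

function relevantChunks(chunks: SearchChunk[], threshold = 0.7): SearchChunk[] {
  return chunks.filter((c) => c.score > threshold);
}
```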
**6. Cost Tracking Confusion**
- Cause: Responses bills for input + output + tools + stored conversations (vs Chat Completions: input + output only)
- Fix: Set `store: false` if not needed, monitor `response.usage.tool_tokens`
**7. Conversation Not Found** (`invalid_request_error`)
- Causes: ID typo, conversation deleted, or expired (90-day limit)
- Fix: Verify the conversation exists with `openai.conversations.retrieve(conversationId)` before using it
**8. Tool Output Parsing Failed**
- Cause: Accessing wrong output structure
- Fix: Use `response.output_text` helper or iterate `response.output.forEach(item => ...)` checking `item.type`
**9. Zod v4 Incompatibility with Structured Outputs**
- **Error**: `Invalid schema for response_format 'name': schema must be a JSON Schema of 'type: "object"', got 'type: "string"'.`
- **Source**: [GitHub Issue #1597](https://github.com/openai/openai-node/issues/1597)
- **Why It Happens**: SDK's vendored `zod-to-json-schema` library doesn't support Zod v4 (missing `ZodFirstPartyTypeKind` export)
- **Prevention**: Pin to Zod v3 (`"zod": "^3.23.8"`) or use custom `zodTextFormat` with `z.toJSONSchema({ target: "draft-7" })`
```typescript
// Workaround: Pin to Zod v3 (recommended)
{
"dependencies": {
"openai": "^6.16.0",
"zod": "^3.23.8" // DO NOT upgrade to v4 yet
}
}
```
**10. Background Mode Web Search Missing Sources**
- **Error**: `web_search_call` output items contain query but no sources/results
- **Source**: [GitHub Issue #1676](https://github.com/openai/openai-node/issues/1676)
- **Why It Happens**: When using `background: true` + `web_search` tool, OpenAI doesn't return sources in the response
- **Prevention**: Use synchronous mode (`background: false`) when web search sources are needed
```typescript
// ✅ Sources available in sync mode
const response = await openai.responses.create({
model: 'gpt-5',
input: 'Latest AI news?',
background: false, // Required for sources
tools: [{ type: 'web_search' }],
});
```
**11. Streaming Mode Missing output_text Helper**
- **Error**: `finalResponse().output_text` is `undefined` in streaming mode
- **Source**: [GitHub Issue #1662](https://github.com/openai/openai-node/issues/1662)
- **Why It Happens**: `stream.finalResponse()` doesn't include `output_text` convenience field (only available in non-streaming responses)
- **Prevention**: Listen for the `response.output_text.done` streaming event or manually extract text from `output` items
```typescript
// Workaround: listen for the text-done streaming event
const stream = openai.responses.stream({ model: 'gpt-5', input: 'Hello!' });
let outputText = '';
for await (const event of stream) {
  if (event.type === 'response.output_text.done') {
    outputText += event.text; // ✅ Full text of the completed output item
  }
}
```
---
## Critical Patterns
**✅ Always:**
- Use conversation IDs for multi-turn (40-80% better cache)
- Handle all 8 output types in polymorphic responses
- Use `background: true` for tasks >30s
- Provide MCP `authorization` tokens (NOT stored, required each request)
- Monitor `response.usage.total_tokens` for cost control
**❌ Never:**
- Expose API keys in client-side code
- Assume single message output (use `response.output_text` helper)
- Reuse conversation IDs across users (security risk)
- Ignore error types (handle `rate_limit_error`, `mcp_connection_error` specifically)
- Poll faster than 1s for background tasks (use 5s intervals)
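One way to enforce the per-user conversation rule is a small lookup keyed by user ID. The `create` callback is injected here purely so the pattern is testable; in practice it would wrap `openai.conversations.create()`:

```typescript
// Map each application user to their own conversation ID, never shared.
const conversationByUser = new Map<string, string>();

async function conversationFor(
  userId: string,
  create: () => Promise<{ id: string }>, // e.g. () => openai.conversations.create({...})
): Promise<string> {
  const existing = conversationByUser.get(userId);
  if (existing) return existing; // reuse this user's conversation
  const conv = await create();   // first turn for this user: create one
  conversationByUser.set(userId, conv.id);
  return conv.id;
}
```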
---
## References
**Official Docs:**
- Responses API Guide: https://platform.openai.com/docs/guides/responses
- API Reference: https://platform.openai.com/docs/api-reference/responses
- MCP Integration: https://platform.openai.com/docs/guides/tools-connectors-mcp
- Blog Post: https://developers.openai.com/blog/responses-api/
- Starter App: https://github.com/openai/openai-responses-starter-app
**Skill Resources:** `templates/`, `references/responses-vs-chat-completions.md`, `references/mcp-integration-guide.md`, `references/built-in-tools-guide.md`, `references/migration-guide.md`, `references/top-errors.md`
---
**Last verified**: 2026-01-21 | **Skill version**: 2.1.0 | **Changes**: Added 3 TIER 1 issues (Zod v4, background web search, streaming output_text), 2 TIER 2 findings (MCP approval, reasoning privacy), Data Retention & ZDR section, Assistants API sunset timeline, background mode TTFT note, web search TypeScript limitation. Updated SDK version to 6.16.0.