This skill transcribes audio files to text with optional diarization and known-speaker hints, producing structured outputs for meetings and interviews. Install with:

```bash
npx playbooks add skill openai/skills --skill transcribe
```
---
name: "transcribe"
description: "Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings."
---
# Audio Transcribe
Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.
## Workflow
1. Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
2. Verify `OPENAI_API_KEY` is set. If missing, ask the user to set it locally (do not ask them to paste the key).
3. Run the bundled `transcribe_diarize.py` CLI with sensible defaults (fast text transcription).
4. Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
5. Save outputs under `output/transcribe/` when working in this repo.
## Decision rules
- Default to `gpt-4o-mini-transcribe` with `--response-format text` for fast transcription.
- If the user wants speaker labels or diarization, use `--model gpt-4o-transcribe-diarize --response-format diarized_json`.
- If audio is longer than ~30 seconds, keep `--chunking-strategy auto`.
- Prompting is not supported for `gpt-4o-transcribe-diarize`.
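The decision rules above reduce to a small selection function; this is a sketch of the logic, not code from the bundled CLI.

```python
def pick_model_and_format(wants_speaker_labels: bool) -> tuple[str, str]:
    # Diarization or speaker-label requests switch both the model and the format.
    if wants_speaker_labels:
        return "gpt-4o-transcribe-diarize", "diarized_json"
    # Default: fast text transcription.
    return "gpt-4o-mini-transcribe", "text"

def supports_prompt(model: str) -> bool:
    # Prompting is not supported for the diarization model.
    return model != "gpt-4o-transcribe-diarize"
```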
## Output conventions
- Use `output/transcribe/<job-id>/` for evaluation runs.
- Use `--out-dir` for multiple files to avoid overwriting.
## Dependencies (install if missing)
Prefer `uv` for dependency management.
```bash
uv pip install openai
```
If `uv` is unavailable:
```bash
python3 -m pip install openai
```
## Environment
- `OPENAI_API_KEY` must be set for live API calls.
- If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
- Never ask the user to paste the full key in chat.
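The key check can be done without ever exposing the secret; `masked` is a hypothetical helper for safe diagnostics, not part of the bundled CLI.

```python
import os

def have_openai_key() -> bool:
    # Presence check only; the value is never printed or sent anywhere.
    return bool(os.environ.get("OPENAI_API_KEY"))

def masked(key: str) -> str:
    # Hypothetical helper: show only the first few characters in diagnostics,
    # never the full key.
    return key[:3] + "…" if key else "(not set)"
```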
## Skill path (set once)
```bash
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"
```
User-scoped skills install under `$CODEX_HOME/skills` (default: `~/.codex/skills`).
## CLI quick start
Single file (fast text default):
```bash
python3 "$TRANSCRIBE_CLI" \
path/to/audio.wav \
--out transcript.txt
```
Diarization with known speakers (up to 4):
```bash
python3 "$TRANSCRIBE_CLI" \
meeting.m4a \
--model gpt-4o-transcribe-diarize \
--known-speaker "Alice=refs/alice.wav" \
--known-speaker "Bob=refs/bob.wav" \
--response-format diarized_json \
--out-dir output/transcribe/meeting
```
Plain text output (explicit):
```bash
python3 "$TRANSCRIBE_CLI" \
interview.mp3 \
--response-format text \
--out interview.txt
```
## Reference map
- `references/api.md`: supported formats, limits, response formats, and known-speaker notes.