
openai-whisper-api skill


This skill transcribes audio using the OpenAI Whisper API, generating text transcripts from audio files with configurable model, language, and output options.

npx playbooks add skill openclaw/skills --skill openai-whisper-api

Review the files below or copy the command above to add this skill to your agents.

Files (3)
SKILL.md
1.1 KB
---
name: openai-whisper-api
description: Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
homepage: https://platform.openai.com/docs/guides/speech-to-text
metadata: {"clawdbot":{"emoji":"☁️","requires":{"bins":["curl"],"env":["OPENAI_API_KEY"]},"primaryEnv":"OPENAI_API_KEY"}}
---

# OpenAI Whisper API (curl)

Transcribe an audio file via OpenAI’s `/v1/audio/transcriptions` endpoint.

## Quick start

```bash
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a
```

Defaults:
- Model: `whisper-1`
- Output: `<input>.txt`

## Useful flags

```bash
{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --model whisper-1 --out /tmp/transcript.txt
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --language en
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --prompt "Speaker names: Peter, Daniel"
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --json --out /tmp/transcript.json
```

## API key

Set `OPENAI_API_KEY`, or configure it in `~/.clawdbot/clawdbot.json`:

```json5
{
  skills: {
    "openai-whisper-api": {
      apiKey: "OPENAI_KEY_HERE"
    }
  }
}
```

Overview

This skill transcribes audio using the OpenAI Audio Transcriptions API (Whisper) so you can convert speech to text quickly and reliably. It uses the whisper-1 model by default and writes output to a text file next to the input unless you specify a different path. The tool supports model selection, language hints, custom prompts, and JSON output for integration.

How this skill works

The script calls the /v1/audio/transcriptions endpoint with your audio file and API key. Flags let you select the model, force a specific language, provide a prompt (for speaker names or domain context), and choose plain-text or JSON output. API credentials are read from the OPENAI_API_KEY environment variable or from a local configuration file.
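Under the hood this is a standard multipart POST. A minimal sketch of the equivalent curl request (the audio path is a placeholder, and the command is printed rather than executed so you can inspect it first):

```shell
# Equivalent request the script makes (a sketch). AUDIO is a placeholder path.
AUDIO="audio.m4a"
CMD=(curl -sS https://api.openai.com/v1/audio/transcriptions
  -H "Authorization: Bearer ${OPENAI_API_KEY:-sk-your-key}"
  -F "file=@${AUDIO}"
  -F "model=whisper-1"
  -F "response_format=text")
# Print the assembled command for inspection; run "${CMD[@]}" to transcribe.
printf '%s ' "${CMD[@]}"; echo
```

The `--language` and `--prompt` flags map to additional `-F language=...` and `-F prompt=...` form fields.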

When to use it

  • Transcribing interviews, meetings, or lectures to searchable text.
  • Generating captions or notes from recorded audio for content production.
  • Batch-processing archived audio as part of an indexing workflow.
  • When you need language hints or speaker-context prompts to improve accuracy.
  • Exporting structured transcripts in JSON for downstream parsing.
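For the batch-processing case, a minimal loop sketch (shown as a dry run that prints each command; remove `echo` to execute, and substitute your own directory for the placeholder `./recordings`):

```shell
# Batch sketch: transcribe every .m4a in a folder. Outputs default to <input>.txt.
for f in ./recordings/*.m4a; do
  [ -e "$f" ] || continue            # skip if the glob matched nothing
  echo {baseDir}/scripts/transcribe.sh "$f"
done
```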

Best practices

  • Use clear, high-quality audio and reduce background noise for better accuracy.
  • Pass --language with an ISO-639-1 code when auto-detection is unreliable, e.g. for short clips or mixed-language audio.
  • Use the prompt flag to supply speaker names or domain context when relevant.
  • Request JSON output if you plan to programmatically segment or label transcripts.
  • Keep your OPENAI_API_KEY secure and load it from environment variables or a protected config file.
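With --json, the output should be the API's json response format, an object with a "text" field. A minimal extraction sketch (python3 is used here to avoid extra dependencies; `jq -r .text transcript.json` works equally well if jq is installed; the file content below is a stand-in for real output):

```shell
# Stand-in for a real --json transcript, then pull the plain text out of it.
printf '%s' '{"text":"hello world"}' > transcript.json
python3 -c 'import json,sys; print(json.load(open(sys.argv[1]))["text"])' transcript.json
```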

Example use cases

  • Run scripts/transcribe.sh meeting.m4a to generate meeting notes in meeting.m4a.txt.
  • Transcribe a podcast episode to JSON, then import timestamps and speakers into an editor.
  • Process a folder of lecture recordings to create searchable text for an LMS.
  • Provide speaker names in a prompt for panel discussions to improve labeling.

FAQ

Where does the API key come from?

Set OPENAI_API_KEY in your environment or place it under the skill entry in a local config file.

How do I change the output file?

Use the --out flag to specify a different output path and filename.

Can I force the transcription language?

Yes — pass the --language flag with the ISO language code to avoid incorrect auto-detection.