This skill transcribes audio to text using the Groq Whisper API, handling multiple formats and delivering clean, punctuated output.
To add this skill to your agents, run `npx playbooks add skill badlogic/pi-skills --skill transcribe`.
---
name: transcribe
description: Speech-to-text transcription using Groq Whisper API. Supports m4a, mp3, wav, ogg, flac, webm.
---
# Transcribe
Speech-to-text using the Groq Whisper API.
## Setup
The script needs the `GROQ_API_KEY` environment variable. Check whether it is already set:
```bash
echo $GROQ_API_KEY
```
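Because `echo` prints the key itself, a variant that only reports whether the variable is set can be safer in shared terminals or logs. The following is a minimal sketch; the function name `check_groq_key` is illustrative and not part of the skill.

```bash
# Report whether GROQ_API_KEY is set without echoing the secret itself.
# check_groq_key is a hypothetical helper name, not part of the skill.
check_groq_key() {
  if [ -n "${GROQ_API_KEY:-}" ]; then
    echo "GROQ_API_KEY is set"
  else
    echo "GROQ_API_KEY is not set" >&2
    return 1
  fi
}

check_groq_key || echo "Run the setup steps first." >&2
```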
If not set, guide the user through setup:
1. Ask if they have a Groq API key
2. If not, have them sign up at https://console.groq.com/ and create an API key
3. Have them add it to their shell profile (`~/.zshrc` or `~/.bashrc`):
```bash
export GROQ_API_KEY="<their-api-key>"
```
4. Then have them run `source ~/.zshrc` or `source ~/.bashrc`, matching the profile they edited (or restart the terminal)
## Usage
```bash
{baseDir}/transcribe.sh <audio-file>
```
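Since the transcription goes to stdout, it can be redirected into a file. The sketch below saves the text next to the audio file. It assumes the script path is available in a `TRANSCRIBE` variable (a hypothetical name standing in for `{baseDir}/transcribe.sh`) and that the script exits non-zero on failure.

```bash
# Hypothetical variable; point it at the real {baseDir}/transcribe.sh.
TRANSCRIBE="${TRANSCRIBE:-./transcribe.sh}"

# Transcribe one file and save the text next to it (memo.m4a -> memo.txt).
# Prints the path of the text file on success.
transcribe_to_file() {
  local audio="$1"
  local out="${audio%.*}.txt"
  if "$TRANSCRIBE" "$audio" > "$out"; then
    echo "$out"
  else
    # Assumes a non-zero exit means failure; do not keep a partial file.
    rm -f "$out"
    return 1
  fi
}
```

For example, `transcribe_to_file memo.m4a` would write `memo.txt` and print that path.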
## Supported Formats
- m4a, mp3, wav, ogg, flac, webm
- Max file size: 25 MB
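The constraints above can be checked locally before uploading. This is a rough sketch, not part of the skill: the helper name `check_audio_file` is illustrative, and it assumes the limit means 25 * 1024 * 1024 bytes, which the source does not specify.

```bash
# Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes.
MAX_BYTES=$((25 * 1024 * 1024))

# Return 0 if the file looks acceptable, non-zero (with a message) otherwise.
check_audio_file() {
  local file="$1"
  [ -f "$file" ] || { echo "not a file: $file" >&2; return 1; }

  # Lowercase the extension so "Memo.M4A" passes this local check too.
  local ext
  ext="$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    m4a|mp3|wav|ogg|flac|webm) ;;
    *) echo "unsupported format: .$ext" >&2; return 2 ;;
  esac

  # wc -c reports the size in bytes on both Linux and macOS.
  local size
  size="$(wc -c < "$file" | tr -d '[:space:]')"
  if [ "$size" -gt "$MAX_BYTES" ]; then
    echo "file too large: $size bytes" >&2
    return 3
  fi
}
```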
## Output
Prints a plain-text transcription with punctuation and proper capitalization to stdout.
## Overview
This skill provides speech-to-text transcription using the Groq Whisper API. It accepts common audio formats and prints a plain-text transcription with punctuation and capitalization to stdout. It is designed for quick command-line use in developer workflows.
The script uploads an audio file to the Groq Whisper API and receives the transcription as text. It supports m4a, mp3, wav, ogg, flac, and webm files up to 25 MB. It requires the `GROQ_API_KEY` environment variable to authenticate requests and prints the final transcription to standard output.
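To illustrate the kind of request described above, here is a hedged sketch using `curl`. It is not the skill's actual `transcribe.sh`. It assumes Groq's OpenAI-compatible transcription endpoint and the `whisper-large-v3` model name; the function name `groq_transcribe` is illustrative.

```bash
# Sketch only; not the skill's actual transcribe.sh.
# Assumptions: Groq's OpenAI-compatible endpoint and the whisper-large-v3 model.
GROQ_URL="https://api.groq.com/openai/v1/audio/transcriptions"
GROQ_MODEL="whisper-large-v3"

groq_transcribe() {
  local file="$1"
  [ -n "${GROQ_API_KEY:-}" ] || { echo "GROQ_API_KEY is not set" >&2; return 1; }
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }

  # Multipart upload; response_format=text asks for plain text instead of JSON.
  # --fail makes curl exit non-zero on HTTP errors such as 401 or 413.
  curl --silent --show-error --fail \
    -H "Authorization: Bearer ${GROQ_API_KEY}" \
    -F "file=@${file}" \
    -F "model=${GROQ_MODEL}" \
    -F "response_format=text" \
    "$GROQ_URL"
}
```

Checking the key and the file before calling `curl` keeps the common failure cases local and avoids a network round trip.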
## FAQ
### What audio formats are supported?
m4a, mp3, wav, ogg, flac, and webm are supported.
### How large can the audio file be?
Maximum file size is 25 MB. Split longer files before uploading.
### How do I supply the API key?
Set `GROQ_API_KEY` as an environment variable in your shell profile and reload the terminal.
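For recordings over the 25 MB limit mentioned above, one possible way to split them is ffmpeg's segment muxer. This is a sketch under assumptions: ffmpeg is not part of the skill, the helper name `split_audio` is illustrative, and the 600-second default is arbitrary, so pick a duration that keeps each chunk under the limit at your bitrate. Stream copy (`-c copy`) avoids re-encoding but may not cut cleanly for every container.

```bash
# Split audio into fixed-length chunks without re-encoding.
# Usage: split_audio <input> [seconds-per-chunk]
# split_audio is a hypothetical helper; it requires ffmpeg on PATH.
split_audio() {
  local input="$1"
  local seconds="${2:-600}"
  command -v ffmpeg >/dev/null 2>&1 || { echo "ffmpeg not found" >&2; return 1; }
  [ -f "$input" ] || { echo "no such file: $input" >&2; return 1; }

  # Output names look like talk_part000.mp3, talk_part001.mp3, ...
  local base="${input%.*}"
  local ext="${input##*.}"
  ffmpeg -hide_banner -loglevel error -i "$input" \
    -f segment -segment_time "$seconds" -c copy \
    "${base}_part%03d.${ext}"
}
```

Each resulting chunk can then be passed to the transcribe script separately and the text outputs concatenated in order.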