
This skill helps you integrate real-time Deepgram transcription with low latency, speaker diarization, and multilingual support into applications.

npx playbooks add skill willsigmon/sigstack --skill deepgram-expert

---
name: Deepgram Expert
description: Deepgram - real-time transcription, low latency, speaker diarization, enterprise-ready
allowed-tools: Read, Edit, Bash, WebFetch
model: sonnet
---

# Deepgram Expert

Enterprise-grade real-time speech-to-text.

## Pricing (2026)

- **$0.0043/minute** (pay-as-you-go)
- Volume tiers automatic (no contracts)
- Real-time and batch options
- Speaker diarization included
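
Since billing is metered per audio minute, cost scales linearly with duration. A quick back-of-envelope helper (the rate constant mirrors the pay-as-you-go price above; adjust if your tier differs):

```python
# Estimate transcription cost at the pay-as-you-go rate ($0.0043/min).
RATE_PER_MINUTE = 0.0043

def transcription_cost(audio_seconds: float) -> float:
    """Estimated cost in USD for a given audio duration."""
    return round(audio_seconds / 60 * RATE_PER_MINUTE, 4)

print(transcription_cost(3600))  # one hour of audio -> 0.258
```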

## Key Advantages
- **20 seconds** to transcribe 1 hour
- Low latency streaming (0.2-0.3x real-time)
- 5.8% WER on technical audio
- Speaker diarization built-in
- 30+ languages

## API Usage

### Batch Transcription
```python
from deepgram import DeepgramClient, PrerecordedOptions, FileSource

deepgram = DeepgramClient(api_key="...")

options = PrerecordedOptions(
    model="nova-3",
    smart_format=True,
    diarize=True,
    punctuate=True
)

# Read the audio into a buffer and send it as a FileSource payload
with open("audio.mp3", "rb") as audio:
    payload: FileSource = {"buffer": audio.read()}

response = deepgram.listen.rest.v("1").transcribe_file(payload, options)

print(response.results.channels[0].alternatives[0].transcript)
```

### Real-Time Streaming
```python
from deepgram import DeepgramClient, LiveOptions, LiveTranscriptionEvents

deepgram = DeepgramClient(api_key="...")

options = LiveOptions(
    model="nova-3",
    language="en",
    smart_format=True,
    interim_results=True
)

connection = deepgram.listen.websocket.v("1")

def on_transcript(self, result, **kwargs):
    if result.is_final:
        print(result.channel.alternatives[0].transcript)

connection.on(LiveTranscriptionEvents.Transcript, on_transcript)
connection.start(options)

# Stream audio, then close the connection
connection.send(audio_chunk)
connection.finish()
```

### Speaker Diarization
```python
options = PrerecordedOptions(
    model="nova-3",
    diarize=True
)

# Response includes speaker labels
for word in response.results.channels[0].alternatives[0].words:
    print(f"Speaker {word.speaker}: {word.word}")
```
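
Word-level speaker labels can be folded into per-speaker utterances for readable output. A minimal sketch, assuming a `words` list shaped like the SDK response (each entry carrying `speaker` and `word` fields); the sample data here is illustrative, not API output:

```python
from itertools import groupby

# Stand-in for response.results.channels[0].alternatives[0].words
words = [
    {"speaker": 0, "word": "hello"},
    {"speaker": 0, "word": "there"},
    {"speaker": 1, "word": "hi"},
]

# Collapse consecutive words from the same speaker into one utterance
utterances = [
    (speaker, " ".join(w["word"] for w in group))
    for speaker, group in groupby(words, key=lambda w: w["speaker"])
]

for speaker, text in utterances:
    print(f"Speaker {speaker}: {text}")
```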

## Model Options

| Model | Speed | Accuracy | Use Case |
|-------|-------|----------|----------|
| nova-3 | Fast | Best | General |
| whisper | Medium | Good | Accuracy focus |
| enhanced | Fast | Good | Low latency |

## Integration

### Node.js SDK
```typescript
import fs from "fs";
import { createClient } from "@deepgram/sdk";

const deepgram = createClient(process.env.DEEPGRAM_KEY);

const { result, error } = await deepgram.listen.prerecorded.transcribeFile(
  fs.readFileSync("audio.mp3"),
  { model: "nova-3", smart_format: true }
);
```

## Best For
- Real-time applications
- Call center transcription
- Live captioning
- Voice analytics
- Technical/medical audio

Use when: Real-time transcription, low-latency requirements, speaker identification

Overview

This skill packages Deepgram's enterprise-grade speech-to-text capabilities for real-time and batch transcription with low latency and built-in speaker diarization. It targets high-throughput production use: live captioning, call centers, voice analytics, and technical audio. The implementation supports Node.js and Python SDKs and exposes model selection and streaming controls for latency/accuracy trade-offs.

How this skill works

The skill uses Deepgram’s SDKs and WebSocket streaming to send audio and receive transcripts with interim and final results. Batch transcription runs via REST/file uploads, returning formatted transcripts and word-level speaker labels when diarization is enabled. Model options (nova-3, whisper, enhanced) let you prioritize speed or accuracy; pricing is metered per minute with automatic volume tiers.
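
Interim results for a given stretch of audio arrive repeatedly and are superseded until a final result lands, so a client should display interim text but persist only finals. A minimal sketch of that buffering pattern (application-side logic, not SDK code):

```python
class TranscriptBuffer:
    """Show interim results immediately; persist only finals."""

    def __init__(self):
        self.finalized = []  # committed transcript segments
        self.interim = ""    # latest provisional text, replaced on each update

    def update(self, text: str, is_final: bool) -> None:
        if is_final:
            self.finalized.append(text)
            self.interim = ""
        else:
            self.interim = text  # supersedes the previous interim

    def display(self) -> str:
        return " ".join(self.finalized + ([self.interim] if self.interim else []))

buf = TranscriptBuffer()
buf.update("hel", is_final=False)          # provisional, shown but not stored
buf.update("hello world", is_final=True)   # committed
print(buf.display())  # "hello world"
```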

When to use it

  • Live captioning for events or broadcasts requiring sub-second latency.
  • Call center transcription with speaker diarization for agent/customer separation.
  • Voice analytics pipelines that need fast transcript availability for indexing.
  • Transcribing technical or medical audio where low WER matters.
  • Batch processing large audio archives where speed and cost efficiency matter.

Best practices

  • Choose nova-3 for general low-latency use, whisper when accuracy is prioritized.
  • Enable smart_format and punctuate to get readable, production-ready transcripts.
  • Use interim_results for responsive UIs, but rely on is_final for persistent storage.
  • Enable diarize for multi-speaker audio and map speaker labels to meeting metadata.
  • Chunk long streams into manageable segments to reduce memory and simplify retry logic.
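
The chunking advice above can be sketched as a generator that yields fixed-size frames from an audio file, so each websocket send stays small and is easy to retry (file path and frame size are illustrative):

```python
def audio_frames(path: str, frame_bytes: int = 8192):
    """Yield fixed-size audio chunks so each send stays small and retryable."""
    with open(path, "rb") as f:
        while chunk := f.read(frame_bytes):
            yield chunk

# Usage with a live connection (connection from the streaming example):
# for chunk in audio_frames("audio.raw"):
#     connection.send(chunk)
```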

Example use cases

  • Real-time captions for webinars and live streams with sub-second display.
  • Automated call summaries and QA in contact centers with speaker separation.
  • Indexer for searchable meeting archives with timestamped, speaker-labeled words.
  • Medical research transcription where technical accuracy and fast turnaround are required.

FAQ

What latency can I expect for streaming transcription?

Typical low-latency streaming runs around 0.2–0.3x real-time; final transcript availability depends on model and network conditions.

Does diarization increase cost or processing time?

Diarization is included and adds modest processing overhead; it can increase turnaround time slightly for batch jobs but is available in both real-time and prerecorded flows.