transcript-fixer skill

/transcript-fixer

This skill corrects speech-to-text errors in transcripts using dictionary rules, AI corrections, and learned patterns to build a personalized correction database.

npx playbooks add skill daymade/claude-code-skills --skill transcript-fixer

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: transcript-fixer
description: Corrects speech-to-text transcription errors in meeting notes, lectures, and interviews using dictionary rules and AI. Learns patterns to build personalized correction databases. Use when working with transcripts containing ASR/STT errors, homophones, or Chinese/English mixed content requiring cleanup.
---

# Transcript Fixer

Correct speech-to-text transcription errors through dictionary-based rules, AI-powered corrections, and automatic pattern detection. Build a personalized knowledge base that learns from each correction.

## When to Use This Skill

- Correcting ASR/STT errors in meeting notes, lectures, or interviews
- Building domain-specific correction dictionaries
- Fixing Chinese/English homophone errors or technical terminology
- Collaborating on shared correction knowledge bases

## Prerequisites

**Python execution must use `uv`**; never use the system Python directly.

If `uv` is not installed:
```bash
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows PowerShell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

## Quick Start

**Recommended: Use Enhanced Wrapper** (auto-detects API key, opens HTML diff):

```bash
# First time: Initialize database
uv run scripts/fix_transcription.py --init

# Process transcript with enhanced UX
uv run scripts/fix_transcript_enhanced.py input.md --output ./corrected
```

The enhanced wrapper automatically:
- Detects GLM API key from shell configs (checks lines near `ANTHROPIC_BASE_URL`)
- Moves output files to specified directory
- Opens HTML visual diff in browser for immediate feedback

**Alternative: Use Core Script Directly**:

```bash
# 1. Set API key (if not auto-detected)
export GLM_API_KEY="<api-key>"  # From https://open.bigmodel.cn/

# 2. Add common corrections (5-10 terms)
uv run scripts/fix_transcription.py --add "错误词" "正确词" --domain general

# 3. Run full correction pipeline
uv run scripts/fix_transcription.py --input meeting.md --stage 3

# 4. Review learned patterns after 3-5 runs
uv run scripts/fix_transcription.py --review-learned
```

**Output files**:
- `*_stage1.md` - Dictionary corrections applied
- `*_stage2.md` - AI corrections applied (final version)
- `*_对比.html` - Visual diff (open in browser for best experience)

**Generate word-level diff** (recommended for reviewing corrections):
```bash
uv run scripts/generate_word_diff.py original.md corrected.md output.html
```

This creates an HTML file showing word-by-word differences with clear highlighting:
- 🔴 `japanese 3 pro` → 🟢 `Gemini 3 Pro` (complete word replacements)
- Easy to spot exactly what changed without character-level noise
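The bundled script renders full HTML; the word-level idea behind it can be sketched with Python's `difflib`. This is a simplified stand-in that only tags words, not the script's actual implementation:

```python
import difflib

def word_diff(original: str, corrected: str) -> list[tuple[str, str]]:
    """Return (tag, text) pairs marking word-level differences.

    Sketch of the idea behind generate_word_diff.py: diff on word
    boundaries so replacements read cleanly, without character noise.
    """
    a, b = original.split(), corrected.split()
    out = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            out.append(("same", " ".join(a[i1:i2])))
        else:
            if i1 < i2:
                out.append(("removed", " ".join(a[i1:i2])))
            if j1 < j2:
                out.append(("added", " ".join(b[j1:j2])))
    return out
```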

## Example Session

**Input transcript** (`meeting.md`):
```
今天我们讨论了巨升智能的最新进展。
股价系统需要优化,目前性能不够好。
```

**After Stage 1** (`meeting_stage1.md`):
```
今天我们讨论了具身智能的最新进展。  ← "巨升"→"具身" corrected
股价系统需要优化,目前性能不够好。  ← Unchanged (not in dictionary)
```

**After Stage 2** (`meeting_stage2.md`):
```
今天我们讨论了具身智能的最新进展。
框架系统需要优化,目前性能不够好。  ← "股价"→"框架" corrected by AI
```

**Learned pattern detected:**
```
✓ Detected: "股价" → "框架" (confidence: 85%, count: 1)
  Run --review-learned after 2 more occurrences to approve
```
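Stage 1 in the session above amounts to plain dictionary substitution. A minimal sketch follows; the longest-match-first ordering is an assumption about how overlapping entries might be resolved, not a documented detail of the real script:

```python
def apply_dictionary(text: str, corrections: dict[str, str]) -> tuple[str, int]:
    """Stage 1 sketch: apply dictionary corrections, longest entry first.

    Longer entries go first so a multi-character term wins over a shorter
    overlapping one. Returns the corrected text and the replacement count.
    """
    applied = 0
    for wrong in sorted(corrections, key=len, reverse=True):
        if wrong in text:
            applied += text.count(wrong)
            text = text.replace(wrong, corrections[wrong])
    return text, applied
```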

## Core Workflow

The three-stage pipeline stores corrections in `~/.transcript-fixer/corrections.db`:

1. **Initialize** (first time): `uv run scripts/fix_transcription.py --init`
2. **Add domain corrections**: `--add "错误词" "正确词" --domain <domain>`
3. **Process transcript**: `--input file.md --stage 3`
4. **Review learned patterns**: `--review-learned` and `--approve` high-confidence suggestions

- **Stages**: Dictionary (instant, free) → AI via GLM API (parallel) → Full pipeline
- **Domains**: `general`, `embodied_ai`, `finance`, `medical`, or custom names, including Chinese (e.g., `火星加速器`, `具身智能`)
- **Learning**: Patterns appearing ≥3 times at ≥80% confidence are promoted from AI to the dictionary

See `references/workflow_guide.md` for detailed workflows, `references/script_parameters.md` for complete CLI reference, and `references/team_collaboration.md` for collaboration patterns.
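The learning rule above (≥3 occurrences at ≥80% confidence) can be sketched as follows. The real pipeline persists this state in SQLite; averaging the confidences is an assumption made for illustration:

```python
from collections import defaultdict

def patterns_to_promote(events, min_count=3, min_conf=0.80):
    """Given (wrong, right, confidence) events from AI passes, return
    the correction pairs eligible for dictionary promotion.

    Sketch of the documented thresholds; the actual script may apply
    additional checks before promoting a pattern.
    """
    stats = defaultdict(list)
    for wrong, right, conf in events:
        stats[(wrong, right)].append(conf)
    return [
        pair for pair, confs in stats.items()
        if len(confs) >= min_count and sum(confs) / len(confs) >= min_conf
    ]
```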

## Critical Workflow: Dictionary Iteration

**MUST save corrections after each fix.** This is the skill's core value.

After fixing errors manually, immediately save to dictionary:
```bash
uv run scripts/fix_transcription.py --add "错误词" "正确词" --domain general
```

See `references/iteration_workflow.md` for complete iteration guide with checklist.

## AI Fallback Strategy

When the GLM API is unavailable (e.g., 503 errors or network failures), the script outputs a `[CLAUDE_FALLBACK]` marker.

Claude Code should then:
1. Analyze the text directly for ASR errors
2. Fix using Edit tool
3. **MUST save corrections to dictionary** with `--add`
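A caller automating this handoff might detect the marker like so. Only the `[CLAUDE_FALLBACK]` marker itself is documented; the command shape and the assumption that it appears on stdout are illustrative:

```python
import subprocess

def run_with_fallback(cmd: list[str]) -> tuple[bool, str]:
    """Run the correction script and report whether it requested fallback.

    Checks stdout for the documented '[CLAUDE_FALLBACK]' marker; where the
    marker is emitted is an assumption for this sketch.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    needs_fallback = "[CLAUDE_FALLBACK]" in result.stdout
    return needs_fallback, result.stdout
```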

## Database Operations

**MUST read `references/database_schema.md` before any database operations.**

Quick reference:
```bash
# View all corrections
sqlite3 ~/.transcript-fixer/corrections.db "SELECT * FROM active_corrections;"

# Check schema version
sqlite3 ~/.transcript-fixer/corrections.db "SELECT value FROM system_config WHERE key='schema_version';"
```
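For programmatic access, the same `active_corrections` view can be read from Python. The column names are defined in `references/database_schema.md` and are not assumed here, so rows are returned as generic dicts:

```python
import sqlite3
from pathlib import Path

def dump_corrections(db_path=Path.home() / ".transcript-fixer" / "corrections.db"):
    """Read all rows from the active_corrections view as dicts.

    Rows are keyed by whatever columns the schema defines; consult
    references/database_schema.md before writing to the database.
    """
    con = sqlite3.connect(db_path)
    con.row_factory = sqlite3.Row  # expose rows by column name
    try:
        return [dict(r) for r in con.execute("SELECT * FROM active_corrections")]
    finally:
        con.close()
```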

## Stages

| Stage | Description | Speed | Cost |
|-------|-------------|-------|------|
| 1 | Dictionary only | Instant | Free |
| 2 | AI only | ~10s | API calls |
| 3 | Full pipeline | ~10s | API calls |

## Bundled Resources

**Scripts:**
- `ensure_deps.py` - Initialize shared virtual environment (run once, optional)
- `fix_transcript_enhanced.py` - Enhanced wrapper (recommended for interactive use)
- `fix_transcription.py` - Core CLI (for automation)
- `generate_word_diff.py` - Generate word-level diff HTML for reviewing corrections
- `examples/bulk_import.py` - Bulk import example

**References** (load as needed):
- **Critical**: `database_schema.md` (read before DB operations), `iteration_workflow.md` (dictionary iteration best practices)
- Getting started: `installation_setup.md`, `glm_api_setup.md`, `workflow_guide.md`
- Daily use: `quick_reference.md`, `script_parameters.md`, `dictionary_guide.md`
- Advanced: `sql_queries.md`, `file_formats.md`, `architecture.md`, `best_practices.md`
- Operations: `troubleshooting.md`, `team_collaboration.md`

## Troubleshooting

Verify setup health with `uv run scripts/fix_transcription.py --validate`. Common issues:
- Missing database → Run `--init`
- Missing API key → `export GLM_API_KEY="<key>"` (obtain from https://open.bigmodel.cn/)
- Permission errors → Check `~/.transcript-fixer/` ownership

See `references/troubleshooting.md` for detailed error resolution and `references/glm_api_setup.md` for API configuration.

Overview

This skill corrects speech-to-text transcription errors in meeting notes, lectures, and interviews using a hybrid of dictionary rules and AI-driven fixes. It learns recurring patterns to build a personalized correction database and supports mixed-language scenarios and homophone issues. The workflow is designed for repeatable, team-shared correction knowledge and fast review with visual diffs.

How this skill works

The pipeline runs in three stages: dictionary-based substitutions, AI corrections via a GLM-compatible API, and a full combined pass that learns new patterns. Corrections are stored in a local SQLite database, and high-confidence recurring fixes are promoted into the dictionary. The enhanced wrapper adds automatic API-key detection, output organization, and HTML word-level diffs for quick review.

When to use it

  • Cleaning ASR/STT output from meetings, interviews, or lectures before distribution or analysis
  • Fixing homophone errors and mixed English/Chinese segments in transcripts
  • Building domain-specific terminology dictionaries for finance, medical, or technical teams
  • Automating repeated corrections across a corpus of transcripts
  • Collaborating with a team to share and approve correction rules

Best practices

  • Always initialize the local corrections database before processing transcripts
  • Save every manual correction immediately to the dictionary to accelerate learning
  • Run the full three-stage pipeline for best accuracy: dictionary → AI → combined
  • Use the enhanced wrapper for visual HTML diffs during interactive reviews
  • Review learned suggestions after 2–5 occurrences before approving into the dictionary

Example use cases

  • Post-process meeting transcripts to replace consistent ASR misrecognitions with correct company or product names
  • Clean lecture transcripts that mix English technical terms with another language to ensure searchable text
  • Create a shared correction database for a remote team handling customer interview transcriptions
  • Batch-correct a corpus of historical transcripts using domain dictionaries and AI fallback

FAQ

What if the AI API is unavailable during processing?

The script emits a `[CLAUDE_FALLBACK]` marker; apply the fixes manually, then save those corrections to the dictionary with `--add` so the error won't recur.

How does a suggested pattern become a permanent dictionary entry?

Patterns that appear multiple times with high confidence (configurable, typically ≥3 occurrences and ≥80% confidence) are flagged for review and can be approved into the dictionary.