
This skill guides users through phoneme-level pronunciation training, with adaptive drills and targeted feedback to improve their accent.

npx playbooks add skill jeremylongshore/claude-code-plugins-plus-skills --skill speak-core-workflow-b

---
name: speak-core-workflow-b
description: |
  Execute Speak secondary workflow: Pronunciation Training with detailed phoneme analysis.
  Use when implementing pronunciation drills, speech scoring,
  or targeted pronunciation improvement features.
  Trigger with phrases like "speak pronunciation training",
  "speak speech scoring", "secondary speak workflow".
allowed-tools: Read, Write, Edit, Bash(npm:*), Grep
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
---

# Speak Core Workflow B: Pronunciation Training

## Overview
Secondary workflow for Speak: Detailed pronunciation training with phoneme-level analysis and targeted practice.

## Prerequisites
- Completed `speak-install-auth` setup
- Familiarity with `speak-core-workflow-a`
- Valid API credentials configured
- High-quality audio input capabilities

## Instructions

### Step 1: Initialize Pronunciation Session
```typescript
// src/workflows/pronunciation-training.ts
import {
  SpeakClient,
  PronunciationTrainer,
  PhonemeAnalyzer,
} from '@speak/language-sdk';

interface PronunciationConfig {
  targetLanguage: string;
  difficulty: 'beginner' | 'intermediate' | 'advanced';
  focusPhonemes?: string[]; // Specific sounds to practice
  category?: 'vowels' | 'consonants' | 'tones' | 'all';
}

async function initializePronunciationTraining(
  client: SpeakClient,
  config: PronunciationConfig
): Promise<PronunciationTrainer> {
  const trainer = new PronunciationTrainer(client, {
    language: config.targetLanguage,
    difficulty: config.difficulty,
    adaptiveMode: true, // Adjusts based on user performance
  });

  // Pre-load phoneme models for target language
  await trainer.initialize();

  // Set focus areas if specified
  if (config.focusPhonemes) {
    trainer.setFocusPhonemes(config.focusPhonemes);
  }

  return trainer;
}
```

### Step 2: Implement Drill Session
```typescript
interface DrillItem {
  id: string;
  text: string;
  romanization?: string; // For non-Latin scripts
  translation: string;
  audioUrl: string;
  targetPhonemes: string[];
  difficulty: number;
}

interface DrillResult {
  item: DrillItem;
  userAudio: ArrayBuffer;
  scores: PronunciationScores;
  phonemeDetails: PhonemeResult[];
  feedback: string[];
}

async function runPronunciationDrill(
  trainer: PronunciationTrainer,
  drillCount: number = 10
): Promise<DrillSession> {
  const session = await trainer.startDrillSession({
    itemCount: drillCount,
    repeatOnMistake: true,
    minScore: 70, // Repeat until 70% or better
  });

  const results: DrillResult[] = [];

  while (!session.isComplete) {
    // Get next drill item
    const item = await session.getNextItem();

    console.log('\n--- Pronunciation Drill ---');
    console.log(`Say: "${item.text}"`);
    if (item.romanization) {
      console.log(`Romanization: ${item.romanization}`);
    }
    console.log(`Meaning: ${item.translation}`);
    console.log('Listen to native pronunciation...');

    // Play native audio for reference
    await playAudio(item.audioUrl);

    // Record user attempt
    const userAudio = await recordUserAudio();

    // Analyze pronunciation
    const result = await session.submitAttempt({
      itemId: item.id,
      audioData: userAudio,
    });

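    // Bundle the attempt into a DrillResult so it matches the type
    // that displayPronunciationFeedback expects
    const drillResult: DrillResult = {
      item,
      userAudio,
      scores: result.scores,
      phonemeDetails: result.phonemeDetails,
      feedback: result.feedback,
    };
    results.push(drillResult);

    // Display detailed feedback
    displayPronunciationFeedback(drillResult);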

    // Check if need to repeat
    if (result.scores.overall < 70 && session.shouldRepeat) {
      console.log('\nLet\'s try that one again...');
    }
  }

  return session.getSummary();
}
```
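
The loop above leaves the repeat decision to the session (`session.shouldRepeat`). As a rough illustration of what such a decision might look like, the sketch below models it as a pure function. `RepeatPolicy`, `maxAttempts`, and `shouldRepeatItem` are hypothetical names introduced here and are not part of the SDK; the cap on attempts is an assumption, added so a struggling learner is not held on one item indefinitely.

```typescript
// Hypothetical helper for illustration; not part of @speak/language-sdk.
interface RepeatPolicy {
  minScore: number;    // Passing threshold, e.g. 70
  maxAttempts: number; // Assumed cap so one item cannot repeat forever
}

function shouldRepeatItem(
  overallScore: number,
  attemptsSoFar: number,
  policy: RepeatPolicy
): boolean {
  // A passing score never triggers a repeat
  if (overallScore >= policy.minScore) return false;
  // Otherwise repeat only while attempts remain
  return attemptsSoFar < policy.maxAttempts;
}
```

With `{ minScore: 70, maxAttempts: 3 }`, a first attempt scoring 65 is repeated, while a third attempt scoring 40 moves on to the next item.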

### Step 3: Phoneme-Level Analysis
```typescript
interface PhonemeResult {
  phoneme: string;
  expected: string;
  actual: string;
  score: number;
  issues: PhonemeIssue[];
  visualGuide?: string; // Mouth position diagram URL
}

interface PhonemeIssue {
  type: 'substitution' | 'omission' | 'addition' | 'distortion';
  description: string;
  tip: string;
}

function displayPronunciationFeedback(result: DrillResult) {
  console.log(`\n📊 Pronunciation Score: ${result.scores.overall}/100`);
  console.log(`   Accuracy: ${result.scores.accuracy}/100`);
  console.log(`   Fluency: ${result.scores.fluency}/100`);
  console.log(`   Intonation: ${result.scores.intonation}/100`);

  // Show problem phonemes
  const problemPhonemes = result.phonemeDetails.filter(p => p.score < 70);
  if (problemPhonemes.length > 0) {
    console.log('\n🔍 Phonemes to improve:');
    for (const p of problemPhonemes) {
      console.log(`   [${p.phoneme}] Score: ${p.score}/100`);
      for (const issue of p.issues) {
        console.log(`      ⚠️ ${issue.type}: ${issue.description}`);
        console.log(`      💡 Tip: ${issue.tip}`);
      }
    }
  }

  // Show overall feedback
  if (result.feedback.length > 0) {
    console.log('\n💬 Feedback:');
    result.feedback.forEach(f => console.log(`   • ${f}`));
  }
}
```
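
The `PhonemeIssue` types `substitution`, `omission`, and `addition` correspond to the three edit operations between an expected and an actual phoneme sequence. The sketch below shows one way such labels could be derived, using a standard edit-distance alignment. It is an illustration only and says nothing about how the Speak API computes its results; `AlignmentIssue` and `alignPhonemes` are hypothetical names. A `distortion` cannot be detected from symbols alone, since it requires acoustic information.

```typescript
// Hypothetical illustration; not the SDK's actual analysis.
type AlignmentIssue =
  | { type: 'substitution'; expected: string; actual: string }
  | { type: 'omission'; expected: string }
  | { type: 'addition'; actual: string };

function alignPhonemes(expected: string[], actual: string[]): AlignmentIssue[] {
  const rows = expected.length + 1;
  const cols = actual.length + 1;

  // cost[i][j] = edit distance between expected[0..i) and actual[0..j)
  const cost: number[][] = Array.from({ length: rows }, () =>
    new Array<number>(cols).fill(0)
  );
  for (let i = 0; i < rows; i++) cost[i][0] = i;
  for (let j = 0; j < cols; j++) cost[0][j] = j;

  const subCost = (i: number, j: number) =>
    expected[i - 1] === actual[j - 1] ? 0 : 1;

  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      cost[i][j] = Math.min(
        cost[i - 1][j - 1] + subCost(i, j), // match or substitution
        cost[i - 1][j] + 1,                 // omission
        cost[i][j - 1] + 1                  // addition
      );
    }
  }

  // Walk back from the bottom-right corner to recover the edits
  const issues: AlignmentIssue[] = [];
  let i = expected.length;
  let j = actual.length;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && cost[i][j] === cost[i - 1][j - 1] + subCost(i, j)) {
      if (expected[i - 1] !== actual[j - 1]) {
        issues.push({
          type: 'substitution',
          expected: expected[i - 1],
          actual: actual[j - 1],
        });
      }
      i--;
      j--;
    } else if (i > 0 && cost[i][j] === cost[i - 1][j] + 1) {
      issues.push({ type: 'omission', expected: expected[i - 1] });
      i--;
    } else {
      issues.push({ type: 'addition', actual: actual[j - 1] });
      j--;
    }
  }
  return issues.reverse();
}
```

For example, aligning the expected sequence `['s', 't', 'r']` against an actual `['s', 'r']` reports a single omission of `t`.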

### Step 4: Adaptive Practice Generation
```typescript
interface WeaknessAnalysis {
  phoneme: string;
  averageScore: number;
  attemptCount: number;
  trend: 'improving' | 'stable' | 'declining';
  suggestedDrills: DrillItem[];
}

async function generateAdaptivePractice(
  trainer: PronunciationTrainer,
  userHistory: DrillResult[]
): Promise<AdaptivePracticeSession> {
  // Analyze user's weaknesses
  const weaknesses = analyzeWeaknesses(userHistory);

  // Generate targeted practice
  const adaptiveSession = await trainer.createAdaptiveSession({
    targetWeaknesses: weaknesses.map(w => w.phoneme),
    intensity: 'focused', // or 'mixed' for variety
    maxDuration: 15 * 60 * 1000, // 15 minutes
  });

  console.log('\n🎯 Adaptive Practice Plan:');
  console.log(`Focus areas: ${weaknesses.map(w => w.phoneme).join(', ')}`);

  return adaptiveSession;
}

function analyzeWeaknesses(results: DrillResult[]): WeaknessAnalysis[] {
  const phonemeStats = new Map<string, number[]>();

  // Collect scores by phoneme
  for (const result of results) {
    for (const p of result.phonemeDetails) {
      if (!phonemeStats.has(p.phoneme)) {
        phonemeStats.set(p.phoneme, []);
      }
      phonemeStats.get(p.phoneme)!.push(p.score);
    }
  }

  // Identify weaknesses (average < 75)
  const weaknesses: WeaknessAnalysis[] = [];
  for (const [phoneme, scores] of phonemeStats) {
    const avg = scores.reduce((a, b) => a + b, 0) / scores.length;
    if (avg < 75) {
      weaknesses.push({
        phoneme,
        averageScore: avg,
        attemptCount: scores.length,
        trend: calculateTrend(scores),
        suggestedDrills: [], // Populated by trainer
      });
    }
  }

  return weaknesses.sort((a, b) => a.averageScore - b.averageScore);
}
```
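
`analyzeWeaknesses` calls `calculateTrend`, which is not defined in this skill. A minimal sketch of one possible implementation follows. It compares the mean of the earlier half of the scores with the mean of the later half; the `tolerance` parameter and its default of 5 points are assumptions chosen for illustration, not values from the Speak SDK.

```typescript
type Trend = 'improving' | 'stable' | 'declining';

// Hypothetical implementation of the calculateTrend helper used above.
// Scores are assumed to be in chronological order.
function calculateTrend(scores: number[], tolerance = 5): Trend {
  if (scores.length < 2) return 'stable'; // Not enough data to compare

  const mid = Math.floor(scores.length / 2);
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;

  const earlier = mean(scores.slice(0, mid));
  const later = mean(scores.slice(mid));
  const delta = later - earlier;

  if (delta > tolerance) return 'improving';
  if (delta < -tolerance) return 'declining';
  return 'stable';
}
```

A half-versus-half comparison is simple and tolerates a single noisy attempt; a fitted slope would be a reasonable alternative when more attempts are available.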

## Complete Workflow Example

```typescript
async function pronunciationTrainingWorkflow() {
  const client = getSpeakClient();

  // Configure for Korean pronunciation (challenging for English speakers)
  const config: PronunciationConfig = {
    targetLanguage: 'ko',
    difficulty: 'intermediate',
    focusPhonemes: ['ㄱ', 'ㅋ', 'ㄲ'], // Korean plain/aspirated/tense velar consonants
    category: 'consonants',
  };

  console.log('Starting pronunciation training...');
  console.log(`Language: ${config.targetLanguage}`);
  console.log(`Focus: ${config.focusPhonemes?.join(', ') || 'General'}`);

  // Initialize trainer
  const trainer = await initializePronunciationTraining(client, config);

  // Run initial assessment
  console.log('\n📝 Initial Assessment...');
  const assessment = await trainer.runAssessment();
  console.log(`Baseline score: ${assessment.overallScore}/100`);

  // Run pronunciation drills
  const drillResults = await runPronunciationDrill(trainer, 10);

  // Generate adaptive practice based on results
  const adaptiveSession = await generateAdaptivePractice(
    trainer,
    drillResults.results
  );

  // Run adaptive practice
  await adaptiveSession.run();

  // Final summary
  const summary = await trainer.getSessionSummary();
  console.log('\n========== Training Complete ==========');
  console.log(`Items practiced: ${summary.totalItems}`);
  console.log(`Average score: ${summary.averageScore}/100`);
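  // Only prefix "+" for non-negative values so a decline does not print as "+-5%"
  console.log(`Improvement: ${summary.improvement >= 0 ? '+' : ''}${summary.improvement}%`);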

  return summary;
}
```

## Workflow Comparison

| Aspect | Workflow A (Conversation) | Workflow B (Pronunciation) |
|--------|---------------------------|----------------------------|
| Primary Focus | Communication | Accuracy |
| Feedback Type | Holistic | Phoneme-level |
| Session Style | Free-form dialogue | Structured drills |
| Pacing | User-driven | System-driven |
| Best For | Fluency building | Accent reduction |

## Output
- Detailed phoneme-level scores
- Visual pronunciation guides
- Adaptive practice recommendations
- Progress tracking over time
- Weakness identification

## Error Handling
| Error | Cause | Solution |
|-------|-------|----------|
| Audio Too Short | Recording shorter than 0.5 s | Re-record with at least 0.5 s of audio |
| Background Noise | Poor recording conditions | Prompt for quieter environment |
| Phoneme Not Detected | Unclear speech | Slow down and articulate |
| Model Loading Failed | Network issue | Retry with fallback |
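
The "Audio Too Short" case can be caught on the client before an attempt is submitted. The sketch below estimates duration from the buffer size, assuming the recording is uncompressed PCM; `PcmFormat`, `pcmDurationSeconds`, and `isLongEnough` are hypothetical names, and compressed formats would need a decoder instead.

```typescript
// Hypothetical pre-flight check; assumes uncompressed PCM audio.
interface PcmFormat {
  sampleRate: number;     // Samples per second, e.g. 16000
  channels: number;       // 1 for mono
  bytesPerSample: number; // 2 for 16-bit PCM
}

// Minimum length taken from the "Audio Too Short" row in the table above
const MIN_DURATION_SECONDS = 0.5;

function pcmDurationSeconds(audio: ArrayBuffer, format: PcmFormat): number {
  const bytesPerSecond =
    format.sampleRate * format.channels * format.bytesPerSample;
  return audio.byteLength / bytesPerSecond;
}

function isLongEnough(audio: ArrayBuffer, format: PcmFormat): boolean {
  return pcmDurationSeconds(audio, format) >= MIN_DURATION_SECONDS;
}
```

Running this check before `session.submitAttempt` lets the app prompt for a new recording without spending a request on audio that is likely to be rejected.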

## Resources
- [Speak Pronunciation API](https://developer.speak.com/api/pronunciation)
- [Phoneme Reference](https://developer.speak.com/docs/phonemes)
- [Audio Recording Best Practices](https://developer.speak.com/docs/audio-quality)

## Next Steps
For common errors, see `speak-common-errors`.

Overview

This skill executes Speak secondary workflow B: Pronunciation Training with detailed phoneme-level analysis and adaptive practice. It guides initialization, drill sessions, phoneme diagnostics, and targeted practice generation for measurable pronunciation improvement. Use it to implement scoring, feedback, and progress tracking in language learning apps.

How this skill works

The skill initializes a PronunciationTrainer, preloads phoneme models, and configures focus areas and difficulty. It runs structured drill sessions where native audio is played, user audio is recorded, and each attempt is scored with phoneme-level breakdowns. Results feed an analyzer that identifies weak phonemes and generates adaptive practice sessions and targeted drills.

When to use it

  • Building pronunciation drills with phoneme-level feedback
  • Adding speech scoring or assessment features
  • Creating adaptive practice plans from user history
  • Targeted accent reduction or articulation training
  • Producing visual mouth-position guides and remediation tips

Best practices

  • Ensure high-quality microphone input and a quiet recording environment
  • Start with an initial assessment to set baseline scores and focus areas
  • Set sensible repetition thresholds (e.g., repeat until ≥70% accuracy)
  • Use adaptive mode to prioritize the user's weakest phonemes
  • Provide short, focused sessions (10–15 minutes) for better retention
  • Offer visual guides and concrete articulation tips alongside scores

Example use cases

  • Language learning app: run 10-item drills, show phoneme issues, then auto-generate a 15-minute adaptive practice
  • Pronunciation assessment: baseline test → drill session → export phoneme-level report for tutors
  • Accent-reduction feature: focus on specific consonants or vowels and track trend over time
  • Automated speech scoring: integrate overall/accuracy/fluency/intonation metrics into user profile
  • Tutoring dashboard: surface problem phonemes with tips and mouth-position diagrams for targeted lessons

FAQ

What input quality is required?

Use a clear microphone in a quiet room; recordings shorter than 0.5 s, or with heavy background noise, can fail analysis.

How does adaptive practice choose drills?

It averages each phoneme's scores across attempts, treats any phoneme with an average below 75 as a weakness, ranks weaknesses from lowest to highest average, and generates sessions with focused drills that target them.