This skill guides and validates the Speak production deployment and rollback process, ensuring safe go-live and reliable post-launch operations.

---
name: speak-prod-checklist
description: |
  Execute Speak production deployment checklist and rollback procedures.
  Use when deploying Speak integrations to production, preparing for launch,
  or implementing go-live procedures for language learning features.
  Trigger with phrases like "speak production", "deploy speak",
  "speak go-live", "speak launch checklist".
allowed-tools: Read, Bash(kubectl:*), Bash(curl:*), Grep
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
---

# Speak Production Checklist

## Overview
Complete checklist for deploying Speak language learning integrations to production.

## Prerequisites
- Staging environment tested and verified
- Production API keys available
- Deployment pipeline configured
- Monitoring and alerting ready
- Audio infrastructure tested

## Instructions

### Step 1: Pre-Deployment Configuration
- [ ] Production API keys in secure vault
- [ ] Environment variables set in deployment platform
- [ ] API key scopes are minimal (least privilege)
- [ ] Webhook endpoints configured with HTTPS
- [ ] Webhook secrets stored securely
- [ ] Audio storage configured with encryption
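
Several of these items can be enforced in code by failing fast at startup when a required setting is missing. A minimal sketch, assuming the settings arrive as environment variables: `SPEAK_API_KEY_PROD` and `SPEAK_APP_ID_PROD` are the names used in the deploy script in this document, while `SPEAK_WEBHOOK_SECRET` and `SPEAK_WEBHOOK_URL` are hypothetical names chosen for illustration.

```typescript
// Hypothetical startup check. Names other than SPEAK_API_KEY_PROD and
// SPEAK_APP_ID_PROD are illustrative assumptions, not Speak-defined settings.
type Env = Record<string, string | undefined>;

const REQUIRED_VARS = [
  "SPEAK_API_KEY_PROD",
  "SPEAK_APP_ID_PROD",
  "SPEAK_WEBHOOK_SECRET",
  "SPEAK_WEBHOOK_URL",
];

// Returns a list of human-readable problems; an empty list means the config passed.
function validateSpeakConfig(env: Env): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_VARS) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      problems.push(`${name} is not set`);
    }
  }
  // Webhook endpoints must be HTTPS, per the checklist above.
  const webhookUrl = env["SPEAK_WEBHOOK_URL"];
  if (webhookUrl && !webhookUrl.startsWith("https://")) {
    problems.push("SPEAK_WEBHOOK_URL must use HTTPS");
  }
  return problems;
}
```

In a Node.js service this could be called as `validateSpeakConfig(process.env)` before the server starts listening, refusing to start if the returned list is non-empty.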

### Step 2: Code Quality Verification
- [ ] All tests passing (`npm test`)
- [ ] No hardcoded credentials
- [ ] Error handling covers all Speak error types
- [ ] Rate limiting/backoff implemented
- [ ] Logging is production-appropriate (no PII)
- [ ] Audio handling follows privacy guidelines
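
For the rate limiting/backoff item, one common approach is exponential backoff with full jitter. The sketch below is illustrative only: the option names, the set of retryable status codes, and the `{ status, body }` response shape are assumptions, not part of any Speak SDK.

```typescript
// Sketch of exponential backoff with full jitter. The retryable status codes
// are an assumption; check which errors Speak documents as retryable.
interface BackoffOptions {
  baseDelayMs: number;
  maxDelayMs: number;
  maxAttempts: number;
}

const RETRYABLE_STATUS = [429, 500, 502, 503, 504];

function isRetryable(status: number): boolean {
  return RETRYABLE_STATUS.indexOf(status) !== -1;
}

// Upper bound for the wait after attempt number `attempt` (1-based):
// baseDelayMs * 2^(attempt - 1), capped at maxDelayMs.
function backoffCeilingMs(attempt: number, opts: BackoffOptions): number {
  return Math.min(opts.maxDelayMs, opts.baseDelayMs * Math.pow(2, attempt - 1));
}

// Full jitter: a uniform delay in [0, ceiling). `random` is injectable for tests.
function backoffDelayMs(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  return Math.floor(random() * backoffCeilingMs(attempt, opts));
}

// Calls `call` up to maxAttempts times, sleeping between retryable failures.
// `sleep` is passed in so the caller controls how waiting is implemented.
async function withRetry<T>(
  call: () => Promise<{ status: number; body: T }>,
  opts: BackoffOptions,
  sleep: (ms: number) => Promise<void>,
): Promise<{ status: number; body: T }> {
  let attempt = 1;
  for (;;) {
    const res = await call();
    if (!isRetryable(res.status) || attempt >= opts.maxAttempts) return res;
    await sleep(backoffDelayMs(attempt, opts));
    attempt += 1;
  }
}
```

Jitter spreads retries out so that many clients recovering from the same outage do not hit the API in lockstep.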

### Step 3: Language Learning Feature Checklist
- [ ] All target languages tested
- [ ] Speech recognition works in all browsers
- [ ] Pronunciation scoring accurate
- [ ] AI tutor responses appropriate
- [ ] Session timeouts handled gracefully
- [ ] Progress tracking persists correctly

### Step 4: Infrastructure Setup
- [ ] Health check endpoint includes Speak connectivity
- [ ] Monitoring/alerting configured for:
  - [ ] API latency
  - [ ] Speech recognition success rate
  - [ ] Session completion rate
  - [ ] Error rates by type
- [ ] Circuit breaker pattern implemented
- [ ] Graceful degradation configured
- [ ] CDN configured for audio assets
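
The circuit breaker item can be sketched as a small state machine that stops calling Speak after repeated failures and probes again after a cool-down. This is a minimal illustration, not a production library; the class name, thresholds, and injected clock are assumptions made for the example.

```typescript
// Minimal circuit breaker sketch. Thresholds are illustrative; the clock is
// injected so the state machine can be exercised without waiting.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetAfterMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  // Whether a call may go out right now. An open breaker moves to half-open
  // once the reset window has elapsed, allowing trial calls until a result is recorded.
  canRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetAfterMs) {
      this.state = "half-open";
    }
    return this.state !== "open";
  }

  // Any success closes the breaker and clears the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  // A failed trial call, or reaching the threshold, (re)opens the breaker.
  recordFailure(): void {
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}
```

A caller would check `canRequest()` before each Speak call and serve the degraded experience when it returns `false`, which ties this item to the graceful degradation item that follows it.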

### Step 5: Documentation Requirements
- [ ] Incident runbook created
- [ ] Key rotation procedure documented
- [ ] Rollback procedure documented
- [ ] On-call escalation path defined
- [ ] User support FAQ for common issues

### Step 6: Deploy with Gradual Rollout

```bash
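#!/bin/bash
# Abort on the first failed command, unset variable, or failed pipeline stage
set -euo pipefail

# Pre-flight checks
echo "=== Speak Production Pre-flight ==="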

# Check staging health
curl -f https://staging.example.com/health | jq '.services.speak'

# Check Speak service status
curl -s https://status.speak.com/api/status | jq '.status'

# Verify production credentials work (-f makes curl fail on HTTP errors such as 401/403)
curl -fsS -X POST https://api.speak.com/v1/health \
  -H "Authorization: Bearer ${SPEAK_API_KEY_PROD}" \
  -H "X-App-ID: ${SPEAK_APP_ID_PROD}" | jq

echo "=== Starting Deployment ==="

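# Gradual rollout - start with a canary.
# Note: pausing right after the image change stops the rollout partway. The
# actual canary share depends on maxSurge/maxUnavailable and timing, so treat
# "10%" as a target rather than a guarantee.
kubectl apply -f k8s/production.yaml
# --record is deprecated in kubectl and has been dropped here
kubectl set image deployment/speak-integration app=image:new
kubectl rollout pause deployment/speak-integration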

echo "Canary deployed. Monitoring for 10 minutes..."
sleep 600

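# Check canary metrics
# -G with --data-urlencode avoids curl's URL globbing, which otherwise
# rejects the [5m] range selector as a bad range
echo "Checking error rates..."
curl -sG "localhost:9090/api/v1/query" \
  --data-urlencode 'query=rate(speak_errors_total[5m])' | jq

# Check lesson completion rate
curl -sG "localhost:9090/api/v1/query" \
  --data-urlencode 'query=speak_lesson_completion_rate[5m]' | jq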

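# If healthy, continue rollout to 50%.
# This is a manual gate: review the query output above before continuing.
# Resuming and pausing again only approximates a 50% stage; the share of
# updated pods depends on how far the rollout gets before the pause lands.
echo "Canary healthy. Continuing to 50%..."
kubectl rollout resume deployment/speak-integration
kubectl rollout pause deployment/speak-integration
sleep 300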

# Complete rollout to 100%
echo "50% healthy. Completing rollout..."
kubectl rollout resume deployment/speak-integration
kubectl rollout status deployment/speak-integration

echo "=== Deployment Complete ==="
```
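
The "if healthy" decision in the script above is left to whoever reads the `jq` output. One way to make it explicit is to evaluate the Prometheus response in code. A minimal sketch: the response shape follows the Prometheus HTTP API for instant queries that return a vector; the function name, the threshold parameter, and the choice to treat an empty result as unhealthy are assumptions for illustration.

```typescript
// Shape of a Prometheus instant-query response carrying a vector result.
interface PromVectorResponse {
  status: "success" | "error";
  data?: {
    resultType: string;
    result: Array<{ metric: Record<string, string>; value: [number, string] }>;
  };
}

// Returns true only when the query succeeded and every series is at or below
// the threshold. An empty result counts as unhealthy here (an assumption):
// missing data is not treated as evidence of health.
function canaryHealthy(resp: PromVectorResponse, maxErrorsPerSec: number): boolean {
  if (resp.status !== "success" || !resp.data || resp.data.resultType !== "vector") {
    return false;
  }
  const series = resp.data.result;
  if (series.length === 0) return false;
  return series.every(({ value }) => {
    const sample = Number(value[1]); // Prometheus encodes sample values as strings
    return Number.isFinite(sample) && sample <= maxErrorsPerSec;
  });
}
```

If the error counter only appears after the first error, an empty result may actually mean "no errors yet"; in that case the empty-result rule would need to be relaxed.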

## Health Check Implementation

```typescript
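// api/health.ts
// Assumes an Express `app`, a configured `speakClient`, and the helpers
// `testSpeechRecognition()` and `testTutorConnection()` are defined elsewhere.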
interface SpeakHealthStatus {
  connected: boolean;
  latencyMs: number;
  speechRecognition: boolean;
  tutorAvailable: boolean;
}

async function checkSpeakHealth(): Promise<SpeakHealthStatus> {
  const start = Date.now();

  try {
    // Test basic connectivity
    await speakClient.health.check();

    // Test speech recognition (with sample audio)
    const speechOk = await testSpeechRecognition();

    // Test tutor availability
    const tutorOk = await testTutorConnection();

    return {
      connected: true,
      latencyMs: Date.now() - start,
      speechRecognition: speechOk,
      tutorAvailable: tutorOk,
    };
  } catch (error) {
    return {
      connected: false,
      latencyMs: Date.now() - start,
      speechRecognition: false,
      tutorAvailable: false,
    };
  }
}

app.get('/health', async (req, res) => {
  const speakStatus = await checkSpeakHealth();

  const isHealthy = speakStatus.connected &&
    speakStatus.speechRecognition &&
    speakStatus.tutorAvailable;

  res.status(isHealthy ? 200 : 503).json({
    status: isHealthy ? 'healthy' : 'degraded',
    services: {
      speak: speakStatus,
    },
    timestamp: new Date().toISOString(),
  });
});
```

## Rollback Procedure

```bash
#!/bin/bash
# rollback-speak.sh

echo "=== Emergency Rollback ==="

# Immediate rollback
kubectl rollout undo deployment/speak-integration
kubectl rollout status deployment/speak-integration

# Verify rollback
curl -f https://api.yourapp.com/health | jq

# If Speak completely down, enable fallback mode
if [ "${1:-}" = "--fallback" ]; then
  kubectl set env deployment/speak-integration SPEAK_FALLBACK_MODE=true
  echo "Fallback mode enabled - offline lessons available"
fi

echo "=== Rollback Complete ==="
```
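
The script only sets `SPEAK_FALLBACK_MODE`; the application still has to honor it. A sketch of how a service might read the flag, assuming it is exposed to the process as an environment variable and that offline lessons are available locally. The function and type names are hypothetical.

```typescript
type EnvVars = Record<string, string | undefined>;
type LessonSource = "speak-live" | "offline-lessons";

// True when the rollback script has set SPEAK_FALLBACK_MODE=true.
function isFallbackMode(env: EnvVars): boolean {
  return (env["SPEAK_FALLBACK_MODE"] ?? "").trim().toLowerCase() === "true";
}

// Serve offline lessons when fallback is forced or Speak is unreachable.
function chooseLessonSource(env: EnvVars, speakReachable: boolean): LessonSource {
  return isFallbackMode(env) || !speakReachable ? "offline-lessons" : "speak-live";
}
```

Because `kubectl set env` triggers a new rollout of the deployment, reading the flag once at startup is enough in this setup.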

## Alert Configuration

| Alert | Condition | Severity |
|-------|-----------|----------|
| Speak API Down | Health check fails 3x | P1 |
| High Error Rate | Error rate > 5% | P2 |
| Speech Recognition Failing | Success rate < 90% | P2 |
| High Latency | p95 > 3000ms | P2 |
| Auth Failures | 401/403 errors > 0 | P1 |
| Session Abandonment | Abandon rate > 20% | P3 |
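
The thresholds in this table can also be expressed as data and unit-tested before they are translated into alerting rules. A minimal sketch; the `SpeakMetrics` field names are assumptions chosen for the example, while the conditions and severities are copied from the table.

```typescript
// Hypothetical metrics snapshot; field names are illustrative.
interface SpeakMetrics {
  consecutiveHealthFailures: number;
  errorRatePct: number;
  speechSuccessRatePct: number;
  latencyP95Ms: number;
  authFailures: number;
  abandonRatePct: number;
}

type Severity = "P1" | "P2" | "P3";

interface Alert {
  name: string;
  severity: Severity;
}

// Evaluates each row of the alert table and returns the alerts that fire,
// in table order.
function evaluateAlerts(m: SpeakMetrics): Alert[] {
  const rules: Array<Alert & { firing: boolean }> = [
    { name: "Speak API Down", severity: "P1", firing: m.consecutiveHealthFailures >= 3 },
    { name: "High Error Rate", severity: "P2", firing: m.errorRatePct > 5 },
    { name: "Speech Recognition Failing", severity: "P2", firing: m.speechSuccessRatePct < 90 },
    { name: "High Latency", severity: "P2", firing: m.latencyP95Ms > 3000 },
    { name: "Auth Failures", severity: "P1", firing: m.authFailures > 0 },
    { name: "Session Abandonment", severity: "P3", firing: m.abandonRatePct > 20 },
  ];
  return rules.filter((r) => r.firing).map(({ name, severity }) => ({ name, severity }));
}
```

Note that the comparisons are strict where the table uses `>` or `<`, so an error rate of exactly 5% does not fire.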

## Production Monitoring Dashboard

```yaml
# grafana-speak-dashboard.yaml
panels:
  - title: "Lesson Sessions"
    query: "sum(speak_sessions_active)"

  - title: "Speech Recognition Success Rate"
    query: "rate(speak_speech_success_total[5m]) / rate(speak_speech_total[5m]) * 100"

  - title: "Average Pronunciation Score"
    query: "avg(speak_pronunciation_score)"

  - title: "API Latency (p95)"
    query: "histogram_quantile(0.95, rate(speak_api_latency_bucket[5m]))"

  - title: "Error Rate by Type"
    query: "sum by (error_type) (rate(speak_errors_total[5m]))"

  - title: "Lesson Completion Rate"
    query: "rate(speak_lessons_completed[5m]) / rate(speak_lessons_started[5m]) * 100"
```

## Output
- Deployed Speak integration
- Health checks passing
- Monitoring active
- Rollback procedure documented and tested
- Alerting configured

## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Health check fails | Speak service down | Enable fallback mode |
| High latency | Audio processing slow | Scale audio workers |
| Session failures | API key issues | Verify credentials |
| Low completion rate | UX issues | Review user feedback |

## Examples

### Smoke Test Suite
```typescript
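// Assumes the individual test functions (testApiHealth, etc.) are defined
// elsewhere and throw on failure.
interface TestResult {
  name: string;
  passed: boolean;
  error?: unknown;
}
type TestResults = TestResult[];

async function productionSmokeTest(): Promise<TestResults> {
  const tests = [
    { name: 'API Health', fn: testApiHealth },
    { name: 'Speech Recognition', fn: testSpeechRecognition },
    { name: 'AI Tutor', fn: testTutorResponse },
    { name: 'Session Lifecycle', fn: testSessionLifecycle },
    { name: 'Pronunciation Scoring', fn: testPronunciationScoring },
  ];

  const results: TestResults = [];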
  for (const test of tests) {
    try {
      await test.fn();
      results.push({ name: test.name, passed: true });
    } catch (error) {
      results.push({ name: test.name, passed: false, error });
    }
  }

  return results;
}
```
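
The results array still has to be turned into a go/no-go signal. A small sketch of one way to do that, assuming each result has the `name`/`passed`/`error` shape pushed by the suite above; the function name and report format are hypothetical.

```typescript
interface SmokeResult {
  name: string;
  passed: boolean;
  error?: unknown;
}

// Builds a short text report and an overall verdict from smoke test results.
function summarizeSmokeResults(results: SmokeResult[]): { ok: boolean; report: string } {
  const lines = results.map((r) => `${r.passed ? "PASS" : "FAIL"} ${r.name}`);
  const failed = results.filter((r) => !r.passed).length;
  lines.push(`${results.length - failed}/${results.length} passed`);
  // An empty run counts as a failure: nothing was actually verified.
  return { ok: results.length > 0 && failed === 0, report: lines.join("\n") };
}
```

A deploy pipeline could print `report` and exit non-zero when `ok` is `false`, so a failed smoke test blocks the next rollout stage.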

## Resources
- [Speak Status](https://status.speak.com)
- [Speak Support](https://support.speak.com)
- [Speak Production Guide](https://developer.speak.com/docs/production)

## Next Steps
For version upgrades, see `speak-upgrade-migration`.

Overview

This skill executes the Speak production deployment checklist and rollback procedures for language-learning integrations. It guides deploys, health checks, monitoring setup, and emergency rollback so launches are safe and auditable. Use it to streamline go-live and post-launch validation for Speak-powered features.

How this skill works

The skill runs a structured pre-deployment sequence: verify staging, confirm production secrets, validate audio and API connectivity, and perform a gradual rollout with canary checks. It automates health checks, verifies speech recognition and tutor availability, monitors key metrics, and provides a scripted rollback/fallback flow when failures occur. It also outputs required documentation and alert rules for on-call teams.

When to use it

  • Deploying Speak integrations to production
  • Preparing a new Speak-powered feature for launch
  • Running pre-flight and post-deploy smoke tests
  • Implementing go-live monitoring and alerting
  • Executing emergency rollback or enabling fallback mode

Best practices

  • Keep production API keys in a secure vault and use least-privilege scopes
  • Run full staging tests including speech recognition and pronunciation scoring before production
  • Use gradual rollout (canary → 50% → 100%) and monitor metrics between steps
  • Ensure health endpoint verifies Speak connectivity, recognition, and tutor availability
  • Configure alerts for API down, high error rate, low speech success rate, and auth failures

Example use cases

  • Performing a controlled canary deployment for a new Speak lesson pipeline
  • Running the smoke test suite to validate production readiness after a release
  • Triggering emergency rollback when production speech recognition drops below threshold
  • Setting up Grafana panels and Prometheus queries for Speak-specific metrics
  • Documenting incident runbooks and key rotation before handing over to on-call

FAQ

What metrics should I watch during rollout?

Track API latency (p95), error rate, speech recognition success rate, lesson completion rates, and session abandonment.

When should I enable fallback mode?

Enable fallback when Speak connectivity or recognition is failing broadly and rollback to a previous stable version hasn’t restored service.