speak-prod-checklist skill

/plugins/saas-packs/speak-pack/skills/speak-prod-checklist

This skill guides and validates the Speak production deployment and rollback process, ensuring safe go-live and reliable post-launch operations.

npx playbooks add skill jeremylongshore/claude-code-plugins-plus-skills --skill speak-prod-checklist

---
name: speak-prod-checklist
description: |
  Execute Speak production deployment checklist and rollback procedures.
  Use when deploying Speak integrations to production, preparing for launch,
  or implementing go-live procedures for language learning features.
  Trigger with phrases like "speak production", "deploy speak",
  "speak go-live", "speak launch checklist".
allowed-tools: Read, Bash(kubectl:*), Bash(curl:*), Grep
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
---

# Speak Production Checklist

## Overview
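A checklist for deploying Speak language learning integrations to production. It covers configuration, code quality, feature verification, infrastructure, documentation, gradual rollout, and rollback.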

## Prerequisites
- Staging environment tested and verified
- Production API keys available
- Deployment pipeline configured
- Monitoring and alerting ready
- Audio infrastructure tested

## Instructions

### Step 1: Pre-Deployment Configuration
- [ ] Production API keys in secure vault
- [ ] Environment variables set in deployment platform
- [ ] API key scopes are minimal (least privilege)
- [ ] Webhook endpoints configured with HTTPS
- [ ] Webhook secrets stored securely
- [ ] Audio storage configured with encryption
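
The configuration items above can be checked mechanically at startup. The following is a minimal sketch, not part of the original skill: `SPEAK_API_KEY_PROD` and `SPEAK_APP_ID_PROD` come from the deployment script in Step 6, while `SPEAK_WEBHOOK_URL` and `SPEAK_WEBHOOK_SECRET` are hypothetical names chosen for illustration.

```typescript
// Sketch: validate Speak-related configuration before the service starts.
// SPEAK_WEBHOOK_URL and SPEAK_WEBHOOK_SECRET are hypothetical variable names.
type Env = Record<string, string | undefined>;

const REQUIRED_VARS: string[] = [
  'SPEAK_API_KEY_PROD',
  'SPEAK_APP_ID_PROD',
  'SPEAK_WEBHOOK_URL',
  'SPEAK_WEBHOOK_SECRET',
];

// Returns a list of human-readable problems; an empty list means the
// configuration passed these basic checks.
function validateSpeakConfig(env: Env): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_VARS) {
    const value = env[name];
    if (value === undefined || value.trim() === '') {
      problems.push(`${name} is not set`);
    }
  }
  const webhookUrl = env['SPEAK_WEBHOOK_URL'];
  if (webhookUrl !== undefined && webhookUrl !== '' && !/^https:\/\//.test(webhookUrl)) {
    problems.push('SPEAK_WEBHOOK_URL must use HTTPS');
  }
  return problems;
}
```

In a Node.js service this could be called with `process.env` during boot, refusing to start when the returned list is non-empty.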

### Step 2: Code Quality Verification
- [ ] All tests passing (`npm test`)
- [ ] No hardcoded credentials
- [ ] Error handling covers all Speak error types
- [ ] Rate limiting/backoff implemented
- [ ] Logging is production-appropriate (no PII)
- [ ] Audio handling follows privacy guidelines
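
For the rate limiting/backoff item, one common approach is exponential backoff with full jitter. The sketch below is an illustration rather than the skill's prescribed implementation; `BackoffOptions`, `backoffDelayMs`, and `retryWithBackoff` are hypothetical names, and the `isRetryable` check must be adapted to the error types your Speak client actually raises.

```typescript
// Sketch: exponential backoff with full jitter for retrying Speak API calls.
interface BackoffOptions {
  baseMs: number;      // delay ceiling for the first retry
  maxMs: number;       // upper bound for any single delay
  maxAttempts: number; // total attempts, including the first call
}

// Delay before retry number `attempt` (1-based). `random` is injectable so
// the schedule can be checked deterministically.
function backoffDelayMs(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(opts.maxMs, opts.baseMs * Math.pow(2, attempt - 1));
  return Math.floor(random() * ceiling);
}

// Calls `fn` until it succeeds, the error is not retryable, or
// `maxAttempts` is reached; the last error is rethrown.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  opts: BackoffOptions,
  isRetryable: (err: unknown) => boolean = () => true,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= opts.maxAttempts || !isRetryable(err)) throw err;
      const delay = backoffDelayMs(attempt, opts);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Jitter spreads retries from many clients over time, which helps avoid synchronized retry bursts against a rate-limited API.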

### Step 3: Language Learning Feature Checklist
- [ ] All target languages tested
- [ ] Speech recognition works in all browsers
- [ ] Pronunciation scoring accurate
- [ ] AI tutor responses appropriate
- [ ] Session timeouts handled gracefully
- [ ] Progress tracking persists correctly

### Step 4: Infrastructure Setup
- [ ] Health check endpoint includes Speak connectivity
- [ ] Monitoring/alerting configured for:
  - [ ] API latency
  - [ ] Speech recognition success rate
  - [ ] Session completion rate
  - [ ] Error rates by type
- [ ] Circuit breaker pattern implemented
- [ ] Graceful degradation configured
- [ ] CDN configured for audio assets
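
The circuit breaker item can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not the skill's own code: the class name, the thresholds, and the injectable clock are all hypothetical, and in the half-open state it lets trial calls through until one succeeds or fails.

```typescript
// Sketch: a minimal circuit breaker for calls to a dependency such as Speak.
type BreakerState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = 'closed';
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetAfterMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  // Returns true if a call may proceed. Once the cool-down has elapsed the
  // breaker moves from open to half-open and allows trial calls.
  allowRequest(): boolean {
    if (this.state === 'open' && this.now() - this.openedAt >= this.resetAfterMs) {
      this.state = 'half-open';
    }
    return this.state !== 'open';
  }

  // A success closes the breaker and clears the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  // A failure during half-open, or reaching the threshold, opens the breaker.
  recordFailure(): void {
    this.failures += 1;
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }

  currentState(): BreakerState {
    return this.state;
  }
}
```

A caller would check `allowRequest()` before each Speak call, serve the degraded experience when it returns `false`, and report the outcome with `recordSuccess()` or `recordFailure()`.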

### Step 5: Documentation Requirements
- [ ] Incident runbook created
- [ ] Key rotation procedure documented
- [ ] Rollback procedure documented
- [ ] On-call escalation path defined
- [ ] User support FAQ for common issues

### Step 6: Deploy with Gradual Rollout

```bash
# Pre-flight checks
set -euo pipefail  # abort on a failed command, an unset variable, or a failed pipeline stage
echo "=== Speak Production Pre-flight ==="

# Check staging health
curl -f https://staging.example.com/health | jq '.services.speak'

# Check Speak service status
curl -s https://status.speak.com/api/status | jq '.status'

# Verify production credentials work
curl -X POST https://api.speak.com/v1/health \
  -H "Authorization: Bearer ${SPEAK_API_KEY_PROD}" \
  -H "X-App-ID: ${SPEAK_APP_ID_PROD}" | jq

echo "=== Starting Deployment ==="

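# Gradual rollout - start with a canary (target: about 10% of pods).
# Pausing right after the image update freezes the rollout wherever it has
# reached; the actual share of new pods depends on the Deployment's
# maxSurge/maxUnavailable settings, so confirm it with `kubectl get pods`.
kubectl apply -f k8s/production.yaml
kubectl set image deployment/speak-integration app=image:new  # replace image:new with the real image tag
kubectl rollout pause deployment/speak-integration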

echo "Canary deployed. Monitoring for 10 minutes..."
sleep 600

# Check canary metrics
# -G with --data-urlencode stops curl from treating [5m] as a URL glob range
echo "Checking error rates..."
curl -sG "localhost:9090/api/v1/query" --data-urlencode 'query=rate(speak_errors_total[5m])' | jq

# Check lesson completion rate
curl -sG "localhost:9090/api/v1/query" --data-urlencode 'query=speak_lesson_completion_rate[5m]' | jq

# If the metrics above look healthy, continue the rollout toward 50%;
# otherwise stop here and run the rollback procedure.
echo "Continuing to 50%..."
kubectl rollout resume deployment/speak-integration
kubectl rollout pause deployment/speak-integration
sleep 300

# Complete rollout to 100%
echo "50% healthy. Completing rollout..."
kubectl rollout resume deployment/speak-integration
kubectl rollout status deployment/speak-integration

echo "=== Deployment Complete ==="
```
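
The script above prints the canary metrics but leaves the "is it healthy?" decision to the operator. A minimal sketch of an automated gate follows. It assumes the standard Prometheus instant-query JSON shape; `canaryIsHealthy` and the threshold you pass to it are hypothetical and should be tuned to your own service levels.

```typescript
// Sketch: decide whether a canary is healthy from a Prometheus instant-query
// response (the JSON returned by /api/v1/query for a vector result).
interface PromSample {
  metric: Record<string, string>;
  value: [number, string]; // [unix timestamp, sample value as a string]
}

interface PromQueryResponse {
  status: 'success' | 'error';
  data?: { resultType: string; result: PromSample[] };
}

// Returns true only if the query succeeded and every returned series is at
// or below maxValue. An empty result means "no data" and fails the gate.
function canaryIsHealthy(resp: PromQueryResponse, maxValue: number): boolean {
  if (resp.status !== 'success' || !resp.data) return false;
  if (resp.data.resultType !== 'vector') return false;
  const samples = resp.data.result;
  if (samples.length === 0) return false;
  return samples.every((s) => {
    const v = Number(s.value[1]);
    return isFinite(v) && v <= maxValue;
  });
}
```

Failing closed on missing data is a deliberate choice here: a canary that emits no metrics is more likely broken than healthy.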

## Health Check Implementation

```typescript
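// api/health.ts
import express from 'express';
// Assumed app-specific modules (hypothetical paths): a configured Speak
// client and the two probe helpers used below.
import { speakClient } from '../lib/speak-client';
import { testSpeechRecognition, testTutorConnection } from '../lib/speak-probes';

const app = express();
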
interface SpeakHealthStatus {
  connected: boolean;
  latencyMs: number;
  speechRecognition: boolean;
  tutorAvailable: boolean;
}

async function checkSpeakHealth(): Promise<SpeakHealthStatus> {
  const start = Date.now();

  try {
    // Test basic connectivity
    await speakClient.health.check();

    // Test speech recognition (with sample audio)
    const speechOk = await testSpeechRecognition();

    // Test tutor availability
    const tutorOk = await testTutorConnection();

    return {
      connected: true,
      latencyMs: Date.now() - start,
      speechRecognition: speechOk,
      tutorAvailable: tutorOk,
    };
  } catch (error) {
    return {
      connected: false,
      latencyMs: Date.now() - start,
      speechRecognition: false,
      tutorAvailable: false,
    };
  }
}

app.get('/health', async (req, res) => {
  const speakStatus = await checkSpeakHealth();

  const isHealthy = speakStatus.connected &&
    speakStatus.speechRecognition &&
    speakStatus.tutorAvailable;

  res.status(isHealthy ? 200 : 503).json({
    status: isHealthy ? 'healthy' : 'degraded',
    services: {
      speak: speakStatus,
    },
    timestamp: new Date().toISOString(),
  });
});
```

## Rollback Procedure

```bash
#!/bin/bash
# rollback-speak.sh

echo "=== Emergency Rollback ==="

# Immediate rollback
kubectl rollout undo deployment/speak-integration
kubectl rollout status deployment/speak-integration

# Verify rollback
curl -f https://api.yourapp.com/health | jq

# If Speak completely down, enable fallback mode
if [ "${1:-}" = "--fallback" ]; then
  kubectl set env deployment/speak-integration SPEAK_FALLBACK_MODE=true
  echo "Fallback mode enabled - offline lessons available"
fi

echo "=== Rollback Complete ==="
```
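
The rollback script only sets `SPEAK_FALLBACK_MODE=true`; the application still has to honor it. The sketch below shows one way the flag might be read. The `LessonSource` values and function names are hypothetical, and also switching to offline lessons when Speak is unreachable is an assumption layered on top of the source.

```typescript
// Sketch: choose where lessons come from based on the fallback flag set by
// the rollback script. Names other than SPEAK_FALLBACK_MODE are hypothetical.
type EnvVars = Record<string, string | undefined>;
type LessonSource = 'speak-live' | 'offline';

function isFallbackEnabled(env: EnvVars): boolean {
  const raw = env['SPEAK_FALLBACK_MODE'];
  return raw !== undefined && raw.trim().toLowerCase() === 'true';
}

// Offline lessons are served when fallback mode is on or Speak is unreachable.
function chooseLessonSource(env: EnvVars, speakConnected: boolean): LessonSource {
  if (isFallbackEnabled(env) || !speakConnected) return 'offline';
  return 'speak-live';
}
```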

## Alert Configuration

| Alert | Condition | Severity |
|-------|-----------|----------|
| Speak API Down | Health check fails 3x | P1 |
| High Error Rate | Error rate > 5% | P2 |
| Speech Recognition Failing | Success rate < 90% | P2 |
| High Latency | p95 > 3000ms | P2 |
| Auth Failures | 401/403 errors > 0 | P1 |
| Session Abandonment | Abandon rate > 20% | P3 |
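
The table above can be expressed as data so the same thresholds drive both alerting and tests. This is a sketch under stated assumptions: the `MetricsSnapshot` field names are hypothetical, and "fails 3x" is interpreted as three consecutive health check failures.

```typescript
// Sketch: the alert table as rules evaluated against a metrics snapshot.
type Severity = 'P1' | 'P2' | 'P3';

interface MetricsSnapshot {
  consecutiveHealthCheckFailures: number;
  errorRatePct: number;
  speechSuccessRatePct: number;
  p95LatencyMs: number;
  authFailureCount: number; // 401/403 responses in the evaluation window
  abandonRatePct: number;
}

interface AlertRule {
  name: string;
  severity: Severity;
  fires: (m: MetricsSnapshot) => boolean;
}

const ALERT_RULES: AlertRule[] = [
  { name: 'Speak API Down', severity: 'P1', fires: (m) => m.consecutiveHealthCheckFailures >= 3 },
  { name: 'High Error Rate', severity: 'P2', fires: (m) => m.errorRatePct > 5 },
  { name: 'Speech Recognition Failing', severity: 'P2', fires: (m) => m.speechSuccessRatePct < 90 },
  { name: 'High Latency', severity: 'P2', fires: (m) => m.p95LatencyMs > 3000 },
  { name: 'Auth Failures', severity: 'P1', fires: (m) => m.authFailureCount > 0 },
  { name: 'Session Abandonment', severity: 'P3', fires: (m) => m.abandonRatePct > 20 },
];

// Returns the rules that fire for the snapshot, in table order.
function firingAlerts(m: MetricsSnapshot): AlertRule[] {
  return ALERT_RULES.filter((rule) => rule.fires(m));
}
```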

## Production Monitoring Dashboard

```yaml
# grafana-speak-dashboard.yaml
panels:
  - title: "Lesson Sessions"
    query: "sum(speak_sessions_active)"

  - title: "Speech Recognition Success Rate"
    query: "sum(rate(speak_speech_success_total[5m])) / sum(rate(speak_speech_total[5m])) * 100"

  - title: "Average Pronunciation Score"
    query: "avg(speak_pronunciation_score)"

  - title: "API Latency (p95)"
    query: "histogram_quantile(0.95, sum by (le) (rate(speak_api_latency_bucket[5m])))"

  - title: "Error Rate by Type"
    query: "sum by (error_type) (rate(speak_errors_total[5m]))"

  - title: "Lesson Completion Rate"
    query: "sum(rate(speak_lessons_completed[5m])) / sum(rate(speak_lessons_started[5m])) * 100"
```

## Output
- Deployed Speak integration
- Health checks passing
- Monitoring active
- Rollback procedure documented and tested
- Alerting configured

## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Health check fails | Speak service down | Enable fallback mode |
| High latency | Audio processing slow | Scale audio workers |
| Session failures | API key issues | Verify credentials |
| Low completion rate | UX issues | Review user feedback |

## Examples

### Smoke Test Suite
```typescript
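// The test* functions are assumed to be defined elsewhere; each should
// throw (or reject) on failure.
interface TestResult {
  name: string;
  passed: boolean;
  error?: unknown;
}
type TestResults = TestResult[];

async function productionSmokeTest(): Promise<TestResults> {
  const tests: { name: string; fn: () => Promise<unknown> }[] = [
    { name: 'API Health', fn: testApiHealth },
    { name: 'Speech Recognition', fn: testSpeechRecognition },
    { name: 'AI Tutor', fn: testTutorResponse },
    { name: 'Session Lifecycle', fn: testSessionLifecycle },
    { name: 'Pronunciation Scoring', fn: testPronunciationScoring },
  ];

  const results: TestResults = [];
  for (const test of tests) {
    try {
      // Treat an explicit `false` return (as testSpeechRecognition yields in
      // the health check) as a failure, not only a thrown error.
      const outcome = await test.fn();
      results.push({ name: test.name, passed: outcome !== false });
    } catch (error) {
      results.push({ name: test.name, passed: false, error });
    }
  }

  return results;
}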
```
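
To use the smoke test as a deployment gate, its results need to be reduced to a pass/fail signal and a readable report. A minimal sketch follows; `SmokeResult` and `summarizeSmokeResults` are hypothetical names that mirror the result shape pushed by the suite above.

```typescript
// Sketch: turn smoke test results into a pass/fail flag and a text report.
interface SmokeResult {
  name: string;
  passed: boolean;
  error?: unknown;
}

function summarizeSmokeResults(results: SmokeResult[]): { ok: boolean; report: string } {
  const lines = results.map((r) => {
    // Append the error text only for failed tests that captured one.
    const detail = r.passed || r.error === undefined ? '' : `: ${String(r.error)}`;
    return `${r.passed ? 'PASS' : 'FAIL'} ${r.name}${detail}`;
  });
  // An empty result set is treated as a failure: nothing was verified.
  const ok = results.length > 0 && results.every((r) => r.passed);
  return { ok, report: lines.join('\n') };
}
```

A CI step could print `report` and exit non-zero when `ok` is `false`, which blocks the rollout from continuing.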

## Resources
- [Speak Status](https://status.speak.com)
- [Speak Support](https://support.speak.com)
- [Speak Production Guide](https://developer.speak.com/docs/production)

## Next Steps
For version upgrades, see `speak-upgrade-migration`.

Overview

This skill executes the Speak production deployment checklist and rollback procedures for language-learning integrations. It guides deploys, health checks, monitoring setup, and emergency rollback so launches are safe and auditable. Use it to streamline go-live and post-launch validation for Speak-powered features.

How this skill works

The skill runs a structured pre-deployment sequence: verify staging, confirm production secrets, validate audio and API connectivity, and perform a gradual rollout with canary checks. It automates health checks, verifies speech recognition and tutor availability, monitors key metrics, and provides a scripted rollback/fallback flow when failures occur. It also outputs required documentation and alert rules for on-call teams.

When to use it

- Deploying Speak integrations to production
- Preparing a new Speak-powered feature for launch
- Running pre-flight and post-deploy smoke tests
- Implementing go-live monitoring and alerting
- Executing emergency rollback or enabling fallback mode

Best practices

- Keep production API keys in a secure vault and use least-privilege scopes
- Run full staging tests, including speech recognition and pronunciation scoring, before production
- Use gradual rollout (canary → 50% → 100%) and monitor metrics between steps
- Ensure the health endpoint verifies Speak connectivity, recognition, and tutor availability
- Configure alerts for API down, high error rate, low speech success rate, and auth failures

Example use cases

- Performing a controlled canary deployment for a new Speak lesson pipeline
- Running the smoke test suite to validate production readiness after a release
- Triggering emergency rollback when production speech recognition drops below threshold
- Setting up Grafana panels and Prometheus queries for Speak-specific metrics
- Documenting incident runbooks and key rotation before handing over to on-call

FAQ

What metrics should I watch during rollout?

Track API latency (p95), error rate, speech recognition success rate, lesson completion rates, and session abandonment.

When should I enable fallback mode?

Enable fallback when Speak connectivity or recognition is failing broadly and rollback to a previous stable version hasn’t restored service.