home / skills / jeremylongshore / claude-code-plugins-plus-skills / speak-rate-limits

This skill helps you implement robust speak rate limits with exponential backoff, session aware throttling, and idempotent responses for reliable APIs.

npx playbooks add skill jeremylongshore/claude-code-plugins-plus-skills --skill speak-rate-limits

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
9.7 KB
---
name: speak-rate-limits
description: |
  Implement Speak rate limiting, backoff, and idempotency patterns for language learning APIs.
  Use when handling rate limit errors, implementing retry logic,
  or optimizing API request throughput for Speak integrations.
  Trigger with phrases like "speak rate limit", "speak throttling",
  "speak 429", "speak retry", "speak backoff".
allowed-tools: Read, Write, Edit
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
---

# Speak Rate Limits

## Overview
Handle Speak rate limits gracefully with exponential backoff and session management for language learning applications.

## Prerequisites
- Speak SDK installed
- Understanding of async/await patterns
- Access to rate limit headers

## Instructions

### Step 1: Understand Rate Limit Tiers

| Tier | Sessions/hour | API Calls/min | Audio Processing/min | Burst |
|------|---------------|---------------|---------------------|-------|
| Free | 10 | 60 | 20 | 5 |
| Premium | 100 | 300 | 100 | 25 |
| Education | 500 | 1000 | 300 | 50 |
| Enterprise | Unlimited | Custom | Custom | Custom |

### Step 2: Implement Exponential Backoff with Jitter

```typescript
interface BackoffConfig {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  jitterMs: number;
}

async function withExponentialBackoff<T>(
  operation: () => Promise<T>,
  config: BackoffConfig = {
    maxRetries: 5,
    baseDelayMs: 1000,
    maxDelayMs: 32000,
    jitterMs: 500,
  }
): Promise<T> {
  for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error: any) {
      if (attempt === config.maxRetries) throw error;

      const status = error.status || error.response?.status;
      const isRetryable = status === 429 || (status >= 500 && status < 600);

      if (!isRetryable) throw error;

      // Check for Retry-After header
      const retryAfter = error.headers?.['retry-after'];
      let delay: number;

      if (retryAfter) {
        delay = parseInt(retryAfter) * 1000;
      } else {
        // Exponential backoff with jitter
        const exponentialDelay = config.baseDelayMs * Math.pow(2, attempt);
        const jitter = Math.random() * config.jitterMs;
        delay = Math.min(exponentialDelay + jitter, config.maxDelayMs);
      }

      console.log(`Rate limited. Retrying in ${delay.toFixed(0)}ms...`);
      await new Promise(r => setTimeout(r, delay));
    }
  }
  throw new Error('Unreachable');
}
```

### Step 3: Session-Aware Rate Limiting

```typescript
// Speak uses session-based rate limits for lesson interactions
class SessionRateLimiter {
  private sessionCounts: Map<string, { count: number; windowStart: Date }> = new Map();
  private readonly maxSessionsPerHour: number;
  private readonly maxResponsesPerSession: number;

  constructor(tier: 'free' | 'premium' | 'education' = 'premium') {
    const limits = {
      free: { sessions: 10, responses: 50 },
      premium: { sessions: 100, responses: 200 },
      education: { sessions: 500, responses: 500 },
    };
    this.maxSessionsPerHour = limits[tier].sessions;
    this.maxResponsesPerSession = limits[tier].responses;
  }

  canStartSession(userId: string): boolean {
    const now = new Date();
    const windowMs = 60 * 60 * 1000; // 1 hour

    const userLimits = this.sessionCounts.get(userId);
    if (!userLimits) return true;

    // Check if window has reset
    if (now.getTime() - userLimits.windowStart.getTime() > windowMs) {
      this.sessionCounts.set(userId, { count: 0, windowStart: now });
      return true;
    }

    return userLimits.count < this.maxSessionsPerHour;
  }

  recordSessionStart(userId: string): void {
    const now = new Date();
    const current = this.sessionCounts.get(userId);

    if (!current || now.getTime() - current.windowStart.getTime() > 60 * 60 * 1000) {
      this.sessionCounts.set(userId, { count: 1, windowStart: now });
    } else {
      current.count++;
    }
  }

  getRemainingSession(userId: string): number {
    const current = this.sessionCounts.get(userId);
    if (!current) return this.maxSessionsPerHour;
    return Math.max(0, this.maxSessionsPerHour - current.count);
  }
}
```

### Step 4: Audio Processing Queue

```typescript
import PQueue from 'p-queue';

// Audio processing has stricter limits
const audioQueue = new PQueue({
  concurrency: 2, // Max 2 concurrent audio requests
  interval: 1000, // Per second
  intervalCap: 3, // Max 3 per second
});

async function queuedAudioProcessing(
  client: SpeakClient,
  audioData: ArrayBuffer,
  options: AudioProcessingOptions
): Promise<PronunciationResult> {
  return audioQueue.add(async () => {
    return client.speech.analyze(audioData, options);
  });
}

// Monitor queue status
audioQueue.on('idle', () => {
  console.log('Audio queue idle');
});

audioQueue.on('add', () => {
  console.log(`Audio queue size: ${audioQueue.size}, pending: ${audioQueue.pending}`);
});
```

### Step 5: Rate Limit Monitor

```typescript
class SpeakRateLimitMonitor {
  private apiRemaining: number = 60;
  private audioRemaining: number = 20;
  private apiResetAt: Date = new Date();
  private audioResetAt: Date = new Date();

  updateFromHeaders(headers: Headers): void {
    // API limits
    if (headers.has('X-RateLimit-Remaining')) {
      this.apiRemaining = parseInt(headers.get('X-RateLimit-Remaining')!);
    }
    if (headers.has('X-RateLimit-Reset')) {
      this.apiResetAt = new Date(parseInt(headers.get('X-RateLimit-Reset')!) * 1000);
    }

    // Audio-specific limits
    if (headers.has('X-Audio-RateLimit-Remaining')) {
      this.audioRemaining = parseInt(headers.get('X-Audio-RateLimit-Remaining')!);
    }
    if (headers.has('X-Audio-RateLimit-Reset')) {
      this.audioResetAt = new Date(parseInt(headers.get('X-Audio-RateLimit-Reset')!) * 1000);
    }
  }

  shouldThrottleApi(): boolean {
    return this.apiRemaining < 5 && new Date() < this.apiResetAt;
  }

  shouldThrottleAudio(): boolean {
    return this.audioRemaining < 3 && new Date() < this.audioResetAt;
  }

  getApiWaitTime(): number {
    return Math.max(0, this.apiResetAt.getTime() - Date.now());
  }

  getAudioWaitTime(): number {
    return Math.max(0, this.audioResetAt.getTime() - Date.now());
  }

  getStatus(): RateLimitStatus {
    return {
      api: {
        remaining: this.apiRemaining,
        resetAt: this.apiResetAt,
        throttled: this.shouldThrottleApi(),
      },
      audio: {
        remaining: this.audioRemaining,
        resetAt: this.audioResetAt,
        throttled: this.shouldThrottleAudio(),
      },
    };
  }
}
```

### Step 6: Idempotency for Lesson Responses

```typescript
import crypto from 'crypto';

// Generate deterministic key for lesson responses
function generateIdempotencyKey(
  sessionId: string,
  responseText: string,
  timestamp: number
): string {
  const data = JSON.stringify({ sessionId, responseText, timestamp });
  return crypto.createHash('sha256').update(data).digest('hex').slice(0, 32);
}

async function idempotentSubmitResponse(
  session: LessonSession,
  response: LessonResponse,
  idempotencyKey?: string
): Promise<TutorFeedback> {
  const key = idempotencyKey || generateIdempotencyKey(
    session.id,
    response.text,
    Math.floor(Date.now() / 10000) // 10-second buckets
  );

  return session.submitResponse(response, {
    headers: { 'Idempotency-Key': key },
  });
}
```

## Output
- Reliable API calls with automatic retry
- Session-aware rate limiting
- Audio processing queue
- Rate limit monitoring
- Idempotent lesson responses

## Error Handling
| Header | Description | Action |
|--------|-------------|--------|
| X-RateLimit-Limit | Max requests | Monitor usage |
| X-RateLimit-Remaining | Remaining requests | Throttle if low |
| X-RateLimit-Reset | Reset timestamp | Wait until reset |
| X-Audio-RateLimit-Remaining | Audio requests left | Queue audio processing |
| Retry-After | Seconds to wait | Honor this value |

## Examples

### Pre-flight Rate Check
```typescript
async function safeSpeakCall<T>(
  monitor: SpeakRateLimitMonitor,
  operation: () => Promise<T>,
  isAudioOperation: boolean = false
): Promise<T> {
  // Check if we should wait
  if (isAudioOperation && monitor.shouldThrottleAudio()) {
    const waitTime = monitor.getAudioWaitTime();
    console.log(`Audio rate limit - waiting ${waitTime}ms`);
    await new Promise(r => setTimeout(r, waitTime));
  } else if (monitor.shouldThrottleApi()) {
    const waitTime = monitor.getApiWaitTime();
    console.log(`API rate limit - waiting ${waitTime}ms`);
    await new Promise(r => setTimeout(r, waitTime));
  }

  return withExponentialBackoff(operation);
}
```

### Batch Lesson Processing
```typescript
async function batchProcessLessons(
  client: SpeakClient,
  lessons: LessonConfig[],
  rateLimiter: SessionRateLimiter
): Promise<LessonResult[]> {
  const results: LessonResult[] = [];

  for (const lesson of lessons) {
    // Check rate limit before starting
    if (!rateLimiter.canStartSession(lesson.userId)) {
      console.log(`Rate limited for user ${lesson.userId}, skipping`);
      results.push({ success: false, error: 'RATE_LIMITED' });
      continue;
    }

    rateLimiter.recordSessionStart(lesson.userId);

    try {
      const result = await withExponentialBackoff(() =>
        client.tutor.runLesson(lesson)
      );
      results.push({ success: true, data: result });
    } catch (error) {
      results.push({ success: false, error });
    }

    // Small delay between lessons
    await new Promise(r => setTimeout(r, 100));
  }

  return results;
}
```

## Resources
- [Speak Rate Limits](https://developer.speak.com/docs/rate-limits)
- [p-queue Documentation](https://github.com/sindresorhus/p-queue)
- [Speak API Headers](https://developer.speak.com/docs/headers)

## Next Steps
For security configuration, see `speak-security-basics`.

Overview

This skill implements Speak rate limiting, exponential backoff with jitter, session-aware limits, audio queuing, and idempotent lesson submissions for language learning integrations. It provides reusable patterns to avoid 429/5xx failures and to maximize throughput within tiered limits. Use it to make Speak API integrations resilient and predictable.

How this skill works

The skill inspects rate-limit headers and maintains a RateLimitMonitor to decide when to throttle API or audio requests. It wraps API calls with an exponential backoff retry loop that honors Retry-After, adds jitter, and stops on non-retryable errors. SessionRateLimiter enforces per-user session caps and an audio processing queue (PQueue) limits concurrent audio work. Idempotency keys ensure duplicate lesson responses don’t create duplicate state.

When to use it

  • Handling Speak 429 or 5xx responses from the Speak API
  • Implementing retries for transient network or server errors
  • Coordinating per-user session limits for lesson flows
  • Throttling and queuing audio analysis or synthesis requests
  • Batch-processing lessons without exceeding tier limits
  • Deploying production Speak integrations that need predictable throughput

Best practices

  • Honor Retry-After header before applying backoff delays
  • Use exponential backoff with randomized jitter to avoid thundering herds
  • Keep session counters windowed (hourly) and persist if needed for multi-instance deployments
  • Limit audio concurrency strictly (e.g., 2 concurrent, small per-second cap) to protect audio quotas
  • Generate short deterministic idempotency keys for lesson responses to dedupe retries
  • Monitor rate headers and expose wait-time metrics for operational dashboards

Example use cases

  • SafeSpeakCall wrapper: pre-flight check with RateLimitMonitor then backoff-wrapped operation
  • SessionRateLimiter for classroom mode to prevent a single student from exhausting hourly sessions
  • Audio queue for pronunciation analysis that enforces concurrency and interval caps
  • Batch lesson runner that skips or delays lessons when per-user limits are reached
  • Idempotent lesson submit function using a SHA-256-derived key to avoid duplicate feedback entries

FAQ

What headers should I trust to manage limits?

Primarily X-RateLimit-Remaining and X-RateLimit-Reset for API calls, X-Audio-RateLimit-Remaining and X-Audio-RateLimit-Reset for audio, and Retry-After when present.

When should I stop retrying?

Stop after a configured maxRetries or immediately on non-retryable status codes (e.g., 400–499 other than 429). Honor Retry-After when provided and cap max delay to avoid infinite loops.

How do idempotency keys work for lesson responses?

Create short deterministic keys from session id, response text, and a coarse timestamp bucket so repeated submissions within the bucket reuse the same key and the server deduplicates processing.