
This skill implements TwinMind rate limit handling with exponential backoff, request queueing, adaptive throttling, and batch submission to maximize reliable API throughput.

```bash
npx playbooks add skill jeremylongshore/claude-code-plugins-plus-skills --skill twinmind-rate-limits
```


---
name: twinmind-rate-limits
description: |
  Implement TwinMind rate limiting, backoff, and optimization patterns.
  Use when handling rate limit errors, implementing retry logic,
  or optimizing API request throughput for TwinMind.
  Trigger with phrases like "twinmind rate limit", "twinmind throttling",
  "twinmind 429", "twinmind retry", "twinmind backoff".
allowed-tools: Read, Write, Edit
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
---

# TwinMind Rate Limits

## Overview
Handle TwinMind rate limits gracefully with exponential backoff and request optimization.

## Prerequisites
- TwinMind API access (Pro/Enterprise)
- Understanding of async/await patterns
- Familiarity with rate limiting concepts

## Instructions

### Step 1: Understand Rate Limit Tiers

| Tier | Audio Hours/Month | API Requests/Min | Concurrent Transcriptions | Burst |
|------|-------------------|------------------|--------------------------|-------|
| Free | Unlimited | 30 | 1 | 5 |
| Pro ($10/mo) | Unlimited | 60 | 3 | 15 |
| Enterprise | Unlimited | 300 | 10 | 50 |

**Key Limits:**
- Transcription: Based on audio duration ($0.23/hour with Ear-3)
- AI Operations: Token-based (2M context for Pro)
- Summarization: 10/minute (Free), 30/minute (Pro)
- Memory Search: 60/minute (Free), 300/minute (Pro)

### Step 2: Implement Exponential Backoff with Jitter

```typescript
// src/twinmind/rate-limit.ts
interface RateLimitConfig {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  jitterMs: number;
}

const defaultConfig: RateLimitConfig = {
  maxRetries: 5,
  baseDelayMs: 1000,
  maxDelayMs: 60000, // Max 1 minute
  jitterMs: 500,
};

export async function withRateLimit<T>(
  operation: () => Promise<T>,
  config: Partial<RateLimitConfig> = {}
): Promise<T> {
  const { maxRetries, baseDelayMs, maxDelayMs, jitterMs } = {
    ...defaultConfig,
    ...config,
  };

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error: any) {
      if (attempt === maxRetries) throw error;

      const status = error.response?.status;
      if (status !== 429 && status !== 503) throw error; // Retry only on 429 (rate limited) and 503 (unavailable)

      // Check Retry-After header
      const retryAfter = error.response?.headers?.['retry-after'];
      let delay: number;

      if (retryAfter) {
        delay = parseInt(retryAfter, 10) * 1000;
      } else {
        // Exponential backoff with jitter
        const exponential = baseDelayMs * Math.pow(2, attempt);
        const jitter = Math.random() * jitterMs;
        delay = Math.min(exponential + jitter, maxDelayMs);
      }

      console.log(`Rate limited (attempt ${attempt + 1}). Waiting ${delay}ms...`);
      await new Promise(r => setTimeout(r, delay));
    }
  }

  throw new Error('Max retries exceeded');
}
```
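
For quick reference, a minimal usage sketch (the `client` shape and `/memories` endpoint are illustrative placeholders, not a documented TwinMind SDK):

```typescript
// Example usage: retries only on 429/503; all other errors propagate immediately.
import { withRateLimit } from './rate-limit';

async function fetchMemories(client: { get: (path: string) => Promise<any> }) {
  return withRateLimit(() => client.get('/memories'), {
    maxRetries: 3,
    baseDelayMs: 500,
  });
}
```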

### Step 3: Implement Request Queue

```typescript
// src/twinmind/queue.ts
import PQueue from 'p-queue';

interface QueueConfig {
  concurrency: number;
  intervalMs: number;
  intervalCap: number;
}

const tierConfigs: Record<string, QueueConfig> = {
  free: { concurrency: 1, intervalMs: 60000, intervalCap: 30 },
  pro: { concurrency: 3, intervalMs: 60000, intervalCap: 60 },
  enterprise: { concurrency: 10, intervalMs: 60000, intervalCap: 300 },
};

export class TwinMindQueue {
  private queue: PQueue;
  private tier: string;

  constructor(tier: 'free' | 'pro' | 'enterprise' = 'pro') {
    const config = tierConfigs[tier];
    this.tier = tier;
    this.queue = new PQueue({
      concurrency: config.concurrency,
      interval: config.intervalMs,
      intervalCap: config.intervalCap,
    });
  }

  async add<T>(operation: () => Promise<T>, priority?: number): Promise<T> {
    return this.queue.add(operation, { priority }) as Promise<T>;
  }

  get pending(): number {
    return this.queue.pending;
  }

  get size(): number {
    return this.queue.size;
  }

  pause(): void {
    this.queue.pause();
  }

  resume(): void {
    this.queue.start();
  }

  clear(): void {
    this.queue.clear();
  }
}

// Singleton instance
let queueInstance: TwinMindQueue | null = null;

export function getQueue(tier?: 'free' | 'pro' | 'enterprise'): TwinMindQueue {
  // Note: the tier is applied only on the first call; later calls reuse the instance.
  if (!queueInstance) {
    queueInstance = new TwinMindQueue(tier);
  }
  return queueInstance;
}
```
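
For example, a usage sketch (the `client` shape and `/summarize` endpoint are illustrative):

```typescript
// Funnel all TwinMind calls through the shared queue so tier caps hold globally.
import { getQueue } from './queue';

const queue = getQueue('pro');

async function summarizeAll(
  client: { post: (path: string, body: unknown) => Promise<any> },
  memoryIds: string[]
): Promise<any[]> {
  // Priority 1 jobs run before the default (0) in p-queue.
  return Promise.all(
    memoryIds.map(id => queue.add(() => client.post('/summarize', { memory_id: id }), 1))
  );
}
```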

### Step 4: Monitor Rate Limit Headers

```typescript
// src/twinmind/rate-monitor.ts
export interface RateLimitStatus {
  limit: number;
  remaining: number;
  reset: Date;
  percentUsed: number;
}

export class RateLimitMonitor {
  private limits = new Map<string, RateLimitStatus>();

  updateFromResponse(endpoint: string, headers: Headers): void {
    const limit = parseInt(headers.get('X-RateLimit-Limit') || '60', 10);
    const remaining = parseInt(headers.get('X-RateLimit-Remaining') || '60', 10);
    const resetTimestamp = headers.get('X-RateLimit-Reset');
    const reset = resetTimestamp
      ? new Date(parseInt(resetTimestamp, 10) * 1000)
      : new Date(Date.now() + 60000);

    this.limits.set(endpoint, {
      limit,
      remaining,
      reset,
      percentUsed: ((limit - remaining) / limit) * 100,
    });
  }

  getStatus(endpoint: string): RateLimitStatus | undefined {
    return this.limits.get(endpoint);
  }

  shouldThrottle(endpoint: string, threshold = 10): boolean {
    const status = this.limits.get(endpoint);
    if (!status) return false;

    // Throttle if remaining < threshold AND reset hasn't happened
    return status.remaining < threshold && new Date() < status.reset;
  }

  getWaitTime(endpoint: string): number {
    const status = this.limits.get(endpoint);
    if (!status) return 0;

    const now = Date.now();
    const resetTime = status.reset.getTime();

    return Math.max(0, resetTime - now);
  }

  getAllStatuses(): Map<string, RateLimitStatus> {
    return new Map(this.limits);
  }
}

export const rateLimitMonitor = new RateLimitMonitor();
```
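
A sketch of wiring the monitor into a fetch wrapper (the base-URL constant is a placeholder for your configured endpoint):

```typescript
// Throttle preemptively instead of waiting for a 429.
import { rateLimitMonitor } from './rate-monitor';

const BASE_URL = process.env.TWINMIND_API_URL ?? ''; // placeholder; set to your API base

export async function monitoredFetch(path: string, init?: RequestInit): Promise<Response> {
  if (rateLimitMonitor.shouldThrottle(path)) {
    // Wait for the window to reset rather than burning a request on a 429.
    await new Promise(r => setTimeout(r, rateLimitMonitor.getWaitTime(path)));
  }
  const response = await fetch(`${BASE_URL}${path}`, init);
  rateLimitMonitor.updateFromResponse(path, response.headers);
  return response;
}
```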

### Step 5: Implement Adaptive Rate Limiting

```typescript
// src/twinmind/adaptive-limiter.ts
export class AdaptiveRateLimiter {
  private successCount = 0;
  private failureCount = 0;
  private currentDelay = 0;
  private minDelay = 0;
  private maxDelay = 5000;
  private windowMs = 60000;
  private windowStart = Date.now();

  recordSuccess(): void {
    this.maybeResetWindow();
    this.successCount++;

    // Decrease delay on success (never below minDelay)
    if (this.currentDelay > this.minDelay) {
      this.currentDelay = Math.max(this.minDelay, this.currentDelay - 100);
    }
  }

  recordFailure(isRateLimit: boolean): void {
    this.maybeResetWindow();
    this.failureCount++;

    if (isRateLimit) {
      // Increase delay on rate limit
      this.currentDelay = Math.min(this.maxDelay, this.currentDelay + 500);
    }
  }

  private maybeResetWindow(): void {
    const now = Date.now();
    if (now - this.windowStart > this.windowMs) {
      this.successCount = 0;
      this.failureCount = 0;
      this.windowStart = now;
    }
  }

  getDelay(): number {
    return this.currentDelay;
  }

  getMetrics(): { success: number; failure: number; delay: number; ratio: number } {
    const total = this.successCount + this.failureCount;
    return {
      success: this.successCount,
      failure: this.failureCount,
      delay: this.currentDelay,
      ratio: total > 0 ? this.successCount / total : 1,
    };
  }

  async wait(): Promise<void> {
    if (this.currentDelay > 0) {
      await new Promise(r => setTimeout(r, this.currentDelay));
    }
  }
}
```
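
One way to apply it (a sketch; the axios-style `error.response` shape matches the backoff helper in Step 2):

```typescript
// Pace calls with the adaptive limiter: delays grow after 429s, shrink on success.
import { AdaptiveRateLimiter } from './adaptive-limiter';

const limiter = new AdaptiveRateLimiter();

export async function withAdaptiveDelay<T>(operation: () => Promise<T>): Promise<T> {
  await limiter.wait(); // honor the current adaptive delay before calling
  try {
    const result = await operation();
    limiter.recordSuccess();
    return result;
  } catch (error: any) {
    limiter.recordFailure(error.response?.status === 429);
    throw error;
  }
}
```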

### Step 6: Batch Requests for Efficiency

```typescript
// src/twinmind/batch.ts
// Assumes a shared TwinMind HTTP client module; adjust the import path to your setup.
import { getTwinMindClient } from './client';
export interface BatchOptions {
  maxBatchSize: number;
  maxWaitMs: number;
}

export class TranscriptionBatcher {
  private pending: Array<{
    audioUrl: string;
    resolve: (value: any) => void;
    reject: (error: any) => void;
  }> = [];
  private timer: NodeJS.Timeout | null = null;
  private options: BatchOptions;

  constructor(options: Partial<BatchOptions> = {}) {
    this.options = {
      maxBatchSize: 5,
      maxWaitMs: 1000,
      ...options,
    };
  }

  async transcribe(audioUrl: string): Promise<any> {
    return new Promise((resolve, reject) => {
      this.pending.push({ audioUrl, resolve, reject });

      if (this.pending.length >= this.options.maxBatchSize) {
        this.flush();
      } else if (!this.timer) {
        this.timer = setTimeout(() => this.flush(), this.options.maxWaitMs);
      }
    });
  }

  private async flush(): Promise<void> {
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }

    const batch = this.pending.splice(0, this.options.maxBatchSize);
    if (batch.length === 0) return;

    try {
      // Use batch API if available
      const results = await this.processBatch(batch.map(b => b.audioUrl));

      batch.forEach((item, index) => {
        item.resolve(results[index]);
      });
    } catch (error) {
      batch.forEach(item => item.reject(error));
    }
  }

  private async processBatch(audioUrls: string[]): Promise<any[]> {
    const client = getTwinMindClient();
    const response = await client.post('/transcribe/batch', {
      audio_urls: audioUrls,
      model: 'ear-3',
    });
    return response.data.transcripts;
  }
}
```
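
Callers still get one promise per file while requests are grouped underneath (a usage sketch):

```typescript
// Each caller awaits its own result; requests go out in batches.
import { TranscriptionBatcher } from './batch';

const batcher = new TranscriptionBatcher({ maxBatchSize: 5, maxWaitMs: 1000 });

export async function transcribeAll(audioUrls: string[]): Promise<any[]> {
  // Five URLs here become a single /transcribe/batch request.
  return Promise.all(audioUrls.map(url => batcher.transcribe(url)));
}
```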

## Output
- Reliable API calls with automatic retry
- Request queue with rate limit awareness
- Adaptive throttling based on response patterns
- Batch processing for efficiency
- Real-time rate limit monitoring

## Error Handling

| Header | Description | Action |
|--------|-------------|--------|
| X-RateLimit-Limit | Max requests per window | Monitor total quota |
| X-RateLimit-Remaining | Remaining in window | Throttle when low |
| X-RateLimit-Reset | Unix timestamp of reset | Wait until reset |
| Retry-After | Seconds to wait | Honor this value |

## Rate Limit Best Practices

1. **Always handle 429 responses** - Never let rate limits crash your app
2. **Use request queues** - Don't burst requests
3. **Monitor remaining quota** - Throttle before hitting limits
4. **Implement circuit breakers** - Fail fast when the API is overloaded (see the sketch after this list)
5. **Cache responses** - Avoid redundant requests
6. **Batch when possible** - Reduce total request count
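
A minimal circuit-breaker sketch to pair with the retry logic above (the threshold and cool-down values are illustrative defaults, not TwinMind recommendations):

```typescript
// src/twinmind/circuit-breaker.ts
export class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private threshold = 5,       // consecutive failures before opening
    private cooldownMs = 30_000, // how long to stay open
  ) {}

  async exec<T>(operation: () => Promise<T>): Promise<T> {
    if (this.failures >= this.threshold) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error('Circuit open: failing fast');
      }
      this.failures = this.threshold - 1; // half-open: allow one trial request
    }
    try {
      const result = await operation();
      this.failures = 0; // any success closes the circuit
      return result;
    } catch (error) {
      this.failures++;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw error;
    }
  }
}
```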

## Resources
- [TwinMind Rate Limits](https://twinmind.com/docs/rate-limits)
- [p-queue Documentation](https://github.com/sindresorhus/p-queue)
- [Rate Limiting Patterns](https://cloud.google.com/architecture/rate-limiting-strategies-techniques)

## Next Steps
For security configuration, see `twinmind-security-basics`.

## Overview

This skill implements TwinMind rate limiting, retry/backoff, and request optimization patterns for reliable API usage. It combines exponential backoff with jitter, a priority request queue, adaptive throttling, batch submission, and real-time rate-limit monitoring. The goal is predictable throughput, fewer 429 errors, and graceful recovery under load.

## How this skill works

The skill wraps API operations with a retry loop that retries only on 429/503 responses, honoring Retry-After when present and otherwise using exponential backoff plus jitter. A configurable request queue enforces concurrency and per-minute caps per TwinMind tier, while a monitor parses rate-limit headers to track remaining quota and reset times. An adaptive limiter adjusts internal delays based on recent successes and failures, and a batcher groups compatible requests to reduce total calls.
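
As a rough sketch of how the pieces compose (module paths follow the steps above; everything else is illustrative):

```typescript
// Composition: the queue enforces tier caps, withRateLimit retries 429/503.
import { withRateLimit } from './rate-limit';
import { getQueue } from './queue';

const queue = getQueue('pro');

export function callTwinMind<T>(operation: () => Promise<T>): Promise<T> {
  return queue.add(() => withRateLimit(operation));
}
```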

## When to use it

- When you receive TwinMind 429 or 503 errors and need automated retries.
- When you must control concurrency and API calls per minute to match a TwinMind tier.
- When building high-throughput transcription or summarization pipelines.
- When you want to batch similar requests (e.g., transcriptions) to save quota.
- When you need runtime visibility into remaining quota and reset windows.

## Best practices

- Always handle 429 and 503 responses and honor Retry-After headers.
- Use a queue tuned to your account tier (free/pro/enterprise) to avoid bursts.
- Apply exponential backoff with jitter to prevent synchronized retries.
- Monitor X-RateLimit-* headers and throttle preemptively when remaining quota is low.
- Batch compatible operations where supported to reduce request count.
- Record metrics and use a circuit breaker to fail fast when errors spike.

## Example use cases

- A transcription service that batches audio URLs and submits them via a single batch endpoint.
- An async worker pool that limits concurrent TwinMind calls to the Pro tier caps.
- A CLI tool that retries transient 429 errors with exponential backoff and jitter.
- A monitored pipeline that pauses or slows requests when percent-used crosses a threshold.
- A service that adapts delay per client based on recent success/failure ratios.

## FAQ

**Which headers should I track to manage limits?**

Track X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After to compute remaining quota and wait times, and to honor server hints.

**When should I prefer batching over increasing concurrency?**

Batching is preferable when the API offers a batch endpoint, because it reduces total requests and quota usage; increase concurrency only when the per-minute cap allows it and latency is critical.