
This skill helps you manage OpenRouter rate limits with exponential backoff, token bucket, and queuing strategies to sustain high throughput.

npx playbooks add skill jeremylongshore/claude-code-plugins-plus-skills --skill openrouter-rate-limits

Review the files below or copy the command above to add this skill to your agents.

Files (7)
SKILL.md
1.6 KB
---
name: openrouter-rate-limits
description: |
  Handle OpenRouter rate limits with proper backoff strategies. Use when experiencing 429 errors or building high-throughput systems. Trigger with phrases like 'openrouter rate limit', 'openrouter 429', 'openrouter throttle', 'openrouter backoff'.
allowed-tools: Read, Write, Edit, Grep
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
---

# OpenRouter Rate Limits

## Overview

This skill teaches rate limit handling patterns including exponential backoff, token bucket algorithms, and request queuing.

## Prerequisites

- OpenRouter integration
- Understanding of HTTP status codes

## Instructions

Follow these steps to implement this skill:

1. **Verify Prerequisites**: Ensure all prerequisites listed above are met
2. **Review the Implementation**: Study the code examples and patterns below
3. **Adapt to Your Environment**: Modify configuration values for your setup
4. **Test the Integration**: Run the verification steps to confirm functionality
5. **Monitor in Production**: Set up appropriate logging and monitoring

## Output

Successful execution produces:
- Working OpenRouter integration
- Verified API connectivity
- Example responses demonstrating functionality

## Error Handling

See `{baseDir}/references/errors.md` for comprehensive error handling.

## Examples

See `{baseDir}/references/examples.md` for detailed examples.

## Resources

- [OpenRouter Documentation](https://openrouter.ai/docs)
- [OpenRouter Models](https://openrouter.ai/models)
- [OpenRouter API Reference](https://openrouter.ai/docs/api-reference)
- [OpenRouter Status](https://status.openrouter.ai)

Overview

This skill teaches practical strategies to handle OpenRouter rate limits and prevent 429 errors. It focuses on implementing exponential backoff, token-bucket throttling, and request queuing for high-throughput systems. The guidance is language-agnostic but includes Python-ready patterns and testing tips.

How this skill works

The skill inspects request responses from OpenRouter for rate-limit signals (HTTP 429 and Retry-After headers) and applies progressively stronger mitigation: immediate retries with jitter, exponential backoff, and client-side throttling using a token-bucket algorithm. It also outlines request queuing and prioritization so critical calls succeed while bulk work is paced. Configuration points (max retries, base delay, bucket capacity, and refill rate) are highlighted for adaptation to your environment.
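The mitigation ladder above — honor Retry-After when present, otherwise back off exponentially with jitter, and cap total retries — can be sketched as follows. This is a minimal, library-agnostic sketch: `send` is a hypothetical stand-in for your actual HTTP call and is assumed to return a `(status, retry_after, body)` tuple, not part of any real OpenRouter SDK.

```python
import random
import time

def backoff_delay(attempt, base=0.5, cap=30.0, retry_after=None):
    """Delay before retry number `attempt` (0-indexed).

    A server-provided Retry-After hint takes precedence; otherwise use
    capped exponential backoff with full jitter to desynchronize clients.
    """
    if retry_after is not None:
        return float(retry_after)
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def call_with_retries(send, max_retries=4):
    """Call `send` (returns (status, retry_after_or_None, body)), retrying 429s."""
    for attempt in range(max_retries + 1):
        status, retry_after, body = send()
        if status != 429:
            return body
        if attempt == max_retries:
            # Surface a clear error once retries are exhausted.
            raise RuntimeError("rate limited: retries exhausted")
        time.sleep(backoff_delay(attempt, retry_after=retry_after))
```

The configuration points called out above map directly onto the parameters: `max_retries`, `base` delay, and the `cap` on any single wait.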

When to use it

  • You receive 429 or throttling errors from OpenRouter.
  • Building services that make many concurrent API calls to OpenRouter.
  • Running batch jobs or pipelines that spike request volume.
  • Implementing client libraries or SDKs that call OpenRouter on behalf of users.
  • Hardening production systems to avoid cascading failures during outages.

Best practices

  • Read Retry-After and respect server-provided backoff hints when present.
  • Use exponential backoff with randomized jitter to avoid synchronized retries.
  • Apply a token-bucket or leaky-bucket limiter to client-side callers to smooth bursts.
  • Limit max retries and surface clear errors to callers when limits are exceeded.
  • Instrument and log rate-limit events, retry counts, and latency for monitoring and alerts.
  • Test backoff and throttling under load and with simulated 429 responses before production rollout.
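A client-side token-bucket limiter, as recommended above, can be sketched in a few lines. This is an illustrative implementation, not tied to any particular HTTP client; `rate` and `capacity` are the configuration points you would tune for your traffic shape.

```python
import time

class TokenBucket:
    """Token-bucket limiter: bursts up to `capacity`, refills at `rate`/sec."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self, n=1):
        """Take `n` tokens if available; return False (don't block) otherwise."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

Callers that receive `False` can either wait and re-poll (smoothing bursts) or shed low-priority work, which is what keeps the sustained request rate below the server's threshold.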

Example use cases

  • API client that automatically pauses and retries when OpenRouter responds with 429.
  • High-throughput ingestion pipeline that uses a token-bucket to keep sustained request rate below threshold.
  • Web app that prioritizes interactive user requests while queuing background model calls.
  • CI batch job that staggers model queries with configurable backoff and retry limits.
  • Developer SDK that exposes configurable retry/backoff settings to integrators.
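The prioritization use case above (interactive requests first, background model calls queued behind them) can be expressed with a standard priority queue; the request labels here are purely illustrative.

```python
import queue

# Lower priority number runs first: 0 = interactive, 1 = background.
q = queue.PriorityQueue()
q.put((1, "background: embed docs"))
q.put((0, "interactive: chat reply"))
q.put((1, "background: batch summarize"))

# Drain in priority order; a worker loop would instead pace these
# through a rate limiter before dispatching to the API.
order = [q.get()[1] for _ in range(q.qsize())]
```

In a real service the consumer side of this queue is where the throttling lives, so interactive work always gets the next available token.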

FAQ

How many retries should I attempt after a 429?

Keep retries low (commonly 3–5 attempts), using exponential backoff and jitter. Honor any Retry-After header, and once retries are exhausted, fail fast and surface a clear error to calling code.

When should I use client-side throttling vs. just retrying?

Use throttling for predictable traffic smoothing and to prevent bursts; retries handle transient spikes. Combine both: throttle steady-state rate and retry transient 429s with backoff.
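Combining both, as suggested, means gating every request on a client-side limiter and reserving backoff for the transient 429s that slip through. A minimal sketch, where `send` and `acquire_token` are hypothetical callables (e.g. the `try_acquire` method of a token bucket and your HTTP call returning `(status, body)`):

```python
import random
import time

def throttled_call(send, acquire_token, max_retries=3, base=0.5):
    """Wait for a client-side token, then send; back off and retry on 429."""
    for attempt in range(max_retries + 1):
        while not acquire_token():          # steady-state throttling
            time.sleep(0.01)
        status, body = send()
        if status != 429:
            return body
        if attempt < max_retries:
            # Jittered exponential backoff for the transient spike.
            time.sleep(random.uniform(0, base * (2 ** attempt)))
    raise RuntimeError("rate limited after retries")
```

The throttle keeps the steady-state rate below the limit, so the retry path only fires during genuine spikes or server-side pressure.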