
This skill helps you implement robust rate limiting for APIs using sliding windows, token buckets, and quotas to protect resources.

npx playbooks add skill jeremylongshore/claude-code-plugins-plus-skills --skill rate-limiting-apis

Review the files below or copy the command above to add this skill to your agents.

Files (4)
SKILL.md
2.4 KB
---
name: rate-limiting-apis
description: |
  Implement sophisticated rate limiting with sliding windows, token buckets, and quotas.
  Use when protecting APIs from excessive requests.
  Trigger with phrases like "add rate limiting", "limit API requests", or "implement rate limits".
  
allowed-tools: Read, Write, Edit, Grep, Glob, Bash(api:ratelimit-*)
version: 1.0.0
author: Jeremy Longshore <[email protected]>
license: MIT
---

# Rate Limiting APIs

## Overview

This skill provides automated assistance for implementing API rate limiting: selecting an appropriate algorithm, scaffolding middleware, and wiring counters and quotas into an existing service.

## Prerequisites

Before using this skill, ensure you have:
- API design specifications or requirements documented
- Development environment with necessary frameworks installed
- Database or backend services accessible for integration
- Authentication and authorization strategies defined
- Testing tools and environments configured

## Instructions

1. Use the Read tool to examine existing API specifications from {baseDir}/api-specs/
2. Define resource models, endpoints, and HTTP methods
3. Document request/response schemas and data types
4. Identify authentication and authorization requirements
5. Plan error handling and validation strategies
6. Generate boilerplate code using Bash(api:ratelimit-*) with framework scaffolding
7. Implement endpoint handlers with business logic
8. Add input validation and schema enforcement
9. Integrate authentication and authorization middleware
10. Configure database connections and ORM models
11. Write integration tests covering all endpoints


See `{baseDir}/references/implementation.md` for detailed implementation guide.

## Output

- `{baseDir}/src/routes/` - Endpoint route definitions
- `{baseDir}/src/controllers/` - Business logic handlers
- `{baseDir}/src/models/` - Data models and schemas
- `{baseDir}/src/middleware/` - Authentication, validation, logging
- `{baseDir}/src/config/` - Configuration and environment variables
- OpenAPI 3.0 specification with complete endpoint definitions

## Error Handling

See `{baseDir}/references/errors.md` for comprehensive error handling.

## Examples

See `{baseDir}/references/examples.md` for detailed examples.

## Resources

- Express.js and Fastify for Node.js APIs
- Flask and FastAPI for Python APIs
- Spring Boot for Java APIs
- Gin and Echo for Go APIs
- OpenAPI Specification 3.0+ for API documentation

Overview

This skill helps implement sophisticated API rate limiting strategies including sliding windows, token buckets, and quota enforcement. It provides practical guidance, code scaffolding patterns, and integration steps so you can protect endpoints from excessive requests without degrading legitimate traffic. The goal is predictable throttling, fair usage enforcement, and easy observability of limits.

How this skill works

The skill inspects API design and traffic requirements, then recommends and scaffolds appropriate rate limiting primitives (sliding window counters, token buckets, fixed windows) and quota models. It shows where to apply middleware, how to persist counters (in-memory, Redis, or a database), and how to surface limits in responses and metrics. It also outlines testing and error handling patterns to validate behavior under load.
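As a minimal sketch of one such primitive, here is a sliding-window counter kept in memory for a single-process service (class and parameter names are illustrative, not part of the skill's scaffolding; a distributed deployment would persist timestamps in a shared store instead):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds, per key."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.hits = {}  # key -> deque of request timestamps

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(key, deque())
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) < self.limit:
            q.append(now)
            return True
        return False

limiter = SlidingWindowLimiter(limit=3, window=60.0)
results = [limiter.allow("client-1", now=t) for t in (0, 1, 2, 3, 61)]
print(results)  # [True, True, True, False, True]
```

The fourth call is rejected because three requests already fall inside the 60-second window; by t=61 the earliest timestamps have expired and capacity is available again.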

When to use it

  • Protect public or internal APIs from abuse or accidental high-traffic spikes
  • Enforce per-user, per-API-key, or per-IP request quotas and burst limits
  • Implement fair share policies for multi-tenant services or paid tiers
  • Reduce backend overload during traffic surges and support graceful degradation
  • Comply with third-party usage agreements or client SLA requirements

Best practices

  • Choose the simplest algorithm that meets requirements: token buckets for bursts, sliding windows for smoother enforcement
  • Store counters in a fast shared store (Redis) for distributed services and atomic operations
  • Expose remaining quota and reset information in response headers for better client experience
  • Combine short-term limits and long-term quotas to balance bursts and sustained usage
  • Write integration and load tests that simulate realistic client behavior and clock drift
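Combining short-term limits with long-term quotas, as suggested above, can be sketched like this (a hypothetical in-memory example; in practice the daily counter would live in a shared store and be reset on a schedule, which this sketch omits):

```python
import time

class TieredLimiter:
    """Combine a per-minute burst limit with a daily quota, per key."""

    def __init__(self, per_minute, per_day):
        self.per_minute = per_minute
        self.per_day = per_day
        self.minute_hits = {}  # key -> list of recent timestamps
        self.day_used = {}     # key -> requests consumed today

    def check(self, key, now=None):
        now = time.time() if now is None else now
        recent = [t for t in self.minute_hits.get(key, []) if now - t < 60]
        used = self.day_used.get(key, 0)
        if len(recent) >= self.per_minute or used >= self.per_day:
            return {"allowed": False,
                    "remaining_minute": max(0, self.per_minute - len(recent)),
                    "remaining_day": max(0, self.per_day - used)}
        recent.append(now)
        self.minute_hits[key] = recent
        self.day_used[key] = used + 1
        return {"allowed": True,
                "remaining_minute": self.per_minute - len(recent),
                "remaining_day": self.per_day - used - 1}

limiter = TieredLimiter(per_minute=2, per_day=3)
decisions = [limiter.check("tenant-a", now=t)["allowed"] for t in (0, 1, 2, 70, 140)]
print(decisions)  # [True, True, False, True, False]
```

The third call trips the per-minute limit; the last call is within the minute limit but has exhausted the daily quota, showing why both tiers are needed.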

Example use cases

  • Rate-limit login and authentication endpoints to prevent credential-stuffing attacks
  • Apply per-API-key token buckets with higher burst allowance for premium customers
  • Throttle heavy report-generation endpoints using sliding windows to smooth traffic
  • Enforce daily or monthly data-export quotas for tenants with overage billing
  • Protect third-party webhook endpoints by limiting retries and backoff behavior

FAQ

Which store should I use for counters in a multi-instance API?

Use a fast shared store like Redis with atomic operations (INCR, Lua scripts) to keep counters consistent across instances.
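To illustrate the pattern without a live Redis, here is the fixed-window counter logic simulated in-process, with a lock standing in for Redis's atomicity (the comments show the approximate Redis equivalents; key and class names are made up for this sketch):

```python
import threading
import time

class FixedWindowCounter:
    """In-process stand-in for a Redis fixed-window rate limit counter.

    With Redis you would run, atomically (e.g. in a Lua script or pipeline):
        INCR   ratelimit:{key}:{window_start}
        EXPIRE ratelimit:{key}:{window_start} <window seconds>
    and reject the request when the incremented value exceeds the limit.
    """

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.counts = {}
        self.lock = threading.Lock()

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        # Window-aligned counter key, mirroring the Redis key above.
        bucket = f"{key}:{int(now // self.window)}"
        with self.lock:  # stands in for Redis's single-threaded atomicity
            count = self.counts.get(bucket, 0) + 1  # INCR
            self.counts[bucket] = count
        return count <= self.limit

counter = FixedWindowCounter(limit=2, window=10)
decisions = [counter.allow("k", now=t) for t in (0, 1, 2, 10)]
print(decisions)  # [True, True, False, True]
```

The third request exceeds the limit within the first 10-second window; at t=10 a new window key starts and the counter resets, which is the characteristic (and the boundary-burst weakness) of fixed windows.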

How do I choose between token bucket and sliding window?

Use token bucket when you need burst capability and refill behavior; choose sliding window when you want smoother enforcement and more even distribution.
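The token bucket's burst-and-refill behavior can be shown in a few lines (a minimal sketch; names and the time-injection parameter are illustrative):

```python
import time

class TokenBucket:
    """Token bucket: `capacity` bounds bursts; tokens refill at `rate` per second."""

    def __init__(self, capacity, rate, start=0.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last = start

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=2, rate=0.5)  # burst of 2, one new token every 2 s
decisions = [bucket.allow(now=t) for t in (0.0, 0.1, 0.2, 2.2)]
print(decisions)  # [True, True, False, True]
```

The first two calls drain the bucket (the burst), the third arrives before a full token has refilled, and by t=2.2 enough tokens have accumulated to admit another request.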

How should limits be communicated to clients?

Return standard headers (e.g., X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) and clear 429 responses that include a Retry-After header.
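A small helper that builds these headers might look like the following (the X-RateLimit-* names follow the widely used de facto convention; exact header names vary between APIs, and the function name here is made up):

```python
import time

def rate_limit_headers(limit, remaining, reset_at):
    """Build conventional rate-limit response headers.

    `reset_at` is the epoch time (seconds) when the current window resets.
    """
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_at)),
    }
    if remaining <= 0:
        # On a 429 response, also tell the client how long to back off.
        headers["Retry-After"] = str(max(0, int(reset_at - time.time())))
    return headers

h = rate_limit_headers(limit=100, remaining=0, reset_at=time.time() + 30)
print(h["X-RateLimit-Remaining"], h["Retry-After"])  # remaining "0", Retry-After about 30 s
```

Attaching these headers to every response, not just 429s, lets well-behaved clients pace themselves before hitting the limit.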