
convex-agents-rate-limiting skill

/convex-agents-rate-limiting

This skill enforces per-user and global rate limits to control message frequency and token usage, preserving budget and fair access.

npx playbooks add skill sstobo/convex-skills --skill convex-agents-rate-limiting

SKILL.md
---
name: "Convex Agents Rate Limiting"
description: "Controls message frequency and token usage to prevent abuse and manage API budgets. Use this to implement per-user limits, global caps, burst capacity, and token quota management."
---

## Purpose

Rate limiting protects against abuse, manages LLM costs, and ensures fair resource allocation. This skill covers message frequency limits and token usage quotas.

## When to Use This Skill

- Preventing rapid-fire message spam
- Limiting total tokens per user
- Implementing burst capacity
- Global API limits to stay under provider quotas
- Fair resource allocation in multi-user systems
- Billing based on token usage

## Configure Rate Limiter
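The component first has to be registered with your app; a minimal sketch of the standard Convex component setup (the file lives at `convex/convex.config.ts`):

```typescript
// convex/convex.config.ts
import { defineApp } from "convex/server";
import rateLimiter from "@convex-dev/rate-limiter/convex.config";

const app = defineApp();
app.use(rateLimiter); // exposes components.rateLimiter used below
export default app;
```

With the component registered, define the named limiters: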

```typescript
import { RateLimiter, MINUTE, SECOND } from "@convex-dev/rate-limiter";
import { components } from "./_generated/api";

export const rateLimiter = new RateLimiter(components.rateLimiter, {
  // Per-user: 1 message per 5 seconds, with a burst of 2.
  sendMessage: {
    kind: "fixed window",
    period: 5 * SECOND,
    rate: 1,
    capacity: 2,
  },
  // Global: 1,000 messages per minute across all users.
  globalSendMessage: {
    kind: "token bucket",
    period: MINUTE,
    rate: 1_000,
  },
  // Per-user token quota: 2,000 tokens/minute, bursting up to 10,000.
  tokenUsagePerUser: {
    kind: "token bucket",
    period: MINUTE,
    rate: 2_000,
    capacity: 10_000,
  },
  // Global token budget across all users.
  globalTokenUsage: {
    kind: "token bucket",
    period: MINUTE,
    rate: 100_000,
  },
});
```

## Check Message Rate Limit

```typescript
import { v } from "convex/values";
import { mutation } from "./_generated/server";
import { components } from "./_generated/api";
import { saveMessage } from "@convex-dev/agent";
import { isRateLimitError } from "@convex-dev/rate-limiter";
import { rateLimiter } from "./rateLimiting"; // example path; adjust to your project

export const sendMessage = mutation({
  args: { threadId: v.string(), message: v.string(), userId: v.string() },
  handler: async (ctx, { threadId, message, userId }) => {
    try {
      // Per-user limit: throws if this user is sending too fast.
      await rateLimiter.limit(ctx, "sendMessage", {
        key: userId,
        throws: true,
      });

      // Global limit: no key, so all users share one bucket.
      await rateLimiter.limit(ctx, "globalSendMessage", { throws: true });

      const { messageId } = await saveMessage(ctx, components.agent, {
        threadId,
        prompt: message,
      });

      return { success: true, messageId };
    } catch (error) {
      if (isRateLimitError(error)) {
        return {
          success: false,
          error: "Rate limit exceeded",
          retryAfter: error.data.retryAfter,
        };
      }
      throw error;
    }
  },
});
```
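
The `throws: true` pattern above pairs with try/catch. Alternatively, `limit` returns a status object when `throws` is not set, so the same per-user check can be written without exceptions (a sketch):

```typescript
const { ok, retryAfter } = await rateLimiter.limit(ctx, "sendMessage", {
  key: userId,
});
if (!ok) {
  // retryAfter: milliseconds until capacity is available again
  return { success: false, error: "Rate limit exceeded", retryAfter };
}
```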

## Check Token Usage

```typescript
import { v } from "convex/values";
import { action } from "./_generated/server";
import type { ActionCtx } from "./_generated/server";
import { isRateLimitError } from "@convex-dev/rate-limiter";
import { rateLimiter } from "./rateLimiting"; // example paths; adjust to your project
import { myAgent } from "./agents";

export const checkTokenUsage = action({
  args: { threadId: v.string(), question: v.string(), userId: v.string() },
  handler: async (ctx, { threadId, question, userId }) => {
    const estimatedTokens = await estimateTokens(ctx, threadId, question);

    try {
      // check() verifies quota without consuming it; actual usage is
      // debited later by the agent's usageHandler.
      await rateLimiter.check(ctx, "tokenUsagePerUser", {
        key: userId,
        count: estimatedTokens,
        throws: true,
      });

      // Proceed with generation
      const { thread } = await myAgent.continueThread(ctx, { threadId });
      const result = await thread.generateText({ prompt: question });

      return { success: true, response: result.text };
    } catch (error) {
      if (isRateLimitError(error)) {
        return {
          success: false,
          error: "Token limit exceeded",
          retryAfter: error.data.retryAfter,
        };
      }
      throw error;
    }
  },
});

// Rough heuristic: ~4 characters per token for the prompt, and assume the
// response runs about 3x the prompt. ctx and threadId are unused here but
// kept so a real implementation can account for thread history.
async function estimateTokens(
  _ctx: ActionCtx,
  _threadId: string,
  question: string
): Promise<number> {
  const questionTokens = Math.ceil(question.length / 4);
  const responseTokens = Math.ceil(questionTokens * 3);
  return questionTokens + responseTokens;
}
```

## Track Actual Usage

```typescript
import { Agent } from "@convex-dev/agent";
import { openai } from "@ai-sdk/openai";
import { components } from "./_generated/api";
import { rateLimiter } from "./rateLimiting"; // example path; adjust to your project

export const myAgent = new Agent(components.agent, {
  name: "My Agent",
  languageModel: openai.chat("gpt-4o-mini"),
  usageHandler: async (ctx, { usage, userId }) => {
    if (!userId) return;

    // Debit the tokens actually consumed. reserve: true lets the balance
    // go negative, so real usage is always recorded, even past the limit.
    await rateLimiter.limit(ctx, "tokenUsagePerUser", {
      key: userId,
      count: usage.totalTokens,
      reserve: true,
    });
  },
});
```
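
The configuration above also defines a `globalTokenUsage` pool that nothing debits yet. One option, mirroring the per-user pattern (an assumption, not shown in the original), is to record actuals against it in the same handler:

```typescript
// Inside the same usageHandler, after the per-user debit:
await rateLimiter.limit(ctx, "globalTokenUsage", {
  count: usage.totalTokens,
  reserve: true, // always record real usage, even if it overshoots
});
```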

## Client-Side Rate Limit Checking

```typescript
import { useRateLimit } from "@convex-dev/rate-limiter/react";
import { api } from "../convex/_generated/api";

function ChatInput() {
  const { status } = useRateLimit(api.rateLimiting.getRateLimit);

  if (status && !status.ok) {
    return (
      <div className="text-red-500">
        Rate limit exceeded. Retry after{" "}
        {new Date(status.retryAt).toLocaleTimeString()}
      </div>
    );
  }

  return <input type="text" placeholder="Send a message..." />;
}
```
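
The `api.rateLimiting.getRateLimit` query referenced by the hook has to be exposed from the backend. The component ships a `hookAPI` helper for this; a sketch assuming the limiter lives in `convex/rateLimiting.ts` and that `getAuthUserId` is your own (hypothetical) helper for resolving the current user:

```typescript
// convex/rateLimiting.ts
export const { getRateLimit, getServerTime } = rateLimiter.hookAPI(
  "sendMessage",
  { key: (ctx) => getAuthUserId(ctx) } // same key used in sendMessage
);
```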

## Key Principles

- **Fixed window for frequency**: Use for simple "N messages per period" caps
- **Token bucket for capacity**: Use for burst-friendly limits with a steady refill rate
- **Estimate before, track after**: Reject oversized requests early, then record actual usage
- **Global + per-user limits**: Balance fair access with overall resource caps
- **Retryable errors**: Rate-limit errors carry retry timing, so clients can back off and retry (see the sketch below)
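
A minimal client-side backoff sketch, assuming the `{ success, retryAfter }` shape returned by the handlers above (`retryAfter` in milliseconds; `sendWithBackoff` is a hypothetical helper):

```typescript
// Retry once after the server-reported delay.
async function sendWithBackoff(
  send: () => Promise<{ success: boolean; retryAfter?: number }>
) {
  const result = await send();
  if (!result.success && result.retryAfter !== undefined) {
    await new Promise((resolve) => setTimeout(resolve, result.retryAfter));
    return send();
  }
  return result;
}
```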

## Next Steps

- See **usage-tracking** for billing based on token usage
- See **fundamentals** for agent setup
- See **debugging** for troubleshooting

Overview

This skill enforces message frequency and token-usage limits to prevent abuse, control LLM costs, and ensure fair resource allocation. It provides configurable fixed-window and token-bucket policies for per-user and global limits, plus hooks to estimate and record actual token usage. Use it to combine burst capacity, steady-rate caps, and quota accounting in multi-user systems.

How this skill works

Define named limiters with strategies (fixed window for simple counts, token bucket for burst-friendly rates) and attach them to mutations and actions. Check limits before performing work by supplying a key (user ID or global key) and optional token counts; reserve or debit actual usage after generation. Client helpers expose status so UIs can show retry times and avoid unnecessary requests.

When to use it

  • Prevent rapid-fire message spam from a single user or session
  • Enforce total token quotas per user to control billing exposure
  • Provide burst capacity while keeping long-term throughput steady
  • Protect against hitting global provider API quotas
  • Allocate resources fairly across many users in shared systems

Best practices

  • Use fixed-window rules for simple per-period message caps and token buckets for burst-tolerant quotas
  • Estimate token usage before generation to reject heavy requests early, then record actual usage afterward
  • Combine per-user and global limits to avoid a single user consuming shared budget
  • Return retryAfter durations to clients so UIs can show when to retry
  • Reserve capacity (not just check) when starting a generation to avoid races

Example use cases

  • Limit each user to 1 message every 5 seconds with a small burst capacity
  • Enforce a per-minute token quota per user while maintaining a large global token pool
  • Reject requests that would exceed a billing quota and inform the client when they can try again
  • Expose rate-limit status to the chat input so the UI disables sending when limits are hit
  • Track actual model usage from an agent’s usage handler to debit tokens after generation

FAQ

How should I estimate token usage before generation?

Use a simple heuristic (for example, characters/4) to estimate request and response tokens. Reject or check against quota using that estimate, then adjust by recording actual tokens after generation.

What happens when a limit is exceeded?

The limiter throws a rate-limit error that includes retry timing. Return a structured error to the client with retryAfter so clients can back off and retry later.