
convex-agents-rate-limiting skill

/convex-agents-rate-limiting

This skill enforces per-user and global rate limits to control message frequency and token usage, preserving budget and fair access.

npx playbooks add skill sstobo/convex-skills --skill convex-agents-rate-limiting

SKILL.md
---
name: "Convex Agents Rate Limiting"
description: "Controls message frequency and token usage to prevent abuse and manage API budgets. Use this to implement per-user limits, global caps, burst capacity, and token quota management."
---

## Purpose

Rate limiting protects against abuse, manages LLM costs, and ensures fair resource allocation. This skill covers message frequency limits and token usage quotas.

## When to Use This Skill

- Preventing rapid-fire message spam
- Limiting total tokens per user
- Implementing burst capacity
- Global API limits to stay under provider quotas
- Fair resource allocation in multi-user systems
- Billing based on token usage

## Configure Rate Limiter
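The component first has to be registered with your app; a minimal sketch of the standard Convex component setup (the file lives at `convex/convex.config.ts`):

```typescript
// convex/convex.config.ts
import { defineApp } from "convex/server";
import rateLimiter from "@convex-dev/rate-limiter/convex.config";

const app = defineApp();
app.use(rateLimiter); // exposes components.rateLimiter used below
export default app;
```

With the component registered, define the named limiters: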

```typescript
import { RateLimiter, MINUTE, SECOND } from "@convex-dev/rate-limiter";
import { components } from "./_generated/api";

export const rateLimiter = new RateLimiter(components.rateLimiter, {
  // Per-user: 1 message per 5 seconds, with a burst of 2.
  sendMessage: {
    kind: "fixed window",
    period: 5 * SECOND,
    rate: 1,
    capacity: 2,
  },
  // Global: 1,000 messages per minute across all users.
  globalSendMessage: {
    kind: "token bucket",
    period: MINUTE,
    rate: 1_000,
  },
  // Per-user token quota: 2,000 tokens/minute, bursting up to 10,000.
  tokenUsagePerUser: {
    kind: "token bucket",
    period: MINUTE,
    rate: 2_000,
    capacity: 10_000,
  },
  // Global token budget across all users.
  globalTokenUsage: {
    kind: "token bucket",
    period: MINUTE,
    rate: 100_000,
  },
});
```

## Check Message Rate Limit

```typescript
import { v } from "convex/values";
import { mutation } from "./_generated/server";
import { components } from "./_generated/api";
import { saveMessage } from "@convex-dev/agent";
import { isRateLimitError } from "@convex-dev/rate-limiter";
import { rateLimiter } from "./rateLimiting"; // example path; adjust to your project

export const sendMessage = mutation({
  args: { threadId: v.string(), message: v.string(), userId: v.string() },
  handler: async (ctx, { threadId, message, userId }) => {
    try {
      // Per-user limit: throws if this user is sending too fast.
      await rateLimiter.limit(ctx, "sendMessage", {
        key: userId,
        throws: true,
      });

      // Global limit: no key, so all users share one bucket.
      await rateLimiter.limit(ctx, "globalSendMessage", { throws: true });

      const { messageId } = await saveMessage(ctx, components.agent, {
        threadId,
        prompt: message,
      });

      return { success: true, messageId };
    } catch (error) {
      if (isRateLimitError(error)) {
        return {
          success: false,
          error: "Rate limit exceeded",
          retryAfter: error.data.retryAfter,
        };
      }
      throw error;
    }
  },
});
```
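
The `throws: true` pattern above pairs with try/catch. Alternatively, `limit` returns a status object when `throws` is not set, so the same per-user check can be written without exceptions (a sketch):

```typescript
const { ok, retryAfter } = await rateLimiter.limit(ctx, "sendMessage", {
  key: userId,
});
if (!ok) {
  // retryAfter: milliseconds until capacity is available again
  return { success: false, error: "Rate limit exceeded", retryAfter };
}
```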

## Check Token Usage

```typescript
import { v } from "convex/values";
import { action } from "./_generated/server";
import type { ActionCtx } from "./_generated/server";
import { isRateLimitError } from "@convex-dev/rate-limiter";
import { rateLimiter } from "./rateLimiting"; // example paths; adjust to your project
import { myAgent } from "./agents";

export const checkTokenUsage = action({
  args: { threadId: v.string(), question: v.string(), userId: v.string() },
  handler: async (ctx, { threadId, question, userId }) => {
    const estimatedTokens = await estimateTokens(ctx, threadId, question);

    try {
      // check() verifies quota without consuming it; actual usage is
      // debited later by the agent's usageHandler.
      await rateLimiter.check(ctx, "tokenUsagePerUser", {
        key: userId,
        count: estimatedTokens,
        throws: true,
      });

      // Proceed with generation
      const { thread } = await myAgent.continueThread(ctx, { threadId });
      const result = await thread.generateText({ prompt: question });

      return { success: true, response: result.text };
    } catch (error) {
      if (isRateLimitError(error)) {
        return {
          success: false,
          error: "Token limit exceeded",
          retryAfter: error.data.retryAfter,
        };
      }
      throw error;
    }
  },
});

// Rough heuristic: ~4 characters per token for the prompt, and assume the
// response runs about 3x the prompt. ctx and threadId are unused here but
// kept so a real implementation can account for thread history.
async function estimateTokens(
  _ctx: ActionCtx,
  _threadId: string,
  question: string
): Promise<number> {
  const questionTokens = Math.ceil(question.length / 4);
  const responseTokens = Math.ceil(questionTokens * 3);
  return questionTokens + responseTokens;
}
```

## Track Actual Usage

```typescript
import { Agent } from "@convex-dev/agent";
import { openai } from "@ai-sdk/openai";
import { components } from "./_generated/api";
import { rateLimiter } from "./rateLimiting"; // example path; adjust to your project

export const myAgent = new Agent(components.agent, {
  name: "My Agent",
  languageModel: openai.chat("gpt-4o-mini"),
  usageHandler: async (ctx, { usage, userId }) => {
    if (!userId) return;

    // Debit the tokens actually consumed. reserve: true lets the balance
    // go negative, so real usage is always recorded, even past the limit.
    await rateLimiter.limit(ctx, "tokenUsagePerUser", {
      key: userId,
      count: usage.totalTokens,
      reserve: true,
    });
  },
});
```
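
The configuration above also defines a `globalTokenUsage` pool that nothing debits yet. One option, mirroring the per-user pattern (an assumption, not shown in the original), is to record actuals against it in the same handler:

```typescript
// Inside the same usageHandler, after the per-user debit:
await rateLimiter.limit(ctx, "globalTokenUsage", {
  count: usage.totalTokens,
  reserve: true, // always record real usage, even if it overshoots
});
```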

## Client-Side Rate Limit Checking

```typescript
import { useRateLimit } from "@convex-dev/rate-limiter/react";
import { api } from "../convex/_generated/api";

function ChatInput() {
  const { status } = useRateLimit(api.rateLimiting.getRateLimit);

  if (status && !status.ok) {
    return (
      <div className="text-red-500">
        Rate limit exceeded. Retry after{" "}
        {new Date(status.retryAt).toLocaleTimeString()}
      </div>
    );
  }

  return <input type="text" placeholder="Send a message..." />;
}
```
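
The `api.rateLimiting.getRateLimit` query referenced by the hook has to be exposed from the backend. The component ships a `hookAPI` helper for this; a sketch assuming the limiter lives in `convex/rateLimiting.ts` and that `getAuthUserId` is your own (hypothetical) helper for resolving the current user:

```typescript
// convex/rateLimiting.ts
export const { getRateLimit, getServerTime } = rateLimiter.hookAPI(
  "sendMessage",
  { key: (ctx) => getAuthUserId(ctx) } // same key used in sendMessage
);
```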

## Key Principles

- **Fixed window for frequency**: Use for simple "N messages per period" caps
- **Token bucket for capacity**: Use for burst-friendly limits with a steady refill rate
- **Estimate before, track after**: Reject oversized requests early, then record actual usage
- **Global + per-user limits**: Balance fair access with overall resource caps
- **Retryable errors**: Rate-limit errors carry retry timing, so clients can back off and retry (see the sketch below)
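
A minimal client-side backoff sketch, assuming the `{ success, retryAfter }` shape returned by the handlers above (`retryAfter` in milliseconds; `sendWithBackoff` is a hypothetical helper):

```typescript
// Retry once after the server-reported delay.
async function sendWithBackoff(
  send: () => Promise<{ success: boolean; retryAfter?: number }>
) {
  const result = await send();
  if (!result.success && result.retryAfter !== undefined) {
    await new Promise((resolve) => setTimeout(resolve, result.retryAfter));
    return send();
  }
  return result;
}
```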

## Next Steps

- See **usage-tracking** for billing based on token usage
- See **fundamentals** for agent setup
- See **debugging** for troubleshooting

Overview

This skill enforces message frequency and token-usage limits to prevent abuse, control LLM costs, and ensure fair resource allocation. It provides configurable fixed-window and token-bucket policies for per-user and global limits, plus hooks to estimate and record actual token usage. Use it to combine burst capacity, steady-rate caps, and quota accounting in multi-user systems.

How this skill works

Define named limiters with strategies (fixed window for simple counts, token bucket for burst-friendly rates) and attach them to mutations and actions. Check limits before performing work by supplying a key (user ID or global key) and optional token counts; reserve or debit actual usage after generation. Client helpers expose status so UIs can show retry times and avoid unnecessary requests.

When to use it

  • Prevent rapid-fire message spam from a single user or session
  • Enforce total token quotas per user to control billing exposure
  • Provide burst capacity while keeping long-term throughput steady
  • Protect against hitting global provider API quotas
  • Allocate resources fairly across many users in shared systems

Best practices

  • Use fixed-window rules for simple per-period message caps and token buckets for burst-tolerant quotas
  • Estimate token usage before generation to reject heavy requests early, then record actual usage afterward
  • Combine per-user and global limits to avoid a single user consuming shared budget
  • Return retryAfter durations to clients so UIs can show when to retry
  • Reserve capacity (not just check) when starting a generation to avoid races

Example use cases

  • Limit each user to 1 message every 5 seconds with a small burst capacity
  • Enforce a per-minute token quota per user while maintaining a large global token pool
  • Reject requests that would exceed a billing quota and inform the client when they can try again
  • Expose rate-limit status to the chat input so the UI disables sending when limits are hit
  • Track actual model usage from an agent’s usage handler to debit tokens after generation

FAQ

How should I estimate token usage before generation?

Use a simple heuristic (for example, characters/4) to estimate request and response tokens. Reject or check against quota using that estimate, then adjust by recording actual tokens after generation.

What happens when a limit is exceeded?

The limiter throws a rate-limit error that includes retry timing. Return a structured error to the client with retryAfter so clients can back off and retry later.