quota-management skill

/plugins/leyline/skills/quota-management

This skill helps manage quotas for rate-limited APIs by tracking usage, monitoring thresholds, and degrading gracefully as limits approach, which improves reliability and keeps cost under control.

npx playbooks add skill athola/claude-night-market --skill quota-management

SKILL.md
---
name: quota-management
description: 'Quota tracking, threshold monitoring, and graceful degradation for rate-limited
  API services.


  quota, rate limiting, usage limits, thresholds.'
category: infrastructure
tags:
- quota
- rate-limiting
- resource-management
- cost-tracking
- thresholds
dependencies: []
tools:
- quota-tracker
provides:
  infrastructure:
  - quota-tracking
  - threshold-monitoring
  - usage-estimation
  patterns:
  - graceful-degradation
  - quota-enforcement
  - cost-optimization
usage_patterns:
- service-integration
- rate-limit-management
- cost-tracking
- resource-monitoring
complexity: intermediate
estimated_tokens: 500
progressive_loading: true
modules:
- modules/threshold-strategies.md
- modules/estimation-patterns.md
---
## Table of Contents

- [Overview](#overview)
- [When To Use](#when-to-use)
- [When NOT To Use](#when-not-to-use)
- [Core Concepts](#core-concepts)
  - [Quota Thresholds](#quota-thresholds)
  - [Quota Types](#quota-types)
- [Quick Start](#quick-start)
  - [Check Quota Status](#check-quota-status)
  - [Record Usage](#record-usage)
  - [Estimate Before Execution](#estimate-before-execution)
- [Integration Pattern](#integration-pattern)
- [Detailed Resources](#detailed-resources)
- [Exit Criteria](#exit-criteria)
- [Troubleshooting](#troubleshooting)


# Quota Management

## Overview

Patterns for tracking and enforcing resource quotas across rate-limited services. This skill provides the infrastructure that other plugins use for consistent quota handling.

## When To Use

- Building integrations with rate-limited APIs
- Tracking usage across sessions
- Degrading gracefully as limits are approached
- Estimating cost before running operations

## When NOT To Use

- Project doesn't use the leyline infrastructure patterns
- Simple scripts without service architecture needs

## Core Concepts

### Quota Thresholds

Three-tier threshold system for proactive management:

| Level | Usage | Action |
|-------|-------|--------|
| **Healthy** | <80% | Proceed normally |
| **Warning** | 80-95% | Alert, consider batching |
| **Critical** | >95% | Defer non-urgent, use secondary services |
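
The table can be read as a simple classification over the ratio of used to allowed quota. The sketch below is illustrative only: `QuotaLevel` and `classify_usage` are hypothetical names, not part of the leyline API, and the boundary handling (exactly 95% counts as Warning) is an assumption taken from the table's `80-95%` and `>95%` ranges.

```python
from enum import Enum


class QuotaLevel(Enum):
    """Hypothetical names for the three tiers in the table above."""
    HEALTHY = "HEALTHY"
    WARNING = "WARNING"
    CRITICAL = "CRITICAL"


def classify_usage(used: float, limit: float) -> QuotaLevel:
    """Map raw usage onto a tier using the 80% and 95% boundaries."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = used / limit
    if ratio > 0.95:
        return QuotaLevel.CRITICAL
    if ratio >= 0.80:
        return QuotaLevel.WARNING
    return QuotaLevel.HEALTHY
```

For example, 850 of 1000 daily requests would classify as Warning under these assumptions, while 960 of 1000 would classify as Critical.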

### Quota Types

```python
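from dataclasses import dataclass


@dataclass
class QuotaConfig:
    """Per-service limits; override the defaults with your service's values."""
    requests_per_minute: int = 60
    requests_per_day: int = 1000
    tokens_per_minute: int = 100000
    tokens_per_day: int = 1000000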
```
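
Because a service can hit any one of these four limits first, one plausible way to reduce them to a single figure is to take the most constrained limit. The sketch below repeats the defaults so it runs on its own; `peak_utilization` and the shape of the `usage` dictionary (keys matching the config field names) are assumptions for illustration, not the tracker's actual behavior.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass
class QuotaConfig:
    requests_per_minute: int = 60
    requests_per_day: int = 1000
    tokens_per_minute: int = 100000
    tokens_per_day: int = 1000000


def peak_utilization(config: QuotaConfig, usage: dict[str, int]) -> tuple[str, float]:
    """Return (limit_name, used/limit) for the most constrained limit.

    `usage` is a hypothetical mapping keyed by the config field names;
    limits missing from it are treated as unused.
    """
    ratios = {
        name: usage.get(name, 0) / limit
        for name, limit in asdict(config).items()
    }
    name = max(ratios, key=ratios.get)
    return name, ratios[name]
```

With 30 requests in the current minute and 900,000 tokens today, the daily token limit is the binding one at 90% of its budget.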

## Quick Start

### Check Quota Status
```python
from leyline.quota_tracker import QuotaTracker

tracker = QuotaTracker(service="my-service")
status, warnings = tracker.get_quota_status()

if status == "CRITICAL":
    # Defer or use secondary service
    pass
```

### Record Usage
```python
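# Reuses the tracker created above. estimated_tokens and elapsed_seconds
# are values your integration supplies for the call it just made.
tracker.record_request(
    tokens=estimated_tokens,
    success=True,
    duration=elapsed_seconds,
)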
```

### Estimate Before Execution
```python
can_proceed, issues = tracker.can_handle_task(estimated_tokens)
if not can_proceed:
    print(f"Quota issues: {issues}")
```

## Integration Pattern

Other plugins reference this skill:

```yaml
# In your skill's frontmatter
dependencies: [leyline:quota-management]
```

Then use the shared patterns:
1. Initialize tracker for your service
2. Check quota before operations
3. Record usage after operations
4. Handle threshold warnings gracefully
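
The four steps above can be sketched end to end. Since the real `QuotaTracker` is not reproduced here, the example uses `InMemoryTracker`, a hypothetical stand-in that only mirrors the method names from Quick Start; its single daily token limit, the `"WARNING"` and `"HEALTHY"` status strings, and the `run_with_quota` helper are all assumptions.

```python
import time


class InMemoryTracker:
    """Hypothetical stand-in mirroring the method names shown in Quick Start."""

    def __init__(self, daily_token_limit: int = 1000) -> None:
        self.daily_token_limit = daily_token_limit
        self.tokens_used = 0

    def get_quota_status(self):
        ratio = self.tokens_used / self.daily_token_limit
        if ratio > 0.95:
            return "CRITICAL", [f"{ratio:.0%} of daily tokens used"]
        if ratio >= 0.80:
            return "WARNING", [f"{ratio:.0%} of daily tokens used"]
        return "HEALTHY", []

    def can_handle_task(self, estimated_tokens):
        remaining = self.daily_token_limit - self.tokens_used
        if estimated_tokens > remaining:
            return False, [f"needs {estimated_tokens} tokens, {remaining} remaining"]
        return True, []

    def record_request(self, tokens, success, duration):
        self.tokens_used += tokens


def run_with_quota(tracker, operation, estimated_tokens):
    """Check quota, run the operation, record usage; return None if deferred."""
    status, _warnings = tracker.get_quota_status()
    can_proceed, _issues = tracker.can_handle_task(estimated_tokens)
    if status == "CRITICAL" or not can_proceed:
        return None  # caller defers the work or routes it elsewhere
    started = time.monotonic()
    success = False
    try:
        result = operation()
        success = True
        return result
    finally:
        # Record failures too, so the counters reflect every attempt.
        tracker.record_request(
            tokens=estimated_tokens,
            success=success,
            duration=time.monotonic() - started,
        )
```

Recording inside `finally` is a design choice: a failed call usually still consumes quota on the remote service, so skipping it would let the local counters drift.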

## Detailed Resources

- **Threshold Strategies**: See `modules/threshold-strategies.md` for degradation patterns
- **Estimation Patterns**: See `modules/estimation-patterns.md` for token/cost estimation

## Exit Criteria

- Quota status checked before operation
- Usage recorded after operation
- Threshold warnings handled appropriately

## Troubleshooting

### Common Issues

**Command not found**
Ensure all dependencies are installed and available on your `PATH`.

**Permission errors**
Check file permissions and run with appropriate privileges.

**Unexpected behavior**
Enable verbose logging with the `--verbose` flag.

Overview

This skill implements quota tracking, threshold monitoring, and graceful degradation patterns for rate-limited API services. It provides a reusable tracker that estimates cost, checks current usage against multi-tier thresholds, and records consumption so integrations can make safe decisions. The design focuses on proactive alerts and fallbacks to reduce failed requests and unexpected costs.

How this skill works

A QuotaTracker instance maintains usage counters and evaluates them against a three-tier threshold model (Healthy, Warning, Critical). Before executing work, call can_handle_task with an estimate to check whether the task fits within current quotas; after execution, record the tokens used, whether the request succeeded, and its duration. When usage reaches the Warning or Critical level, the skill surfaces recommendations such as batching, deferring non-urgent work, or routing to secondary services.
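
How the tracker stores its counters is not described here. As one possible approach for the per-minute limits, the sketch below keeps a sliding window of timestamped events; `SlidingWindowCounter` and its methods are hypothetical and should not be read as the QuotaTracker implementation.

```python
import time
from collections import deque


class SlidingWindowCounter:
    """Hypothetical counter for requests and tokens in a trailing window."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, tokens), oldest first

    def record(self, tokens: int, now: float = None) -> None:
        """Store one request; `now` can be injected for testing."""
        now = time.monotonic() if now is None else now
        self._events.append((now, tokens))

    def totals(self, now: float = None) -> tuple:
        """Return (request_count, token_count) inside the window."""
        now = time.monotonic() if now is None else now
        cutoff = now - self.window_seconds
        # Drop events that have aged out of the window.
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()
        return len(self._events), sum(tokens for _, tokens in self._events)
```

The totals from such a counter could then be compared against requests_per_minute and tokens_per_minute to decide which threshold level applies.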

When to use it

  • Integrating with external APIs that enforce request or token limits
  • Tracking usage across user sessions or distributed workers
  • Estimating cost or token usage before executing expensive operations
  • Implementing graceful degradation and fallback strategies for reliability
  • Coordinating quota-aware behavior across multiple plugins or services

Best practices

  • Always run can_handle_task (estimate) before initiating a call to avoid wasted work
  • Record successful and failed requests to keep counters accurate for future decisions
  • Map service-specific limits into the QuotaConfig (requests/tokens per minute/day)
  • React to Warning thresholds with batching, throttling, or user-facing alerts
  • Automate fallback to secondary providers or deferred queues at Critical level
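
The last two practices amount to a routing decision per task. A minimal sketch, assuming the status strings `"HEALTHY"`, `"WARNING"`, and `"CRITICAL"` and a hypothetical `route_task` helper that returns a destination label rather than calling any real provider:

```python
from collections import deque


def route_task(status: str, urgent: bool, deferred: deque, task: str) -> str:
    """Choose where a task goes for the given quota status.

    Returns "primary", "secondary", or "deferred". Non-urgent tasks at
    Critical are appended to `deferred` for later processing.
    """
    if status == "CRITICAL":
        if urgent:
            return "secondary"  # fall back to another provider
        deferred.append(task)
        return "deferred"
    # Healthy and Warning both stay on the primary service; at Warning the
    # caller might additionally batch, throttle, or raise an alert.
    return "primary"
```

Keeping the decision in one small function makes the fallback policy easy to test separately from the code that actually calls each provider.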

Example use cases

  • An assistant that batches high-token operations when Warning thresholds are reached
  • A background worker that defers non-urgent tasks to off-peak hours to avoid reaching the Critical level
  • A multi-tenant gateway that enforces per-tenant quotas and shares global budgets
  • A cost-control tool that estimates API token consumption before committing expensive runs
  • A failover pattern that routes requests to a secondary API when primary quota is exhausted

FAQ

What thresholds does the skill use and why?

It uses Healthy (<80%), Warning (80–95%), and Critical (>95%). This gives time to alert and adapt before hard limits cause failures.

How do I integrate it into my service?

Initialize a QuotaTracker for your service, call can_handle_task before operations, record usage after, and implement fallback behavior when Warning or Critical is reported.