quota-management skill

/plugins/leyline/skills/quota-management

This skill helps manage quotas for rate-limited APIs by tracking usage, monitoring thresholds, and degrading gracefully to improve reliability and control cost.

npx playbooks add skill athola/claude-night-market --skill quota-management

Run the command above to add this skill to your agents.

SKILL.md

---
name: quota-management
description: 'Quota tracking, threshold monitoring, and graceful degradation for rate-limited
  API services.


  quota, rate limiting, usage limits, thresholds.'
category: infrastructure
tags:
- quota
- rate-limiting
- resource-management
- cost-tracking
- thresholds
dependencies: []
tools:
- quota-tracker
provides:
  infrastructure:
  - quota-tracking
  - threshold-monitoring
  - usage-estimation
  patterns:
  - graceful-degradation
  - quota-enforcement
  - cost-optimization
usage_patterns:
- service-integration
- rate-limit-management
- cost-tracking
- resource-monitoring
complexity: intermediate
estimated_tokens: 500
progressive_loading: true
modules:
- modules/threshold-strategies.md
- modules/estimation-patterns.md
---
## Table of Contents

- [Overview](#overview)
- [When To Use](#when-to-use)
- [When NOT To Use](#when-not-to-use)
- [Core Concepts](#core-concepts)
  - [Quota Thresholds](#quota-thresholds)
  - [Quota Types](#quota-types)
- [Quick Start](#quick-start)
  - [Check Quota Status](#check-quota-status)
  - [Record Usage](#record-usage)
  - [Estimate Before Execution](#estimate-before-execution)
- [Integration Pattern](#integration-pattern)
- [Detailed Resources](#detailed-resources)
- [Exit Criteria](#exit-criteria)
- [Troubleshooting](#troubleshooting)


# Quota Management

## Overview

Patterns for tracking and enforcing resource quotas across rate-limited services. This skill provides the infrastructure that other plugins use for consistent quota handling.

## When To Use

- Building integrations with rate-limited APIs
- Tracking usage across sessions
- Degrading gracefully as limits are approached
- Estimating cost before running operations

## When NOT To Use

- Project doesn't use the leyline infrastructure patterns
- Simple scripts without service architecture needs

## Core Concepts

### Quota Thresholds

Three-tier threshold system for proactive management:

| Level | Usage | Action |
|-------|-------|--------|
| **Healthy** | <80% | Proceed normally |
| **Warning** | 80-95% | Alert, consider batching |
| **Critical** | >95% | Defer non-urgent, use secondary services |
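
The tiers in the table can be expressed as a small classification function. This is a minimal sketch, not the skill's own implementation: `classify_usage` is a hypothetical helper, and the `"HEALTHY"` and `"WARNING"` strings are assumed by analogy with the `"CRITICAL"` status used in Quick Start. Integer arithmetic keeps the 80% and 95% boundaries exact.

```python
def classify_usage(used: int, limit: int) -> str:
    """Map raw usage onto the three tiers from the table above.

    Hypothetical helper; the status strings other than "CRITICAL" are assumed.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Compare in integer percent space to avoid floating-point edge cases.
    if used * 100 > limit * 95:
        return "CRITICAL"   # above 95%
    if used * 100 >= limit * 80:
        return "WARNING"    # 80% up to and including 95%
    return "HEALTHY"        # below 80%


print(classify_usage(45, 60))  # 75% of the limit
print(classify_usage(48, 60))  # exactly 80% of the limit
print(classify_usage(58, 60))  # roughly 96.7% of the limit
```

Note that exactly 95% falls in the Warning tier, matching the table's `>95%` rule for Critical.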

### Quota Types

```python
from dataclasses import dataclass


@dataclass
class QuotaConfig:
    # Request-count limits
    requests_per_minute: int = 60
    requests_per_day: int = 1000
    # Token-volume limits
    tokens_per_minute: int = 100000
    tokens_per_day: int = 1000000
```
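
With four independent limits, the one closest to exhaustion is the one that matters. The sketch below shows one way to find it. `UsageSnapshot`, `utilization`, and `binding_limit` are hypothetical names introduced for illustration, and `QuotaConfig` is repeated only so the block runs on its own.

```python
from dataclasses import dataclass


@dataclass
class QuotaConfig:
    # Same defaults as the block above, repeated so this sketch is self-contained.
    requests_per_minute: int = 60
    requests_per_day: int = 1000
    tokens_per_minute: int = 100000
    tokens_per_day: int = 1000000


@dataclass
class UsageSnapshot:
    # Hypothetical container for current counters; not part of the skill's API.
    requests_this_minute: int = 0
    requests_today: int = 0
    tokens_this_minute: int = 0
    tokens_today: int = 0


def utilization(config: QuotaConfig, usage: UsageSnapshot) -> dict[str, float]:
    """Return the fraction of each limit that has been consumed."""
    return {
        "requests_per_minute": usage.requests_this_minute / config.requests_per_minute,
        "requests_per_day": usage.requests_today / config.requests_per_day,
        "tokens_per_minute": usage.tokens_this_minute / config.tokens_per_minute,
        "tokens_per_day": usage.tokens_today / config.tokens_per_day,
    }


def binding_limit(config: QuotaConfig, usage: UsageSnapshot) -> tuple[str, float]:
    """Return the name and utilization of the limit closest to exhaustion."""
    ratios = utilization(config, usage)
    name = max(ratios, key=ratios.get)
    return name, ratios[name]


config = QuotaConfig(requests_per_minute=20)
usage = UsageSnapshot(
    requests_this_minute=10,
    requests_today=900,
    tokens_this_minute=5000,
    tokens_today=200000,
)
print(binding_limit(config, usage))
```

Here the daily request limit is the binding one, even though the per-minute counters look comfortable.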

## Quick Start

### Check Quota Status
```python
from leyline.quota_tracker import QuotaTracker

tracker = QuotaTracker(service="my-service")
status, warnings = tracker.get_quota_status()

if status == "CRITICAL":
    # Defer or use secondary service
    pass
```

### Record Usage
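```python
# Reuses the `tracker` created above; call this after each request completes.
tracker.record_request(
    tokens=estimated_tokens,   # tokens consumed (or your best estimate)
    success=True,              # pass False for failed requests
    duration=elapsed_seconds,  # elapsed time in seconds
)
```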

### Estimate Before Execution
```python
can_proceed, issues = tracker.can_handle_task(estimated_tokens)
if not can_proceed:
    print(f"Quota issues: {issues}")
```
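
The `estimated_tokens` value has to come from somewhere. The sketch below uses a rough rule of thumb of about four characters per token for English text; that ratio is an assumption, real tokenizers vary, and `modules/estimation-patterns.md` is the place to look for the skill's own approach. `estimate_tokens` and `fits_budget` are hypothetical helpers.

```python
import math

# Assumed rule of thumb for English text; actual tokenizers vary by model.
CHARS_PER_TOKEN = 4


def estimate_tokens(prompt: str, expected_output_tokens: int = 0) -> int:
    """Roughly estimate total tokens for a prompt plus its expected output."""
    return math.ceil(len(prompt) / CHARS_PER_TOKEN) + expected_output_tokens


def fits_budget(estimated: int, used: int, limit: int) -> bool:
    """Return True if the estimated cost still fits under the limit."""
    return used + estimated <= limit


prompt = "Summarize the quarterly report in three bullet points."
estimated = estimate_tokens(prompt, expected_output_tokens=150)
print(estimated)
print(fits_budget(estimated, used=99_900, limit=100_000))
```

Rounding up and including the expected output keeps the estimate conservative, which is usually the safer direction for a pre-flight check.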

## Integration Pattern

Other plugins reference this skill:

```yaml
# In your skill's frontmatter
dependencies: [leyline:quota-management]
```

Then use the shared patterns:
1. Initialize tracker for your service
2. Check quota before operations
3. Record usage after operations
4. Handle threshold warnings gracefully
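
The four steps above can be wrapped in one helper so that every call site follows the same order. This is a sketch under stated assumptions: the `Tracker` protocol mirrors only the two calls shown in Quick Start, with assumed return shapes, while `run_with_quota`, `QuotaExceeded`, and `FakeTracker` are hypothetical names. `FakeTracker` stands in for the real tracker so the example runs without leyline installed.

```python
import time
from typing import Callable, Protocol, TypeVar

T = TypeVar("T")


class Tracker(Protocol):
    # Mirrors only the calls shown in Quick Start; return shapes are assumed.
    def can_handle_task(self, estimated_tokens: int) -> tuple[bool, list[str]]: ...
    def record_request(self, tokens: int, success: bool, duration: float) -> None: ...


class QuotaExceeded(RuntimeError):
    """Hypothetical error raised when the pre-flight check fails."""


def run_with_quota(tracker: Tracker, estimated_tokens: int, operation: Callable[[], T]) -> T:
    # Step 2: check quota before the operation.
    can_proceed, issues = tracker.can_handle_task(estimated_tokens)
    if not can_proceed:
        raise QuotaExceeded("; ".join(issues))

    start = time.monotonic()
    success = False
    try:
        result = operation()
        success = True
        return result
    finally:
        # Step 3: record usage whether the operation succeeded or failed.
        tracker.record_request(
            tokens=estimated_tokens,
            success=success,
            duration=time.monotonic() - start,
        )


class FakeTracker:
    """In-memory stand-in for a real tracker (illustration only)."""

    def __init__(self, token_budget: int) -> None:
        self.remaining = token_budget
        self.calls: list[tuple[int, bool]] = []

    def can_handle_task(self, estimated_tokens: int) -> tuple[bool, list[str]]:
        if estimated_tokens > self.remaining:
            return False, [f"needs {estimated_tokens} tokens, {self.remaining} left"]
        return True, []

    def record_request(self, tokens: int, success: bool, duration: float) -> None:
        self.remaining -= tokens
        self.calls.append((tokens, success))


# Step 1: initialize a tracker for the service, then run work through the wrapper.
tracker = FakeTracker(token_budget=1000)
print(run_with_quota(tracker, 400, lambda: "ok"))
print(tracker.remaining)
```

Recording inside `finally` means failed operations still count against the quota, which keeps the counters honest when the upstream API charges for failed calls.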

## Detailed Resources

- **Threshold Strategies**: See `modules/threshold-strategies.md` for degradation patterns
- **Estimation Patterns**: See `modules/estimation-patterns.md` for token/cost estimation

## Exit Criteria

- Quota status checked before operation
- Usage recorded after operation
- Threshold warnings handled appropriately

## Troubleshooting

### Common Issues

**Command not found**
Ensure all dependencies are installed and on your `PATH`.

**Permission errors**
Check file permissions and run with appropriate privileges.

**Unexpected behavior**
Enable verbose logging with the `--verbose` flag.

Overview

This skill implements quota tracking, threshold monitoring, and graceful degradation patterns for rate-limited API services. It provides a reusable tracker that estimates cost, checks current usage against multi-tier thresholds, and records consumption so integrations can make safe decisions. The design focuses on proactive alerts and fallbacks to reduce failed requests and unexpected costs.

How this skill works

A `QuotaTracker` instance maintains usage counters and evaluates them against a three-tier threshold model (Healthy, Warning, Critical). Before executing work, call `can_handle_task` to check whether the task fits within the current quota; after execution, call `record_request` with the request's tokens, outcome, and duration. When usage reaches the Warning or Critical level, the skill surfaces recommendations such as batching, deferring non-urgent work, or routing to secondary services.
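
To make "maintains usage counters" concrete, here is one way a per-minute counter could be kept. It is a sketch of a rolling-window technique, not a description of how `QuotaTracker` is implemented; `RollingWindowCounter` and its methods are hypothetical. The clock is injectable so the behavior can be demonstrated without waiting.

```python
import time
from collections import deque
from typing import Callable


class RollingWindowCounter:
    """Counts requests and tokens recorded within the last `window_seconds`."""

    def __init__(
        self,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.window_seconds = window_seconds
        self._clock = clock
        self._events: deque[tuple[float, int]] = deque()  # (timestamp, tokens)

    def record(self, tokens: int) -> None:
        self._events.append((self._clock(), tokens))

    def _prune(self) -> None:
        # Drop events that have aged out of the window.
        cutoff = self._clock() - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def requests(self) -> int:
        self._prune()
        return len(self._events)

    def tokens(self) -> int:
        self._prune()
        return sum(tokens for _, tokens in self._events)


# Demonstration with a fake clock that is advanced by hand.
now = [0.0]
counter = RollingWindowCounter(window_seconds=60.0, clock=lambda: now[0])
counter.record(tokens=500)
now[0] = 30.0
counter.record(tokens=700)
print(counter.requests(), counter.tokens())
now[0] = 61.0  # the first event is now older than the window
print(counter.requests(), counter.tokens())
```

The counts returned here are what a threshold check would compare against the per-minute limits in `QuotaConfig`.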

When to use it

- Integrating with external APIs that enforce request or token limits
- Tracking usage across user sessions or distributed workers
- Needing cost or token estimates before executing expensive operations
- Implementing graceful degradation and fallback strategies for reliability
- Coordinating quota-aware behavior across multiple plugins or services

Best practices

- Always run `can_handle_task` (estimate) before initiating a call to avoid wasted work
- Record successful and failed requests to keep counters accurate for future decisions
- Map service-specific limits into the `QuotaConfig` (requests/tokens per minute/day)
- React to Warning thresholds with batching, throttling, or user-facing alerts
- Automate fallback to secondary providers or deferred queues at Critical level
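
The last two practices amount to a dispatch from quota status to an action. A minimal sketch follows, assuming the status arrives as one of three strings (only `"CRITICAL"` appears in the skill's own example; the other two are assumed) and that each task is flagged as urgent or not. `Action` and `choose_action` are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"    # call the primary service now
    BATCH = "batch"        # hold and combine with other work
    DEFER = "defer"        # push to a deferred queue
    FALLBACK = "fallback"  # route to a secondary provider


def choose_action(status: str, urgent: bool) -> Action:
    """Pick a degradation action from quota status and task urgency."""
    if status == "CRITICAL":
        # Urgent work goes to a secondary provider; the rest waits.
        return Action.FALLBACK if urgent else Action.DEFER
    if status == "WARNING":
        # Urgent work still proceeds; the rest is batched to save requests.
        return Action.PROCEED if urgent else Action.BATCH
    if status == "HEALTHY":
        return Action.PROCEED
    raise ValueError(f"unknown quota status: {status!r}")


for status, urgent in [
    ("HEALTHY", False),
    ("WARNING", False),
    ("CRITICAL", False),
    ("CRITICAL", True),
]:
    print(status, urgent, choose_action(status, urgent).name)
```

Keeping the policy in one function makes it easy to adjust per service without touching the call sites.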

Example use cases

- An assistant that batches high-token operations when Warning thresholds are reached
- A background worker that defers non-urgent tasks to off-peak hours to avoid Critical
- A multi-tenant gateway that enforces per-tenant quotas and shares global budgets
- A cost-control tool that estimates API token consumption before committing expensive runs
- A failover pattern that routes requests to a secondary API when primary quota is exhausted
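
The multi-tenant gateway case combines two limits: a cap per tenant and a budget shared by all tenants. One possible shape is sketched below; `TenantBudget` and `try_consume` are hypothetical names, and the sketch is single-process only, with no locking or persistence.

```python
from collections import defaultdict


class TenantBudget:
    """Enforces a per-tenant token cap inside a shared global budget."""

    def __init__(self, global_limit: int, per_tenant_limit: int) -> None:
        self.global_limit = global_limit
        self.per_tenant_limit = per_tenant_limit
        self.global_used = 0
        self.tenant_used: dict[str, int] = defaultdict(int)

    def try_consume(self, tenant: str, tokens: int) -> bool:
        """Reserve tokens for a tenant; return False if either limit would be exceeded."""
        if self.tenant_used[tenant] + tokens > self.per_tenant_limit:
            return False
        if self.global_used + tokens > self.global_limit:
            return False
        # Both checks passed, so commit the usage to both counters.
        self.tenant_used[tenant] += tokens
        self.global_used += tokens
        return True


budget = TenantBudget(global_limit=1000, per_tenant_limit=600)
print(budget.try_consume("tenant-a", 500))  # within both limits
print(budget.try_consume("tenant-a", 200))  # would exceed tenant-a's cap
print(budget.try_consume("tenant-b", 500))  # uses the rest of the global budget
print(budget.try_consume("tenant-c", 1))    # global budget is exhausted
```

Checking both limits before committing either counter avoids partially applied reservations.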

FAQ

What thresholds does the skill use and why?

It uses Healthy (<80%), Warning (80–95%), and Critical (>95%). This gives time to alert and adapt before hard limits cause failures.

How do I integrate it into my service?

Initialize a QuotaTracker for your service, call can_handle_task before operations, record usage after, and implement fallback behavior when Warning or Critical is reported.