
llm-provider-usage-statistics skill

/.skills/llm-provider-usage-statistics

This skill helps you debug token counts and optimize prefix caching across OpenAI, Anthropic, and Gemini by applying provider-specific usage rules.

npx playbooks add skill letta-ai/letta --skill llm-provider-usage-statistics

Review the files below or copy the command above to add this skill to your agents.

Files (4) — SKILL.md (1.9 KB)
---
name: llm-provider-usage-statistics
description: Reference guide for token counting and prefix caching across LLM providers (OpenAI, Anthropic, Gemini). Use when debugging token counts or optimizing prefix caching.
---

# LLM Provider Usage Statistics

Reference documentation for how different LLM providers report token usage.

## Quick Reference: Token Counting Semantics

| Provider | `input_tokens` meaning | Cache tokens | Must add cache to get total? |
|----------|------------------------|--------------|------------------------------|
| OpenAI | TOTAL (includes cached) | `cached_tokens` is subset | No |
| Anthropic | NON-cached only | `cache_read_input_tokens` + `cache_creation_input_tokens` | **Yes** |
| Gemini | TOTAL (includes cached) | `cached_content_token_count` is subset | No |

**Critical difference:** Anthropic's `input_tokens` excludes cached tokens, so you must add them:
```
total_input = input_tokens + cache_read_input_tokens + cache_creation_input_tokens
```
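The rule above can be wrapped in a small provider-aware helper. A minimal sketch (the function name is hypothetical; the usage fields are the provider fields from the table above):

```python
def total_input_tokens(provider: str, usage: dict) -> int:
    """Total input tokens for a request, normalizing provider semantics.

    Anthropic's input_tokens excludes cached tokens, so the cache fields
    are added; OpenAI and Gemini already report cache-inclusive totals.
    """
    if provider == "anthropic":
        return (
            usage.get("input_tokens", 0)
            + usage.get("cache_read_input_tokens", 0)
            + usage.get("cache_creation_input_tokens", 0)
        )
    if provider == "openai":
        # Chat Completions uses prompt_tokens; Responses API uses input_tokens
        return usage.get("input_tokens", usage.get("prompt_tokens", 0))
    # Gemini: usage_metadata.prompt_token_count includes cached tokens
    return usage.get("prompt_token_count", 0)
```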

## Quick Reference: Prefix Caching

| Provider | Min tokens | How to enable | TTL |
|----------|-----------|---------------|-----|
| OpenAI | 1,024 | Automatic | ~5-10 min |
| Anthropic | 1,024 | Requires `cache_control` breakpoints | 5 min |
| Gemini 2.0+ | 1,024 | Automatic (implicit) | Variable |

## Quick Reference: Reasoning/Thinking Tokens

| Provider | Field name | Models |
|----------|-----------|--------|
| OpenAI | `reasoning_tokens` | o1, o3 models |
| Anthropic | N/A | (thinking is in content blocks, not usage) |
| Gemini | `thoughts_token_count` | Gemini 2.0 with thinking enabled |
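Reading these fields can be sketched as follows (the helper name is hypothetical; note that OpenAI's Chat Completions API nests the field under `completion_tokens_details`):

```python
def reasoning_tokens(provider: str, usage: dict) -> int:
    """Reasoning/thinking tokens, if the provider reports them in usage."""
    if provider == "openai":
        # Chat Completions: usage.completion_tokens_details.reasoning_tokens
        return usage.get("completion_tokens_details", {}).get("reasoning_tokens", 0)
    if provider == "gemini":
        return usage.get("thoughts_token_count", 0)
    # Anthropic: thinking appears in content blocks, not in usage
    return 0
```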

## Provider Reference Files

- **OpenAI:** [references/openai.md](references/openai.md) - Chat Completions vs Responses API, reasoning models, cached_tokens
- **Anthropic:** [references/anthropic.md](references/anthropic.md) - cache_control setup, beta headers, cache token fields
- **Gemini:** [references/gemini.md](references/gemini.md) - implicit caching, thinking tokens, usage_metadata fields

Overview

This skill is a compact reference guide for how major LLM providers report token usage and how prefix caching affects counts. It focuses on OpenAI, Anthropic, and Gemini differences so you can debug token accounting and optimize caching. Use it when you need accurate total token metrics or to tune prefix caching behavior across providers.

How this skill works

The guide summarizes each provider's token-count semantics, which usage fields include cached tokens, and which cache fields you must add to compute true totals. It also documents minimum cacheable prompt sizes, how prefix caching is enabled, cache TTLs, and the fields for reasoning or thinking tokens. The content is organized for quick lookup when reconciling billing, debugging unexpected token totals, or implementing cross-provider telemetry.
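One common lookup is the cache hit ratio, which differs per provider because cache fields are a subset of the total for OpenAI and Gemini but additive for Anthropic. A sketch under those assumptions (function name hypothetical; field names are the providers' usage fields):

```python
def cache_hit_ratio(provider: str, usage: dict) -> float:
    """Fraction of input tokens served from the prefix cache (0.0-1.0)."""
    if provider == "openai":
        # cached_tokens is a subset of prompt_tokens
        cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
        total = usage.get("prompt_tokens", 0)
    elif provider == "anthropic":
        # cache fields are additive: total = input + cache_read + cache_creation
        cached = usage.get("cache_read_input_tokens", 0)
        total = (usage.get("input_tokens", 0)
                 + cached
                 + usage.get("cache_creation_input_tokens", 0))
    else:  # gemini: cached_content_token_count is a subset of prompt_token_count
        cached = usage.get("cached_content_token_count", 0)
        total = usage.get("prompt_token_count", 0)
    return cached / total if total else 0.0
```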

When to use it

  • Debugging mismatched token totals between your app and provider usage reports.
  • Calculating exact total_input for Anthropic, whose input_tokens excludes cached tokens.
  • Designing or tuning prefix caching to reduce input token consumption.
  • Comparing reasoning/thinking token fields across providers for observability.
  • Implementing cross-provider billing or cost forecasting for agents.

Best practices

  • Always check provider-specific fields: OpenAI and Gemini include cached tokens in input totals; Anthropic does not.
  • For Anthropic, compute total_input = input_tokens + cache_read_input_tokens + cache_creation_input_tokens.
  • Track reasoning/thinking fields separately (OpenAI: reasoning_tokens, Gemini: thoughts_token_count) for model diagnostics.
  • Enable and validate prefix caching behavior in staging to observe TTL and minimum token effects before production.
  • Standardize your telemetry schema to include raw provider fields and computed total_input to avoid ambiguity.
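The last practice can be sketched as a normalized record that stores the raw provider payload alongside computed values (this schema is a hypothetical illustration, not part of the skill):

```python
from dataclasses import dataclass

@dataclass
class UsageRecord:
    """Normalized per-request usage, keeping the raw payload for audits."""
    provider: str          # "openai" | "anthropic" | "gemini"
    raw: dict              # raw provider usage payload, stored verbatim
    total_input: int       # computed per provider semantics (cache-inclusive)
    cached_input: int = 0  # tokens read from the prefix cache
    reasoning: int = 0     # reasoning/thinking tokens, if reported
```

Keeping `raw` verbatim means a semantics bug in the computed fields can always be corrected retroactively.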

Example use cases

  • A developer sees lower-than-expected Anthropic input_tokens and adds cache_read/cache_creation tokens to reconcile costs.
  • An ops engineer tunes prefix cache TTL after noticing OpenAI cached_tokens reduced repeated prompt costs.
  • A monitoring dashboard shows reasoning token usage per request by reading reasoning_tokens (OpenAI) or thoughts_token_count (Gemini).
  • A cost forecaster normalizes token metrics across providers before projecting monthly spend for stateful agents.

FAQ

Why do Anthropic usage numbers look lower than others?

Anthropic's input_tokens excludes cached tokens; you must add cache_read_input_tokens and cache_creation_input_tokens to get the true total.

Do I always need to add cached tokens to totals?

No. OpenAI and Gemini report input totals that already include cached tokens. Only Anthropic requires adding cache fields.