---
name: cache-cost-tracking
description: LLM cost tracking with Langfuse for cached responses. Use when monitoring cache effectiveness, tracking cost savings, or attributing costs to agents in multi-agent systems.
tags: [llm, cost, caching, langfuse, observability]
context: fork
agent: metrics-architect
version: 1.0.0
author: OrchestKit
user-invocable: false
---
# Cache Cost Tracking
Monitor LLM costs and cache effectiveness.
## Langfuse Automatic Tracking
```python
from uuid import UUID

from langfuse.decorators import observe, langfuse_context

@observe(as_type="generation")
async def call_llm_with_cache(
    prompt: str,
    agent_type: str,
    analysis_id: UUID,
) -> str:
    """LLM call with automatic cost tracking."""
    # Link to parent trace
    langfuse_context.update_current_trace(
        name=f"{agent_type}_generation",
        session_id=str(analysis_id),
    )

    # Check caches (L1: in-process LRU, L2: semantic)
    cache_key = f"{agent_type}:{prompt}"
    if cache_key in lru_cache:
        langfuse_context.update_current_observation(
            metadata={"cache_layer": "L1", "cache_hit": True}
        )
        return lru_cache[cache_key]

    similar = await semantic_cache.get(prompt, agent_type)
    if similar:
        langfuse_context.update_current_observation(
            metadata={"cache_layer": "L2", "cache_hit": True}
        )
        return similar

    # Cache miss: call the LLM - Langfuse tracks tokens/cost automatically
    response = await llm.generate(prompt)
    langfuse_context.update_current_observation(
        metadata={
            "cache_layer": "L4",
            "cache_hit": False,
            "prompt_cache_hit": response.usage.cache_read_input_tokens > 0,
        }
    )
    return response.content
```
## Hierarchical Cost Rollup
```python
from uuid import UUID

from langfuse.decorators import observe, langfuse_context

class AnalysisWorkflow:
    @observe()  # top-level observed call becomes the parent trace
    async def run_analysis(self, url: str, analysis_id: UUID):
        """Parent trace aggregates child costs.

        Trace hierarchy:
            run_analysis (trace)
            ├── security_agent (generation)
            ├── tech_agent (generation)
            └── synthesis (generation)
        """
        langfuse_context.update_current_trace(
            name="content_analysis",
            session_id=str(analysis_id),
            tags=["multi-agent"],
        )
        content = await self.fetch_content(url)
        for agent in self.agents:
            await self.run_agent(agent, content, analysis_id)

    @observe(as_type="generation")
    async def run_agent(self, agent, content, analysis_id):
        """Child generation - costs roll up to parent."""
        langfuse_context.update_current_observation(
            name=f"{agent.name}_generation",
            metadata={"agent_type": agent.name},
        )
        return await agent.analyze(content)
```
## Cost Queries
```python
from datetime import datetime, timedelta
from uuid import UUID

from langfuse import Langfuse

langfuse = Langfuse()

async def get_analysis_costs(analysis_id: UUID) -> dict:
    traces = langfuse.get_traces(session_id=str(analysis_id), limit=1)
    if not traces.data:
        return {}
    trace = traces.data[0]
    return {
        "total_cost": trace.total_cost,
        "input_tokens": trace.usage.input_tokens,
        "output_tokens": trace.usage.output_tokens,
        "cache_read_tokens": trace.usage.cache_read_input_tokens,
    }

async def get_costs_by_agent() -> list[dict]:
    generations = langfuse.get_generations(
        from_timestamp=datetime.now() - timedelta(days=7),
        limit=1000,
    )
    costs: dict[str, dict] = {}
    for gen in generations.data:
        agent = (gen.metadata or {}).get("agent_type", "unknown")
        if agent not in costs:
            costs[agent] = {"agent": agent, "total": 0.0, "calls": 0, "cache_hits": 0}
        costs[agent]["total"] += gen.calculated_total_cost or 0
        costs[agent]["calls"] += 1
        if (gen.metadata or {}).get("cache_hit"):
            costs[agent]["cache_hits"] += 1
    return list(costs.values())
```
## Cache Effectiveness
```python
cache_hits = 0
cache_misses = 0
cost_saved = 0.0

for gen in generations:
    if gen.metadata.get("cache_hit"):
        cache_hits += 1
        cost_saved += estimate_full_cost(gen)  # what the call would have cost
    else:
        cache_misses += 1

total = cache_hits + cache_misses
hit_rate = cache_hits / total if total else 0.0
print(f"Cache Hit Rate: {hit_rate:.1%}")
print(f"Cost Saved: ${cost_saved:.2f}")
```
## Key Decisions
| Decision | Recommendation |
|----------|----------------|
| Trace grouping | session_id = analysis_id |
| Cost attribution | metadata.agent_type |
| Query window | 7-30 days |
| Dashboard | Langfuse web UI |
## Common Mistakes
- Not linking child to parent trace
- Missing metadata for attribution
- Not tracking cache hits separately
- Ignoring prompt cache savings
## Related Skills
- `semantic-caching` - Redis caching
- `prompt-caching` - Provider caching
- `langfuse-observability` - Full observability
## Capability Details
### prompt-caching
**Keywords:** prompt cache, cache prompt, prefix caching, cache breakpoints
**Solves:**
- Reduce token costs with cached prompts
- Configure cache breakpoints
- Implement provider-native caching
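A cache breakpoint marks the stable prefix of a prompt so the provider caches everything up to that point. A sketch of the request shape, following Anthropic's `cache_control` convention (the model name and prompt text are placeholders, and this only builds the request body rather than sending it):

```python
# Stable, reusable prefix worth caching (placeholder content)
LONG_SYSTEM_PROMPT = "You are a security analyst. " * 100

def build_cached_request(user_msg: str) -> dict:
    """Build a Messages-style request whose system prefix is marked cacheable."""
    return {
        "model": "claude-sonnet-4-5",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                # Cache breakpoint: the provider caches everything up to here
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_msg}],
    }

req = build_cached_request("Summarize the security findings.")
```

On subsequent calls with the same prefix, the response's usage fields (`cache_read_input_tokens`) show how much of the prompt was served from cache.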
### response-caching
**Keywords:** response cache, semantic cache, cache response, LLM cache
**Solves:**
- Cache LLM responses for repeated queries
- Implement semantic similarity caching
- Reduce API calls with cached responses
### cost-calculation
**Keywords:** cost, token cost, calculate cost, pricing, usage cost
**Solves:**
- Calculate token costs by model
- Track input/output token pricing
- Estimate cost before execution
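Pre-execution estimates can only approximate, since output length is unknown. A sketch using a rough 4-characters-per-token heuristic and placeholder prices (swap in a real tokenizer and your model's actual rates):

```python
# Placeholder per-1M-token prices in USD - not official pricing.
PRICES_PER_M = {"input": 3.00, "output": 15.00}

def estimate_cost(prompt: str, expected_output_tokens: int = 500) -> float:
    """Rough cost estimate before execution (heuristic, not a tokenizer)."""
    input_tokens = len(prompt) / 4  # ~4 chars/token for English text
    return (input_tokens * PRICES_PER_M["input"]
            + expected_output_tokens * PRICES_PER_M["output"]) / 1_000_000
```

Useful as a budget gate: skip or downgrade the model when the estimate exceeds a per-call cap.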
### usage-tracking
**Keywords:** usage, track usage, token usage, API usage, metrics
**Solves:**
- Track LLM API usage over time
- Monitor token consumption
- Generate usage reports
### cache-invalidation
**Keywords:** invalidate, cache invalidation, TTL, expire, refresh
**Solves:**
- Implement cache invalidation strategies
- Configure TTL for cached responses
- Handle stale cache entries
This skill provides LLM cost tracking integrated with Langfuse for cached responses and multi-agent workflows. It helps monitor cache effectiveness, attribute costs by agent, and surface cost savings from prompt and response caches. Use it to roll up costs into parent traces, compute hit rates, and query cost breakdowns programmatically.
The skill instruments LLM calls with Langfuse traces and observations, marking cache layer, cache_hit, and prompt cache reads on each generation. Child generations report metadata.agent_type so costs roll up into a parent analysis trace identified by session_id. It also includes helper queries to fetch trace usage, per-agent cost aggregation, and simple cache-effectiveness calculations.
**How do I attribute costs to a specific agent?**
Add `metadata.agent_type` on each generation, fetch traces filtered by `session_id`, and aggregate `calculated_total_cost` per agent.

**How do I measure cost savings from cache hits?**
Count generations with `metadata.cache_hit`, estimate the full-call cost for each, and sum them into `cost_saved`; report `hit_rate = hits / (hits + misses)`.