---
name: context-engineering
description: >-
  Check context usage limits, monitor time remaining, optimize token consumption, debug context failures.
  Use when asking about context percentage, rate limits, usage warnings, context optimization, agent architectures, memory systems.
version: 1.0.0
---
# Context Engineering
Context engineering curates the smallest high-signal token set for LLM tasks. The goal: maximize reasoning quality while minimizing token usage.
## When to Activate
- Designing/debugging agent systems
- Context limits constrain performance
- Optimizing cost/latency
- Building multi-agent coordination
- Implementing memory systems
- Evaluating agent performance
- Developing LLM-powered pipelines
## Core Principles
1. **Context quality > quantity** - High-signal tokens beat exhaustive content
2. **Attention is finite** - U-shaped curve favors beginning/end positions
3. **Progressive disclosure** - Load information just-in-time
4. **Isolation prevents degradation** - Partition work across sub-agents
5. **Measure before optimizing** - Know your baseline
**IMPORTANT:**
- Sacrifice grammar for the sake of concision.
- Ensure token efficiency while maintaining high quality.
- Pass these rules to subagents.
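Principle 3 (progressive disclosure) can be sketched as a topic-to-reference lookup: resolve which single reference file a task needs, and load nothing otherwise. The topic keys below are illustrative, not part of the skill's API:

```python
# Just-in-time loading sketch: map a task topic to the one reference file
# worth pulling into context. Unmatched topics cost zero tokens.
REFERENCES = {
    "degradation": "references/context-degradation.md",
    "compression": "references/context-compression.md",
    "memory": "references/memory-systems.md",
}

def select_reference(topic: str):
    """Return the reference path for `topic`, or None if nothing applies."""
    return REFERENCES.get(topic)
```

Only the selected file's contents enter the context window; the rest of the reference set stays on disk until a task actually needs it.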
## Quick Reference
| Topic | When to Use | Reference |
|-------|-------------|-----------|
| **Fundamentals** | Understanding context anatomy, attention mechanics | [context-fundamentals.md](./references/context-fundamentals.md) |
| **Degradation** | Debugging failures, lost-in-middle, poisoning | [context-degradation.md](./references/context-degradation.md) |
| **Optimization** | Compaction, masking, caching, partitioning | [context-optimization.md](./references/context-optimization.md) |
| **Compression** | Long sessions, summarization strategies | [context-compression.md](./references/context-compression.md) |
| **Memory** | Cross-session persistence, knowledge graphs | [memory-systems.md](./references/memory-systems.md) |
| **Multi-Agent** | Coordination patterns, context isolation | [multi-agent-patterns.md](./references/multi-agent-patterns.md) |
| **Evaluation** | Testing agents, LLM-as-Judge, metrics | [evaluation.md](./references/evaluation.md) |
| **Tool Design** | Tool consolidation, description engineering | [tool-design.md](./references/tool-design.md) |
| **Pipelines** | Project development, batch processing | [project-development.md](./references/project-development.md) |
| **Runtime Awareness** | Usage limits, context window monitoring | [runtime-awareness.md](./references/runtime-awareness.md) |
## Key Metrics
- **Token utilization**: Warning at 70%, trigger optimization at 80%
- **Token usage**: explains ~80% of agent performance variance
- **Multi-agent cost**: ~15x single agent baseline
- **Compaction target**: 50-70% reduction, <5% quality loss
- **Cache hit target**: 70%+ for stable workloads
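The utilization thresholds above can be sketched as a single classifier (the label names are illustrative):

```python
def utilization_status(used_tokens: int, window: int) -> str:
    """Classify context utilization against the thresholds above."""
    pct = 100 * used_tokens / window
    if pct >= 90:
        return "critical"  # immediate action needed
    if pct >= 80:
        return "compact"   # trigger compaction now
    if pct >= 70:
        return "warning"   # start planning optimization
    return "ok"
```

For example, 67% utilization is still "ok", while 85% should already be triggering compaction.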
## Four-Bucket Strategy
1. **Write**: Save context externally (scratchpads, files)
2. **Select**: Pull only relevant context (retrieval, filtering)
3. **Compress**: Reduce tokens while preserving info (summarization)
4. **Isolate**: Split across sub-agents (partitioning)
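The Write and Select buckets can be sketched together: park bulky context in an external scratchpad, keep only a cheap pointer in the window, and pull entries back only when relevant. This is a minimal in-memory sketch, not the skill's actual storage mechanism:

```python
# Write bucket: store text externally, keep only a short pointer in context.
# Select bucket: retrieve only the entries whose key matches the task.
scratchpad: dict = {}

def write(key: str, text: str) -> str:
    """Save `text` externally; return a pointer far cheaper than the text."""
    scratchpad[key] = text
    return f"[saved:{key}]"

def select(query: str) -> list:
    """Pull back only entries whose key mentions `query`."""
    return [v for k, v in scratchpad.items() if query in k]
```

Compress and Isolate follow the same shape: a summarizer replaces the stored text, and a sub-agent gets its own scratchpad partition.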
## Anti-Patterns
- Exhaustive context over curated context
- Critical info in middle positions
- No compaction triggers before limits
- Single agent for parallelizable tasks
- Tools without clear descriptions
## Guidelines
1. Place critical info at beginning/end of context
2. Implement compaction at 70-80% utilization
3. Use sub-agents for context isolation, not role-play
4. Design tools with 4-question framework (what, when, inputs, returns)
5. Optimize for tokens-per-task, not tokens-per-request
6. Validate with probe-based evaluation
7. Monitor KV-cache hit rates in production
8. Start minimal, add complexity only when proven necessary
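Guideline 4's four-question framework fits naturally into a tool description. Below is a hypothetical tool definition (the `search_docs` name and schema are illustrative) whose description answers what, when, inputs, and returns in order:

```python
# Hypothetical tool definition: the description answers the 4-question
# framework (what it does, when to use it, inputs, returns) explicitly.
search_tool = {
    "name": "search_docs",
    "description": (
        "What: full-text search over project documentation. "
        "When: use before answering questions about project internals. "
        "Inputs: `query` (string), `limit` (int, default 5). "
        "Returns: list of {path, snippet} matches, best first."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
}
```

A description structured this way lets the model decide whether to call the tool without loading its source or documentation into context.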
## Runtime Awareness
The system automatically injects usage awareness via PostToolUse hook:
```xml
<usage-awareness>
Claude Usage Limits: 5h=45%, 7d=32%
Context Window Usage: 67%
</usage-awareness>
```
**Thresholds:**
- 70%: WARNING - consider optimization/compaction
- 90%: CRITICAL - immediate action needed
**Data Sources:**
- Usage limits: Anthropic OAuth API (`https://api.anthropic.com/api/oauth/usage`)
- Context window: Statusline temp file (`/tmp/ck-context-{session_id}.json`)
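Reading the statusline temp file can be sketched as below. The JSON schema inside the file is an assumption (a `context_pct` key); adjust the key to whatever your statusline actually writes:

```python
import json
from pathlib import Path

def context_usage_pct(session_id: str, tmp_dir: str = "/tmp"):
    """Read context utilization from the statusline temp file, if present.

    Assumes the file holds JSON with a `context_pct` key (assumption --
    verify against your statusline's output). Returns None when the
    statusline is not running for this session.
    """
    path = Path(tmp_dir) / f"ck-context-{session_id}.json"
    if not path.exists():
        return None
    data = json.loads(path.read_text())
    return data.get("context_pct")
```

A hook or wrapper can poll this value and inject the `<usage-awareness>` block shown above when it crosses the 70% or 90% thresholds.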
## Scripts
- [context_analyzer.py](./scripts/context_analyzer.py) - Context health analysis, degradation detection
- [compression_evaluator.py](./scripts/compression_evaluator.py) - Compression quality evaluation