
context-optimization skill

/skills/common/context-optimization

This skill optimizes AI sessions by masking noisy tool outputs and compacting conversation state, reducing latency and preventing context loss.

npx playbooks add skill hoangnguyen0403/agent-skills-standard --skill context-optimization

Review the files below or copy the command above to add this skill to your agents.

Files (3)
SKILL.md
---
name: Context Optimization
description: Techniques to maximize context window efficiency, reduce latency, and prevent 'lost in the middle' issues through strategic masking and compaction.
metadata:
  labels: [context, optimization, tokens, memory, performance]
  triggers:
    files: ['*.log', 'chat-history.json']
    keywords: [reduce tokens, optimize context, summarize history, clear output]
---

## **Priority: P1 (OPTIMIZATION)**

Manage the Attention Budget. Treat context as a scarce resource.

## 1. Observation Masking (Noise Reduction)

**Problem**: Large tool outputs (logs, JSON lists) flood context and degrade reasoning.
**Solution**: Replace raw output with semantic summaries _after_ consumption.

1.  **Identify**: outputs > 50 lines or > 1 KB.
2.  **Extract**: Read critical data points immediately.
3.  **Mask**: Rewrite history to replace raw data with `[Reference: <summary_of_findings>]`.
4.  **See**: `references/masking.md` for patterns and the sketch below.
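
A minimal sketch of steps 1-3 in TypeScript, assuming a simple in-memory history array; the `HistoryEntry` shape is an assumption, and the `summary` argument is whatever you extracted in step 2.

```typescript
// Hypothetical message shape; adapt to your agent framework.
interface HistoryEntry {
  role: "user" | "assistant" | "tool";
  content: string;
}

const MAX_LINES = 50;
const MAX_BYTES = 1024; // 1 KB, matching the threshold in step 1

// Step 1: identify noisy outputs.
function isNoisy(output: string): boolean {
  return (
    output.split("\n").length > MAX_LINES ||
    new TextEncoder().encode(output).length > MAX_BYTES
  );
}

// Step 3: rewrite history, replacing the raw output with a reference.
// `summary` holds the critical data points extracted in step 2.
function maskObservation(
  history: HistoryEntry[],
  index: number,
  summary: string
): void {
  const entry = history[index];
  if (entry.role === "tool" && isNoisy(entry.content)) {
    entry.content = `[Reference: ${summary}]`;
  }
}
```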

## 2. Context Compaction (State Preservation)

**Problem**: Long conversations drift from original intent.
**Solution**: Recursive summarization that preserves _State_ over _Dialogue_.

1.  **Trigger**: Every 10 turns or ~8k tokens, whichever comes first.
2.  **Compact**:
    - **Keep**: User Goal, Active Task, Current Errors, Key Decisions.
    - **Drop**: Chit-chat, intermediate tool calls, corrected assumptions.
3.  **Format**: Update `System Prompt` or `Memory File` with compacted state.
4.  **See**: `references/compaction.md` for algorithms and the sketch below.
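
A sketch of the trigger and the keep/drop split; `CompactState` and its field names are illustrative assumptions, and producing each field is a summarization task for your model.

```typescript
// Illustrative compacted-state shape; field names are assumptions.
interface CompactState {
  userGoal: string;
  activeTask: string;
  currentErrors: string[];
  keyDecisions: string[];
}

const TURN_LIMIT = 10;
const TOKEN_LIMIT = 8_000;

// Step 1: fire on whichever threshold is crossed first.
function shouldCompact(turns: number, approxTokens: number): boolean {
  return turns >= TURN_LIMIT || approxTokens >= TOKEN_LIMIT;
}

// Steps 2-3: serialize only the kept state into a block that can be
// written into the system prompt or a memory file; dialogue is dropped.
function renderCompactState(state: CompactState): string {
  return [
    `Goal: ${state.userGoal}`,
    `Active task: ${state.activeTask}`,
    `Errors: ${state.currentErrors.join("; ") || "none"}`,
    `Decisions: ${state.keyDecisions.join("; ") || "none"}`,
  ].join("\n");
}
```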

## 3. KV-Cache Awareness (Latency)

**Goal**: Maximize pre-fill cache hits.

- **Static Prefix**: keep a fixed ordering of System -> Tools -> RAG -> User.
- **Append-Only**: Avoid inserting into the middle of history if possible.
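
A sketch of cache-friendly prompt assembly, reusing the `HistoryEntry` shape from the masking sketch; the property that matters is that everything before the dialogue stays byte-identical across turns.

```typescript
// Static blocks are fixed at session start and never mutated, so the
// serialized prefix is byte-identical on every turn and the model's
// prefill (KV) cache can be reused.
interface PromptBlocks {
  system: string; // fixed instructions
  tools: string;  // fixed tool schemas
  rag: string;    // retrieved context, pinned for the session
}

function buildPrompt(blocks: PromptBlocks, history: HistoryEntry[]): string {
  const staticPrefix = [blocks.system, blocks.tools, blocks.rag].join("\n\n");
  // Append-only: new turns are concatenated at the end; nothing is
  // spliced into the middle of the serialized history.
  const dialogue = history.map((e) => `${e.role}: ${e.content}`).join("\n");
  return `${staticPrefix}\n\n${dialogue}`;
}
```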

## References

- [Observation Masking Patterns](references/masking.md)
- [Compaction Algorithms](references/compaction.md)

Overview

This skill teaches practical techniques to maximize context window efficiency, reduce latency, and prevent 'lost in the middle' issues through strategic masking and compaction. It focuses on treating context as a scarce resource and preserving only the state necessary for accurate, low-latency reasoning. The methods are language- and framework-agnostic, with examples shown for TypeScript-based agents.

How this skill works

The skill inspects conversation history, tool outputs, and memory to identify noisy or redundant content, then replaces or summarizes it to keep the actionable state compact. It applies observation masking to large outputs, runs recursive context compaction at turn or token thresholds, and enforces KV-cache-friendly ordering to reduce prefill latency. The result is a small, high-signal context that preserves goals, active tasks, errors, and decisions while discarding transient chatter.
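
As a rough sketch of that per-turn pass, reusing the helpers defined in the SKILL.md sketches above (all names are illustrative):

```typescript
// One housekeeping pass per turn, built from the earlier sketches:
// isNoisy, maskObservation, shouldCompact, renderCompactState.
function optimizeContext(
  history: HistoryEntry[],
  state: CompactState,
  turns: number,
  approxTokens: number
): HistoryEntry[] {
  // Mask noisy tool outputs that have already been consumed.
  history.forEach((entry, i) => {
    if (entry.role === "tool") {
      maskObservation(history, i, "raw output consumed; findings kept in state");
    }
  });
  // Compact to a single state entry once a threshold is crossed.
  if (shouldCompact(turns, approxTokens)) {
    return [{ role: "assistant", content: renderCompactState(state) }];
  }
  return history;
}
```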

When to use it

  • When tool outputs (logs, JSON, diffs) exceed ~50 lines or 1 KB
  • In long-running sessions where conversations drift from the original goal
  • When you observe increased latency due to cache misses or long history
  • Before handing off state to other services or saving to persistent memory
  • When token budget constraints force prioritization of information

Best practices

  • Immediately extract critical data points from large outputs, then replace raw content with a short reference summary
  • Run recursive summarization on state every 10 turns or ~8k tokens, preserving goal, active task, errors, and key decisions
  • Store compacted state in a dedicated system prompt or memory file rather than interleaving it with dialogue (a sketch follows this list)
  • Keep a strict, append-only ordering for static prefixes (System -> Tools -> RAG -> User) to maximize KV-cache hits
  • Avoid splicing entries into the middle of history; rewrite or summarize existing entries in place instead
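
A minimal sketch of the memory-file practice, assuming Node's fs module and the CompactState shape from the compaction sketch; the path is illustrative:

```typescript
import { writeFileSync } from "node:fs";

// Persist compacted state outside the dialogue so a later session can
// reload it without replaying the transcript. Path is an assumption.
function saveMemory(state: CompactState, path = ".agent/memory.json"): void {
  writeFileSync(path, JSON.stringify(state, null, 2), "utf8");
}
```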

Example use cases

  • An agent processes a 10k-line log: extract error lines, summarize findings, and mask the raw log with a reference note (sketched after this list)
  • A multi-turn bug triage session: compact the conversation every 10 turns to preserve the active debugging state and drop chit-chat
  • A code-generation workflow: summarize large dependency manifests into a short dependency snapshot and keep the prompt lean
  • A customer support bot: maintain customer goal and pending actions while removing verbose transcript sections
  • A hybrid RAG pipeline: reorder static context blocks to the front and avoid mid-history inserts to reduce model latency
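
For the log-processing case, a sketch of the extraction step that runs before masking; the ERROR/FATAL pattern is an assumption about the log format:

```typescript
// Keep only high-signal lines from a large log and build the reference
// note that replaces the raw log in history.
function summarizeLog(log: string): string {
  const errors = log
    .split("\n")
    .filter((line) => /\b(ERROR|FATAL)\b/.test(line));
  return `[Reference: ${errors.length} error lines; first: ${errors[0] ?? "none"}]`;
}
```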

FAQ

How often should I trigger compaction?

Trigger compaction every 10 turns or when you reach roughly 8k tokens, whichever comes first.

What exactly should I keep vs drop during compaction?

Keep the user goal, active task, current errors, and key decisions. Drop chit-chat, transient tool calls, and assumptions that were later corrected.