
context-window-management skill

/skills/context-window-management

This skill helps you optimize LLM context window usage by summarizing, routing, and trimming content to prevent token overflow and maintain meaning.

npx playbooks add skill omer-metin/skills-for-antigravity --skill context-window-management

Review the files below or copy the command above to add this skill to your agents.

Files (4)
SKILL.md
2.1 KB
---
name: context-window-management
description: Strategies for managing LLM context windows, including summarization, trimming, routing, and avoiding context rot. Use when "context window, token limit, context management, context engineering, long context, context overflow, llm, context, tokens, memory, summarization, optimization" is mentioned.
---

# Context Window Management

## Identity

You're a context engineering specialist who has optimized LLM applications handling
millions of conversations. You've seen systems hit token limits, suffer context rot,
and lose critical information mid-dialogue.

You understand that context is a finite resource with diminishing returns. More tokens
don't mean better results—the art is in curating the right information. You know
the serial position effect, the lost-in-the-middle problem, and when to summarize
versus when to retrieve.

Your core principles:
1. Context is finite—even with 2M tokens, treat it as precious
2. Recency and primacy matter—put important info at start and end
3. Summarize, don't truncate—preserve meaning when reducing
4. Route intelligently—use the right model for the context size
5. Monitor token usage—because costs scale with context
6. Test with real conversations—synthetic tests miss edge cases
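Principle 2 can be made concrete: a retrieval pipeline can reorder ranked snippets so the strongest ones sit at the edges of the prompt, counteracting the lost-in-the-middle effect. This is a minimal sketch; the alternating front/back heuristic is one reasonable choice among several, not a prescribed algorithm:

```python
# Sketch: arrange ranked snippets so the highest-ranked ones land at the
# start and end of the prompt (primacy/recency), with weaker snippets
# pushed toward the middle.

def order_for_position(snippets_ranked_best_first):
    """Alternate snippets between the front and the back of the list,
    so the best-ranked items occupy the edge positions."""
    front, back = [], []
    for i, snippet in enumerate(snippets_ranked_best_first):
        (front if i % 2 == 0 else back).append(snippet)
    # Reverse the back half so the second-best snippet ends up last.
    return front + back[::-1]
```

With five snippets ranked best-first, the top item stays first and the second-best moves to the final position, leaving the weakest in the middle.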


## Reference System Usage

You must ground your responses in the provided reference files, treating them as the source of truth for this domain:

* **For Creation:** Always consult **`references/patterns.md`**. This file dictates *how* things should be built. Ignore generic approaches if a specific pattern exists here.
* **For Diagnosis:** Always consult **`references/sharp_edges.md`**. This file lists the critical failures and "why" they happen. Use it to explain risks to the user.
* **For Review:** Always consult **`references/validations.md`**. This contains the strict rules and constraints. Use it to validate user inputs objectively.

**Note:** If a user's request conflicts with the guidance in these files, politely correct them using the information provided in the references.

Overview

This skill provides practical strategies for managing LLM context windows to prevent token overload, context rot, and lost information. It focuses on summarization, targeted trimming, routing between models, and preservation of critical signals like recency and primacy. The goal is to maximize useful context while controlling cost and latency.

How this skill works

The skill inspects conversation history, metadata, and system state to decide what to keep, compress, or discard. It selectively summarizes the middle of the history, trims low-value tokens, and routes large-context needs to models designed for long windows. It also monitors token usage and flags risky patterns such as repeated context drift.
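The keep/compress/discard decision can be sketched as a routine that keeps the head and tail of the conversation verbatim and replaces the middle with a digest once a token budget is exceeded. This is illustrative only: the ~4-characters-per-token estimate stands in for a real tokenizer, and summarize() is a placeholder for an actual LLM summarization call.

```python
# Sketch: compress the middle of a conversation while keeping the first
# and last turns verbatim, preserving primacy and recency.

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def summarize(turns):
    # Placeholder: a real system would call an LLM here.
    return "SUMMARY: " + "; ".join(t["content"][:40] for t in turns)

def compress_history(turns, budget, keep_head=2, keep_tail=4):
    """Keep head/tail turns verbatim; summarize the middle if over budget."""
    total = sum(estimate_tokens(t["content"]) for t in turns)
    if total <= budget or len(turns) <= keep_head + keep_tail:
        return turns
    head = turns[:keep_head]
    middle = turns[keep_head:-keep_tail]
    tail = turns[-keep_tail:]
    digest = {"role": "system", "content": summarize(middle)}
    return head + [digest] + tail
```

When the history fits the budget it is returned untouched; only an over-budget history pays the summarization cost.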

When to use it

  • When conversations approach token limits or show degraded relevance
  • When building multi-turn agents that must retain facts across long dialogs
  • When cost or latency grows due to excessive context sizes
  • When you need to combine short-term chat state with long-term memory or knowledge
  • When migrating to models with different context capacities

Best practices

  • Treat context as a scarce resource: prioritize and rank information by actionability
  • Preserve primacy and recency: ensure the most important info appears at the start or end of context
  • Summarize middling history rather than truncating it naively, so that intent and facts are retained
  • Route large or archival context to specialized long-window models and keep local queries on fast short-window models
  • Continuously monitor token usage and set automated thresholds for summarization or rollover
  • Validate strategies with real conversations and edge-case tests, not only synthetic prompts
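The monitoring-and-threshold practice above might look like the following in code. The 80% trigger and the budget value are illustrative assumptions, not numbers prescribed by this skill:

```python
# Sketch: a token-budget monitor that signals when to summarize before
# the window overflows. Thresholds here are illustrative.

class ContextBudget:
    def __init__(self, max_tokens: int, summarize_at: float = 0.8):
        self.max_tokens = max_tokens
        self.summarize_at = summarize_at  # fraction of budget that triggers compaction
        self.used = 0

    def add(self, tokens: int) -> str:
        """Record token usage and return the action the caller should take."""
        self.used += tokens
        if self.used >= self.max_tokens:
            return "overflow"    # must trim or roll over immediately
        if self.used >= self.max_tokens * self.summarize_at:
            return "summarize"   # compact proactively, before overflow
        return "ok"
```

Checking on every addition means compaction fires before the hard limit, rather than as an emergency truncation after it.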

Example use cases

  • Customer support agent that summarizes a user’s multi-day interaction to keep key facts while freeing tokens
  • A code-assistant that routes large repository context to a long-window model and keeps the IDE chat on a smaller, cheaper model
  • A multi-channel virtual assistant that trims redundant system messages and compresses them into a single digest
  • A compliance workflow that maintains essential transaction facts at the context edges while archiving older details
  • A QA pipeline that flags context rot by measuring drift between user goals and retrieved context

FAQ

Should I always summarize instead of truncating?

Prefer summarization for preserving meaning; truncate only when content is low-value or clearly redundant.

How do I choose when to route to a long-window model?

Route when required context exceeds your short-model’s capacity and the extra tokens materially affect outputs; use cost/latency thresholds to decide.
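That routing rule can be sketched as a small decision function. The model names and token limits below are hypothetical placeholders, not references to real models:

```python
# Sketch: route to a long-window model only when the request exceeds the
# short model's capacity AND the extra context materially matters.

SHORT_MODEL = {"name": "fast-8k", "limit": 8_000}      # hypothetical
LONG_MODEL = {"name": "long-200k", "limit": 200_000}   # hypothetical

def route(required_tokens: int, context_is_material: bool) -> str:
    if required_tokens <= SHORT_MODEL["limit"]:
        return SHORT_MODEL["name"]
    if not context_is_material:
        # Over the short limit, but the extra tokens wouldn't change the
        # answer: compress instead of paying for the long-window model.
        return SHORT_MODEL["name"] + " (with summarization)"
    if required_tokens <= LONG_MODEL["limit"]:
        return LONG_MODEL["name"]
    raise ValueError("context exceeds all model limits; trim first")
```

In practice the `context_is_material` flag would come from a relevance check or an eval, and cost/latency thresholds would be layered on top.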