---
name: context-compression
description: Use when conversation context is too long, hitting token limits, or responses are degrading. Compresses history while preserving critical information using anchored summarization and probe-based validation.
context: fork
version: 1.0.0
author: OrchestKit AI Agent Hub
tags: [context, compression, summarization, memory, optimization, 2026]
user-invocable: false
---
# Context Compression
**Reduce context size while preserving information critical to task completion.**
## Overview
Context compression is essential for long-running agent sessions. The goal is NOT maximum compression—it's preserving enough information to complete tasks without re-fetching.
**Key Metric:** Tokens-per-task (total tokens to complete a task), NOT tokens-per-request.
## When to Use
- Long-running conversations approaching context limits
- Multi-step agent workflows with accumulating history
- Sessions with large tool outputs
- Memory management in persistent agents
---
## Strategy Quick Reference
| Strategy | Compression | Interpretable | Verifiable | Best For |
|----------|-------------|---------------|------------|----------|
| Anchored Iterative | 60-80% | Yes | Yes | Long sessions |
| Opaque | 95-99% | No | No | Storage-critical |
| Regenerative Full | 70-85% | Yes | Partial | Simple tasks |
| Sliding Window | 50-70% | Yes | Yes | Real-time chat |
**Recommended:** Anchored Iterative Summarization with probe-based evaluation.
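For contrast with the recommended approach, the sliding-window row above can be sketched in a few lines. This is only a sketch; the `summarize` callable is a hypothetical stand-in for whatever summarizer your runtime provides:

```python
def compress_sliding_window(messages: list[str], summarize, keep_last: int = 5) -> list[str]:
    """Sliding-window strategy: summarize older turns, keep the recent window verbatim.

    `summarize` is any callable that folds a list of messages into a single
    summary string (hypothetical -- plug in your own summarizer).
    """
    if len(messages) <= keep_last:
        return list(messages)
    older, recent = messages[:-keep_last], messages[-keep_last:]
    return [summarize(older)] + recent
```

The trade-off is visible in the signature: compression depends entirely on how much history falls outside the window, which is why the table caps it at 50-70%.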
---
## Anchored Summarization (RECOMMENDED)
Maintains structured, persistent summaries with forced sections:
```
## Session Intent
[What we're trying to accomplish - NEVER lose this]
## Files Modified
- path/to/file.ts: Added function X, modified class Y
## Decisions Made
- Decision 1: Chose X over Y because [rationale]
## Current State
[Where we are in the task - progress indicator]
## Blockers / Open Questions
- Question 1: Awaiting user input on...
## Next Steps
1. Complete X
2. Test Y
```
**Why it works:**
- Structure FORCES preservation of critical categories
- Each section must be explicitly populated (can't silently drop info)
- Incremental merge (new compressions extend, don't replace)
---
## Implementation
```python
from dataclasses import dataclass, field


@dataclass
class AnchoredSummary:
    """Structured summary with forced sections."""

    session_intent: str
    files_modified: dict[str, list[str]] = field(default_factory=dict)
    decisions_made: list[dict] = field(default_factory=list)
    current_state: str = ""
    blockers: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)
    compression_count: int = 0

    def merge(self, new_content: "AnchoredSummary") -> "AnchoredSummary":
        """Incrementally merge a new summary into the existing one."""
        return AnchoredSummary(
            session_intent=new_content.session_intent or self.session_intent,
            files_modified={**self.files_modified, **new_content.files_modified},
            decisions_made=self.decisions_made + new_content.decisions_made,
            current_state=new_content.current_state,
            blockers=new_content.blockers,
            next_steps=new_content.next_steps,
            compression_count=self.compression_count + 1,
        )

    def to_markdown(self) -> str:
        """Render as markdown for context injection."""
        sections = [
            f"## Session Intent\n{self.session_intent}",
            "## Files Modified\n" + "\n".join(
                f"- `{path}`: {', '.join(changes)}"
                for path, changes in self.files_modified.items()
            ),
            "## Decisions Made\n" + "\n".join(
                f"- **{d['decision']}**: {d['rationale']}"
                for d in self.decisions_made
            ),
            f"## Current State\n{self.current_state}",
        ]
        if self.blockers:
            sections.append("## Blockers\n" + "\n".join(f"- {b}" for b in self.blockers))
        sections.append("## Next Steps\n" + "\n".join(
            f"{i + 1}. {step}" for i, step in enumerate(self.next_steps)
        ))
        return "\n\n".join(sections)
```
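To illustrate the incremental-merge behavior without repeating the full class, here is a trimmed-down, self-contained sketch with the same merge semantics as `AnchoredSummary` (the field names match the dataclass above; the sample values are hypothetical):

```python
from dataclasses import dataclass, field


@dataclass
class MiniAnchoredSummary:
    """Trimmed sketch of AnchoredSummary, enough to show merge semantics."""

    session_intent: str
    decisions_made: list[dict] = field(default_factory=list)
    compression_count: int = 0

    def merge(self, new: "MiniAnchoredSummary") -> "MiniAnchoredSummary":
        # Intent falls back to the old value; decisions append; count increments.
        return MiniAnchoredSummary(
            session_intent=new.session_intent or self.session_intent,
            decisions_made=self.decisions_made + new.decisions_made,
            compression_count=self.compression_count + 1,
        )


base = MiniAnchoredSummary(
    "Migrate auth to JWT",
    decisions_made=[{"decision": "Use RS256", "rationale": "key rotation"}],
)
update = MiniAnchoredSummary(
    "",  # empty intent: the original intent must survive the merge
    decisions_made=[{"decision": "Add refresh tokens", "rationale": "short-lived access"}],
)
merged = base.merge(update)
```

After the merge, `merged` keeps the original session intent, holds both decisions, and records one compression pass -- extending rather than replacing, as the anchored approach requires.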
---
## Compression Triggers
| Threshold | Action |
|-----------|--------|
| 70% capacity | Trigger compression |
| 50% capacity | Target after compression |
| 10 messages minimum | Required before compressing |
| Last 5 messages | Always preserve uncompressed |
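The thresholds above can be wired into a simple trigger check. This is a minimal sketch; the token counts and capacity are whatever your runtime reports, and the function names are illustrative:

```python
def should_compress(
    used_tokens: int,
    capacity: int,
    message_count: int,
    trigger_ratio: float = 0.70,
    min_messages: int = 10,
) -> bool:
    """Fixed-threshold trigger: compress at 70% capacity, but only once
    at least 10 messages have accumulated."""
    return used_tokens / capacity >= trigger_ratio and message_count >= min_messages


def compression_target(capacity: int, target_ratio: float = 0.50) -> int:
    """Token budget to compress down to (50% of capacity)."""
    return int(capacity * target_ratio)
```

Keeping the thresholds fixed (rather than deciding opportunistically per request) is what makes compression behavior predictable across a session.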
### CC 2.1.7: Effective Context Window
Calculate against **effective** context (after system overhead):
| Trigger | Static (CC 2.1.6) | Effective (CC 2.1.7) |
|---------|-------------------|----------------------|
| Warning | 60% of static | 60% of effective |
| Compress | 70% of static | 70% of effective |
| Critical | 90% of static | 90% of effective |
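A sketch of the CC 2.1.7 calculation, assuming the fixed system overhead (system prompt, tool schemas, etc.) can be measured in tokens; the numbers in the test are hypothetical:

```python
def effective_context(static_limit: int, system_overhead: int) -> int:
    """Effective window = static model limit minus fixed system overhead."""
    return static_limit - system_overhead


def trigger_thresholds(static_limit: int, system_overhead: int) -> dict[str, int]:
    """Warning / compress / critical token counts against the effective window."""
    effective = effective_context(static_limit, system_overhead)
    return {
        "warning": round(effective * 0.60),
        "compress": round(effective * 0.70),
        "critical": round(effective * 0.90),
    }
```

Computing against the effective window matters because a large system prompt can silently shift the real 70% point well below 70% of the advertised limit.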
---
## Best Practices
### DO
- Use anchored summarization with forced sections
- Preserve recent messages uncompressed (context continuity)
- Test compression with probes, not similarity metrics
- Merge incrementally (don't regenerate from scratch)
- Track compression count and quality scores
### DON'T
- Compress system prompts (keep at START)
- Use opaque compression for critical workflows
- Compress below the point of task completion
- Trigger compression opportunistically (use fixed thresholds)
- Optimize for compression ratio over task success
---
## Target Metrics
| Metric | Target | Red Flag |
|--------|--------|----------|
| Probe pass rate | >90% | <70% |
| Compression ratio | 60-80% | >95% (too aggressive) |
| Task completion | Same as uncompressed | Degraded |
| Latency overhead | <2s | >5s |
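Probe-based evaluation can be as simple as checking that task-critical facts survive in the compressed summary. The substring check below is a deliberately naive sketch; real probes would grade model answers against the summary:

```python
def probe_pass_rate(compressed_summary: str, probes: list[dict]) -> float:
    """Fraction of probes whose expected fact is still present in the summary.

    Each probe is {"question": ..., "expected": ...}. A case-insensitive
    substring match is a naive stand-in for an LLM-graded answer check.
    """
    if not probes:
        return 1.0
    hits = sum(
        1 for p in probes
        if p["expected"].lower() in compressed_summary.lower()
    )
    return hits / len(probes)


summary = "## Session Intent\nMigrate auth to JWT\n## Decisions Made\n- Use RS256"
probes = [
    {"question": "What is the session goal?", "expected": "migrate auth to jwt"},
    {"question": "Which algorithm was chosen?", "expected": "RS256"},
    {"question": "What database was selected?", "expected": "PostgreSQL"},  # lost info
]
rate = probe_pass_rate(summary, probes)
```

A rate below the 0.90 target flags the compression for regeneration rather than silently accepting the information loss.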
---
## References
For detailed implementation and patterns, see:
- **[Compression Strategies](references/compression-strategies.md)**: Detailed comparison of all strategies (anchored, opaque, regenerative, sliding window), implementation patterns, and decision flowcharts
- **[Priority Management](references/priority-management.md)**: Compression triggers, CC 2.1.7 effective context, probe-based evaluation, OrchestKit integration
## Bundled Resources
- `assets/anchored-summary-template.md` - Template for structured compression summaries with forced sections
- `assets/compression-probes-template.md` - Probe templates for validating compression quality
- `references/compression-strategies.md` - Detailed strategy comparisons
- `references/priority-management.md` - Compression triggers and evaluation
---
## Related Skills
- `context-engineering` - Attention mechanics and positioning
- `memory-systems` - Persistent storage patterns
- `multi-agent-orchestration` - Context isolation across agents
- `observability-monitoring` - Tracking compression metrics
---
**Version:** 1.0.0 (January 2026)
**Key Principle:** Optimize for tokens-per-task, not tokens-per-request
**Recommended Strategy:** Anchored Iterative Summarization with probe-based evaluation
---
## Capability Details
### anchored-summarization
**Keywords:** compress, summarize history, context too long, anchored summary
**Solves:**
- Reduce context size while preserving critical information
- Implement structured compression with required sections
- Maintain session intent and decisions through compression
### compression-triggers
**Keywords:** token limit, running out of context, when to compress
**Solves:**
- Determine when to trigger compression (70% utilization)
- Set compression targets (50% utilization)
- Preserve last 5 messages uncompressed
### probe-evaluation
**Keywords:** evaluate compression, test compression, probe
**Solves:**
- Validate compression quality with functional probes
- Test information preservation after compression
- Achieve >90% probe pass rate
## FAQ
**How aggressive should compression be?**
Aim for 60-80% compression with anchored iterative summarization; avoid >95% unless storage is the only constraint.
**What validation method ensures quality?**
Use probe-based functional tests that check task-relevant facts and actions; similarity metrics alone are insufficient.
**When should I preserve uncompressed history?**
Always keep the last 5 messages uncompressed, and never compress system prompts or critical decision checkpoints.