optimize-agent-docs skill

This skill optimizes agent documentation for fast retrieval by building a knowledge manifest, dense compiled artifacts, and task-specific load maps.

npx playbooks add skill petekp/agent-skills --skill optimize-agent-docs

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: optimize-agent-docs
description: Build a retrieval-optimized knowledge layer over agent documentation in dotfiles (.claude, .codex, .cursor, .aider). Use when asked to "optimize docs", "improve agent knowledge", "make docs more efficient", or when documentation has accumulated and retrieval feels inefficient. Generates a manifest mapping task-contexts to knowledge chunks, optimizes information density, and creates compiled artifacts for efficient agent consumption.
---

# Agent Knowledge Optimizer

Transform accumulated documentation into a retrieval-optimized knowledge system.

## Core Principle

File organization is a human concern. Agents don't browse—they search and load. Optimize for:
- **Discovery**: What knowledge exists?
- **Relevance**: Is it needed for this task?
- **Efficiency**: What's the minimum to load?

## Workflow

### Phase 1: Knowledge Extraction

Inventory all agent documentation:

```bash
# Find all agent doc sources (depth 3 reaches files such as .claude/references/api.md)
find . -maxdepth 3 -type f \( \
       \( -name "*.md" \( -path "*/.claude/*" -o -path "*/.codex/*" \
                          -o -path "*/.cursor/*" -o -path "*/.aider/*" \) \) -o \
       -name "CLAUDE.md" -o -name "AGENTS.md" -o -name "INSTRUCTIONS.md" \)
```

For each file, extract:
- Discrete facts (single pieces of actionable information)
- Instructions (procedures, rules, constraints)
- Context triggers (when is this knowledge needed?)
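
The inventory step can also be scripted when results need to feed later phases. The following is a minimal sketch, not the skill's own implementation; `inventory_agent_docs`, `AGENT_DIRS`, and `ROOT_DOCS` are hypothetical names, and the directory list simply mirrors the dotfile folders this skill names.

```python
import os
from pathlib import Path

# Assumption: these names mirror the dotfile folders and root docs listed in this skill.
AGENT_DIRS = {".claude", ".codex", ".cursor", ".aider"}
ROOT_DOCS = {"CLAUDE.md", "AGENTS.md", "INSTRUCTIONS.md"}


def inventory_agent_docs(root: str) -> list[Path]:
    """Return paths (relative to root) of Markdown files that look like agent docs."""
    base = Path(root)
    found = []
    for dirpath, _dirnames, filenames in os.walk(base):
        rel_dir = Path(dirpath).relative_to(base)
        # A file qualifies if any parent folder is an agent dotfile directory...
        in_agent_dir = any(part in AGENT_DIRS for part in rel_dir.parts)
        for name in filenames:
            # ...or if it is one of the well-known root instruction files.
            if name.endswith(".md") and (in_agent_dir or name in ROOT_DOCS):
                found.append(rel_dir / name)
    return sorted(found)
```

Unlike the `find` command, this sketch has no depth limit, so it also picks up deeply nested files.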

### Phase 2: Chunk Analysis

Break content into **retrieval units**: the smallest self-contained pieces of information that make sense on their own.

Good chunk:
```
## Adding API Endpoints
1. Create handler in src/handlers/
2. Register route in src/routes.ts
3. Add OpenAPI spec to docs/api.yaml
```

Bad chunk (too coupled):
```
See the API section for endpoint patterns, but first read the auth docs,
which reference the middleware guide...
```

Score each chunk:
- **Self-contained?** Can the agent act on this without loading more?
- **Task-specific?** Clear when this is needed?
- **Information-dense?** High signal per token?
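
One rough way to automate a first pass over these three questions is sketched below. It is a heuristic under stated assumptions, not a definitive scorer: `score_chunk`, the cross-reference phrases, and the filler-word list are all hypothetical, and "has a heading" is only a loose proxy for task-specificity.

```python
import re

# Assumption: phrases like these signal that a chunk depends on other docs.
CROSS_REF = re.compile(r"\b(see|refer to|first read|as described in)\b", re.IGNORECASE)
# Assumption: a tiny filler-word list is enough for a rough density estimate.
FILLER = {"the", "a", "an", "you", "should", "that", "which", "to", "of", "is", "when", "want"}


def score_chunk(text: str) -> dict:
    """Score one chunk on self-containment, task labeling, and density."""
    words = re.findall(r"[A-Za-z0-9_./-]+", text)
    filler = sum(1 for w in words if w.lower() in FILLER)
    return {
        # No cross-reference phrase found, so the chunk probably stands alone.
        "self_contained": CROSS_REF.search(text) is None,
        # A heading gives the agent a label to match against the task.
        "task_labeled": any(line.startswith("#") for line in text.splitlines()),
        # Share of words that are not filler; higher means denser.
        "density": round(1 - filler / len(words), 2) if words else 0.0,
    }
```

Applied to the two chunks above, the "good" chunk comes out self-contained and labeled, while the "bad" chunk is flagged because it opens with "See" and asks the reader to "first read" another document.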

### Phase 3: Build Knowledge Manifest

Generate `.claude/KNOWLEDGE.md`—a lightweight index the agent reads first:

```markdown
# Knowledge Manifest

## Task → Knowledge Map

| When working on... | Load | Key terms |
|-------------------|------|-----------|
| API endpoints | references/api.md | route, handler, endpoint |
| Authentication | references/auth.md | token, session, login |
| Database changes | references/schema.md | migration, model, query |
| Testing | references/testing.md | spec, fixture, mock |
| Deployment | references/deploy.md | release, staging, prod |

## Quick Reference

### Build Commands
- `npm run dev` — Start dev server (port 3000)
- `npm test` — Run test suite
- `npm run build` — Production build

### Key Paths
- Handlers: `src/handlers/`
- Routes: `src/routes.ts`
- Tests: `tests/`

### Critical Rules
- Never commit .env files
- All PRs require tests
- Use conventional commits
```

The manifest contains:
1. **Task→Knowledge map**: What to load for what context
2. **Quick reference**: High-frequency facts (no file loading needed)
3. **Critical rules**: Must-know constraints (always relevant)
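
To illustrate how an agent-side helper might use the Task→Knowledge map, here is a small sketch that parses the three-column table and picks files by key-term match. The function names `parse_task_map` and `files_for` are hypothetical, and plain substring matching is an assumption; a real agent may match tasks to rows differently.

```python
def parse_task_map(manifest: str) -> list[dict]:
    """Extract data rows from a three-column Markdown table in the manifest."""
    rows = []
    for line in manifest.splitlines():
        stripped = line.strip()
        if not stripped.startswith("|"):
            continue  # Not a table row.
        cells = [c.strip() for c in stripped.strip("|").split("|")]
        if len(cells) != 3:
            continue  # Some other table shape.
        # Skip the separator row (dashes/colons) and the header row.
        if set(cells[0]) <= set("-: ") or cells[1] == "Load":
            continue
        terms = [t.strip() for t in cells[2].split(",")]
        rows.append({"task": cells[0], "load": cells[1], "terms": terms})
    return rows


def files_for(query: str, rows: list[dict]) -> list[str]:
    """Return files whose key terms appear in the task description."""
    q = query.lower()
    return [r["load"] for r in rows if any(t.lower() in q for t in r["terms"])]
```

With the sample manifest above, a query such as "add a new route handler" would resolve to `references/api.md` only, so nothing else needs loading.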

### Phase 4: Compile Optimized Artifacts

Transform verbose source docs into dense, agent-optimized versions.

**Compression techniques:**

| Source (verbose) | Compiled (dense) |
|-----------------|------------------|
| "When you want to add a new endpoint, you should first create a handler function..." | `New endpoint: handler → route → spec` |
| Long prose paragraphs | Structured tables |
| Repeated information | Single source of truth |
| Examples with explanation | Just the pattern |

**Output structure:**

```
.claude/
├── CLAUDE.md              # Human-readable, can stay verbose
├── KNOWLEDGE.md           # Agent manifest (generated)
└── compiled/              # Agent-optimized versions (generated)
    ├── api.md             # Dense API reference
    ├── patterns.md        # Code patterns as templates
    └── rules.md           # All constraints in one place
```

### Phase 5: Generate Retrieval Hints

Add grep-friendly markers throughout compiled docs:

```markdown
<!-- @task:new-endpoint @load:api,routes -->
## Adding Endpoints

<!-- @task:fix-auth @load:auth,middleware -->
## Authentication Flow

<!-- @task:write-test @load:testing -->
## Test Patterns
```

These markers enable:
```bash
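# List compiled files that contain sections for a task
grep -l "@task:new-endpoint" .claude/compiled/*.md
# Show each matching marker and the heading below it, with line numbers
grep -n -A1 "@task:new-endpoint" .claude/compiled/*.md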
```
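
The same markers can be parsed into a lookup table when grep output is not convenient. This is a sketch assuming the exact comment shape shown above (`@task:` first, optional `@load:` second); `index_markers` and `MARKER` are hypothetical names.

```python
import re

# Assumption: markers follow the shape <!-- @task:name @load:a,b --> shown above.
MARKER = re.compile(
    r"<!--\s*@task:(?P<task>[\w-]+)(?:\s+@load:(?P<load>[\w,-]+))?\s*-->"
)


def index_markers(text: str) -> dict:
    """Map each @task name to its line number, next heading, and @load list."""
    index = {}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        match = MARKER.search(line)
        if not match:
            continue
        # The section a marker describes is the first heading after it.
        heading = next(
            (l.lstrip("# ").strip() for l in lines[i + 1:] if l.startswith("#")),
            None,
        )
        load = match.group("load")
        index[match.group("task")] = {
            "line": i + 1,  # 1-based, like grep -n
            "heading": heading,
            "load": load.split(",") if load else [],
        }
    return index
```

If the same task name appears more than once, this sketch keeps only the last occurrence; collecting a list per task would be a straightforward extension.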

### Phase 6: Validation

Test the optimized system:

1. **Coverage check**: Every fact from source exists in compiled output
2. **Retrieval test**: Can common tasks be served with minimal loading?
3. **Density check**: Compiled versions smaller than sources?

```bash
# Compare sizes
wc -l .claude/references/*.md    # Source
wc -l .claude/compiled/*.md       # Compiled (should be smaller)
```
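
Line counts are only a rough proxy for density, so the density and coverage checks can be approximated in code as well. The sketch below works on in-memory text and uses hypothetical helpers (`density_report`, `missing_facts`); it assumes that "coverage" can be spot-checked with a hand-picked list of must-keep terms, which is weaker than verifying every fact.

```python
def density_report(source: dict[str, str], compiled: dict[str, str]) -> dict:
    """Compare total character counts of source docs and compiled docs."""
    src = sum(len(text) for text in source.values())
    out = sum(len(text) for text in compiled.values())
    return {
        "source_chars": src,
        "compiled_chars": out,
        # A ratio below 1.0 means the compiled docs are smaller than the sources.
        "ratio": round(out / src, 2) if src else None,
    }


def missing_facts(required_terms: list[str], compiled: dict[str, str]) -> list[str]:
    """Return must-keep terms that no compiled doc mentions (case-insensitive)."""
    blob = "\n".join(compiled.values()).lower()
    return [term for term in required_terms if term.lower() not in blob]
```

An empty list from `missing_facts` does not prove full coverage; it only shows that the listed terms survived compilation.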

## Manifest Format

The `KNOWLEDGE.md` manifest follows this structure:

```markdown
# Knowledge Manifest
<!-- Auto-generated. Source: .claude/references/, CLAUDE.md -->

## Task Context Map
<!-- What to load based on current work -->

| Context | Load | Search |
|---------|------|--------|
| [task description] | [file path] | [grep terms] |

## Always-Loaded Facts
<!-- High-frequency, never needs file lookup -->

### Commands
[Most-used commands as a table]

### Paths
[Key directories and their purposes]

### Rules
[Critical constraints that always apply]

## Chunk Index
<!-- What exists and where -->

| Topic | Location | Lines | Summary |
|-------|----------|-------|---------|
| [topic] | [file:line-range] | [count] | [one-line summary] |
```
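
The Chunk Index rows can be derived from headings. The following sketch treats every `## ` heading as the start of a chunk and reports a 1-based line range; `chunk_index` is a hypothetical helper, and it does not account for `## ` lines that appear inside fenced code blocks.

```python
def chunk_index(path: str, text: str) -> list[dict]:
    """Build Chunk Index rows (topic, file:line-range, line count) from ## headings."""
    lines = text.splitlines()
    starts = [i for i, line in enumerate(lines) if line.startswith("## ")]
    chunks = []
    for n, start in enumerate(starts):
        # A chunk runs until the next ## heading or the end of the file.
        end = starts[n + 1] if n + 1 < len(starts) else len(lines)
        # Drop trailing blank lines so the range covers content only.
        while end > start + 1 and not lines[end - 1].strip():
            end -= 1
        chunks.append({
            "topic": lines[start][3:].strip(),
            "location": f"{path}:{start + 1}-{end}",
            "lines": end - start,
        })
    return chunks
```

The one-line summary column is left out on purpose, since writing it requires reading the chunk rather than counting lines.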

## Information Density Principles

### Convert Prose to Structure

Before:
> "The authentication system uses JWT tokens stored in httpOnly cookies.
> When a user logs in, the server validates credentials against the database,
> generates a token with a 24-hour expiry, and sets it as a cookie..."

After:
```
## Auth Flow
- Method: JWT in httpOnly cookie
- Expiry: 24h
- Flow: credentials → DB validate → token → cookie
```

### Eliminate Redundancy

If the same information appears in multiple places, create one canonical source and reference it:

```markdown
## Token Handling
See: [Auth Flow](#auth-flow) — tokens section
```
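
Finding the repeated information in the first place can be partly automated. This sketch flags lines that occur in more than one document after normalizing case and whitespace; `find_duplicates` and the `min_len` threshold are assumptions, and paraphrased duplicates will not be caught.

```python
from collections import defaultdict


def find_duplicates(docs: dict[str, str], min_len: int = 20) -> dict[str, list[str]]:
    """Return normalized lines that appear in more than one document."""
    seen = defaultdict(set)
    for name, text in docs.items():
        for line in text.splitlines():
            # Normalize case and collapse whitespace so near-identical lines match.
            key = " ".join(line.lower().split())
            if len(key) >= min_len:  # Ignore short lines such as blank rows.
                seen[key].add(name)
    return {key: sorted(names) for key, names in seen.items() if len(names) > 1}
```

Each reported line is a candidate for a single canonical home, with the other locations replaced by a reference.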

### Prefer Tables Over Lists

Before:
```markdown
- The API endpoint for users is /api/users
- The API endpoint for posts is /api/posts
- The API endpoint for comments is /api/comments
```

After:
```markdown
| Resource | Endpoint |
|----------|----------|
| Users | /api/users |
| Posts | /api/posts |
| Comments | /api/comments |
```

### Use Patterns Over Examples

Before:
````markdown
To create a user handler:
```javascript
export async function createUser(req, res) {
  const { name, email } = req.body;
  const user = await db.users.create({ name, email });
  res.json(user);
}
```
````

After:
```markdown
Handler pattern: `export async function {action}{Resource}(req, res)`
Body: Extract params → DB operation → Return result
```

## Output Checklist

After optimization, verify:

- [ ] `KNOWLEDGE.md` exists and is under 100 lines
- [ ] Task→knowledge mappings cover common workflows
- [ ] Quick reference has most-used facts
- [ ] Compiled docs are denser than sources
- [ ] No orphaned knowledge (everything indexed)
- [ ] Retrieval hints enable grep-based discovery
- [ ] Original source docs untouched (human reference)
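
Two of these checklist items are mechanical enough to script. The sketch below checks the manifest length and looks for compiled files the manifest never mentions; `check_manifest` is a hypothetical helper, and matching file names by substring is an assumption that may over-match in manifests with similar paths.

```python
def check_manifest(manifest: str, compiled_files: list[str], max_lines: int = 100) -> list[str]:
    """Return a list of checklist problems; an empty list means both checks passed."""
    problems = []
    line_count = len(manifest.splitlines())
    # The checklist asks for fewer than max_lines lines.
    if line_count >= max_lines:
        problems.append(f"manifest has {line_count} lines (limit {max_lines})")
    for name in compiled_files:
        # A compiled file the manifest never mentions is orphaned knowledge.
        if name not in manifest:
            problems.append(f"orphaned: {name} is not referenced in the manifest")
    return problems
```

The remaining items, such as whether mappings cover common workflows, still need a human or agent review.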

Overview

This skill builds a retrieval-optimized knowledge layer over agent documentation stored in dotfiles (.claude, .codex, .cursor, .aider). It inventories source docs, breaks them into compact retrieval units, and generates a lightweight manifest plus compiled artifacts for fast agent consumption. The result is faster, smaller loads and clearer task→knowledge mappings for agents.

How this skill works

The tool scans agent doc locations, extracts discrete facts, instructions, and context triggers, then splits content into self-contained chunks scored by usefulness. It produces a KNOWLEDGE.md manifest mapping task contexts to the minimal files or chunks to load and compiles dense, grep-friendly documents in a compiled/ directory. Validation checks ensure coverage, smaller compiled size, and retrieval efficiency.

When to use it

  • Documentation has grown messy and agents take too long to find info
  • You want agents to load only the minimal context for a task
  • Preparing a repo for automated agent workflows or CI-driven assistance
  • Consolidating repeated rules and high-frequency facts into a single reference
  • Onboarding new agents or agent versions that rely on compact knowledge

Best practices

  • Extract facts as smallest self-contained chunks that an agent can act on
  • Score chunks for self-containment, task-specificity, and information density
  • Keep KNOWLEDGE.md under ~100 lines with Task→Knowledge mappings and quick facts
  • Use grep-friendly markers (e.g., @task: and @load:) in compiled docs for fast discovery
  • Preserve original verbose sources for humans; generate compiled artifacts for agents
  • Run coverage and retrieval tests after each compilation step

Example use cases

  • Optimize a repo so a code-generation agent only loads routing and handler patterns when adding endpoints
  • Create a compact auth reference so an assistant can fix login flows without loading full docs
  • Generate a quick task→file map for triaging database migrations or schema changes
  • Compress long design and UI instructions into patterns for iterative UI agent workflows
  • Add grep-marked compiled docs to enable fast, scriptable retrieval in CI

FAQ

Will this change the human-readable docs?

No. Original sources remain untouched; the skill generates separate compiled artifacts and a manifest for agent use.

How do agents find the right chunks for a task?

KNOWLEDGE.md maps task contexts to files, and compiled docs include grep-friendly @task/@load markers so agents can quickly locate relevant sections.