
This skill helps you maximize efficiency and minimize costs in AI interactions by optimizing prompts, context, and model usage.

npx playbooks add skill doanchienthangdev/omgkit --skill token-optimization


---
name: optimizing-tokens
description: AI agent maximizes efficiency and minimizes costs through strategic token usage while maintaining output quality. Use when managing AI interactions, designing prompts, or reducing costs.
---

# Optimizing Tokens

## Quick Start

1. **Analyze** - Identify input vs output token distribution
2. **Minimize Context** - Read only relevant file sections, not entire files
3. **Optimize Prompts** - Use direct commands, remove filler words
4. **Structure Outputs** - Request concise formats (JSON over prose)
5. **Batch Operations** - Combine related requests, avoid duplicate context
6. **Select Model** - Match model tier to task complexity

## Features

| Feature | Description | Guide |
|---------|-------------|-------|
| Context Targeting | Read only needed code sections | Line ranges, pattern search, summaries |
| Prompt Efficiency | Direct commands vs verbose requests | 79% reduction possible |
| Output Formatting | Structured concise responses | JSON/YAML over verbose explanations |
| Model Selection | Right model for task complexity | Haiku: simple, Sonnet: standard, Opus: complex |
| Batching | Combine related operations | Single request with multiple outputs |
| Caching | Avoid redundant computation | Cache by content hash + timestamp |
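
The caching row above can be sketched in a few lines. This is a minimal in-memory version, assuming results are keyed by a SHA-256 hash of the input content with a timestamp per entry; the class and method names are illustrative, not part of any particular API.

```python
import hashlib
import time


class AnalysisCache:
    """In-memory cache keyed by a content hash, so unchanged inputs
    never trigger a second (token-spending) model call."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(content: str) -> str:
        # Identical content always maps to the same key.
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def get(self, content: str):
        entry = self._store.get(self._key(content))
        return entry["result"] if entry else None

    def put(self, content: str, result) -> None:
        # Record when the result was computed alongside the result itself.
        self._store[self._key(content)] = {"result": result, "ts": time.time()}
```

Any edit to the content changes the hash, so stale results are never returned; a production version would add eviction and persistence.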

## Common Patterns

```
# Prompt Optimization (79% reduction)
INEFFICIENT (120 tokens):
"I would really appreciate it if you could help me
with this task. What I need you to do is to please
analyze this code and look for any bugs..."

EFFICIENT (25 tokens):
"Analyze for bugs, error handling issues, security.
For each: location, problem, fix."

# Context Optimization
INEFFICIENT: Read entire 1000-line file
EFFICIENT: Read lines 45-60 around target function

# Output Format
INEFFICIENT: "Please explain in detail..."
EFFICIENT: "Output: JSON {name, severity, fix}"

# Batching
INEFFICIENT:
  Request 1: "Given code [100 lines], find bugs"
  Request 2: "Given code [same 100 lines], add types"

EFFICIENT:
  Single request: "Given code [100 lines]:
  1. Find bugs
  2. Add types"
```
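The batching pattern above can be expressed as a small helper: state the shared context once and enumerate the tasks, instead of sending one request per task. This is a sketch; the function name and prompt layout are illustrative.

```python
def batched_prompt(code: str, tasks: list[str]) -> str:
    """Build one request containing the shared context a single time,
    followed by the tasks as a numbered list."""
    numbered = "\n".join(f"{i}. {t}" for i, t in enumerate(tasks, 1))
    return f"Given code:\n{code}\n\nTasks:\n{numbered}"
```

With two tasks over a 100-line file, this sends the file once instead of twice, roughly halving input tokens for that context.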

```
# Model Selection Guide
| Task Type | Model | Examples |
|-----------|-------|----------|
| Simple | Haiku | Formatting, syntax check, lookups |
| Standard | Sonnet | Features, bugs, reviews, tests |
| Complex | Opus | Architecture, security, critical code |

# Search Efficiency
INEFFICIENT: grep -r ".*" / (matches everything, everywhere)
EFFICIENT: rg "handleAuth" src/ --type ts  (ripgrep; plain grep has no --type flag)
```
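The model selection guide above reduces to a lookup with a sensible default. A minimal sketch, assuming the tier labels from the table; the string values are illustrative placeholders, not exact model identifiers.

```python
# Tier values are illustrative placeholders, not real model API names.
TIERS = {"simple": "haiku", "standard": "sonnet", "complex": "opus"}


def select_model(task_type: str) -> str:
    """Pick the cheapest adequate tier; fall back to the standard tier
    (not the most powerful one) when complexity is unknown."""
    return TIERS.get(task_type, TIERS["standard"])
```

Defaulting to the middle tier avoids the common failure mode of routing everything to the most expensive model "just in case".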

## Best Practices

| Do | Avoid |
|----|-------|
| Read only what's needed - use line ranges | Reading entire files for one function |
| Use direct language - commands over requests | Verbose, polite phrasing in prompts |
| Structure outputs - JSON/YAML over prose | Requesting detailed explanations for simple tasks |
| Batch operations - combine related requests | Repeating context across multiple requests |
| Choose right model - Haiku for simple tasks | Using most powerful model for everything |
| Limit search results - use head_limit | Unbounded searches returning thousands of results |
| Cache results - avoid redundant computation | Re-analyzing unchanged files |
| Progressive loading - start minimal, expand | Loading full context when partial suffices |

## Related Skills

- `dispatching-parallel-agents` - Efficient multi-agent patterns
- `writing-plans` - Structured planning reduces iteration
- `thinking-sequentially` - Organized reasoning saves tokens

## Overview

This skill maximizes efficiency and reduces cost by minimizing token usage while preserving output quality. It provides practical techniques for targeting context, tightening prompts, batching requests, and selecting the right model tier. Use it to streamline AI interactions, lower API spend, and speed up iteration.

## How This Skill Works

The skill inspects token distribution between inputs and outputs, then applies targeted strategies: read only relevant file sections, rewrite prompts into direct commands, and request structured outputs (JSON/YAML) to avoid verbose prose. It also recommends batching related tasks, caching repeated results, and matching model complexity to the job to avoid overprovisioning.
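
Inspecting token distribution doesn't require an exact tokenizer to be useful. A sketch using the common rough heuristic of ~4 characters per token for English text; use a real tokenizer when exact counts matter.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate via the ~4-characters-per-token rule of thumb
    for English; swap in a real tokenizer for exact counts."""
    return max(1, len(text) // 4)
```

Comparing estimates for a verbose request versus a direct command is enough to see which prompt variant is cheaper before sending either.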

## When to Use It

- When interacting with large codebases where reading whole files is unnecessary
- When designing prompts for repeated or production workflows to lower costs
- When preparing multi-step requests that can be combined into one call
- When choosing a model tier to balance cost and capability
- When caching or reusing previous analysis to avoid redundant token spend

## Best Practices

- Target context precisely using line ranges or pattern searches instead of full files
- Write direct, command-style prompts; remove filler and polite phrasing
- Request structured, concise outputs (JSON/YAML) rather than freeform explanations
- Batch related tasks into a single request to avoid duplicate context
- Select the simplest model that satisfies requirements and cache results by content hash

## Example Use Cases

- Bug triage: read only the function lines and ask for a JSON list of issues and fixes
- Refactor request: supply a single file excerpt and request both type additions and tests in one call
- Security review: run a concise checklist across multiple modules in a batched request
- Formatting and linting: use a lightweight model for syntax fixes and a standard model only for semantic reviews
- CI integration: cache analysis outputs and rerun only on changed hashes to reduce repeated spend
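
The CI use case above hinges on change detection. A minimal sketch: compare each file's content hash against the previous run and re-analyze only what changed. The function name and the `seen` mapping are illustrative assumptions, not part of any CI tool's API.

```python
import hashlib


def changed_files(files: dict[str, str], seen: dict[str, str]) -> list[str]:
    """Return paths whose content hash differs from the previous run,
    updating `seen` in place so unchanged files are skipped next time."""
    changed = []
    for path, content in files.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if seen.get(path) != digest:
            changed.append(path)
            seen[path] = digest
    return changed
```

In practice `seen` would be persisted between CI runs (e.g. as a build artifact) so only edited files incur analysis tokens.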

## FAQ

**How much token reduction can I expect?**

Results vary, but targeted prompts and context can cut token usage substantially; the examples above show reductions around 50–80% for common tasks.

**When should I still use a powerful model?**

Use higher-tier models for architecture, critical security reviews, or complex reasoning that simpler models cannot reliably handle.

**How do I decide what context to include?**

Start minimal: include the function or lines around the target, then progressively add surrounding code only if the model needs more information.
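
The "start minimal, expand" approach can be sketched as a function that extracts only the lines around a target, widening the radius on a follow-up pass if the model asks for more. The function name and default radius are illustrative.

```python
def context_window(lines: list[str], target: int, radius: int = 8) -> str:
    """Return only the lines around a 1-indexed target line instead of
    the whole file; call again with a larger `radius` to expand."""
    start = max(0, target - 1 - radius)          # clamp at file start
    end = min(len(lines), target - 1 + radius + 1)  # clamp at file end
    return "\n".join(lines[start:end])
```

For a 1000-line file and the default radius, this sends 17 lines instead of 1000, and progressively loading a wider window still stays far below full-file cost.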