
codex skill

/skills/codex

This skill helps you run Codex CLI tasks and analyze code with GPT-5.2 for efficient editing, refactoring, and automated coding workflows.

npx playbooks add skill softaworks/agent-toolkit --skill codex


SKILL.md
---
name: codex
description: Use when the user asks to run Codex CLI (codex exec, codex resume) or references OpenAI Codex for code analysis, refactoring, or automated editing. Uses GPT-5.2 by default for state-of-the-art software engineering.
---

# Codex Skill Guide

## Running a Task
1. Default to the `gpt-5.2` model. Ask the user (via `AskUserQuestion`) which reasoning effort to use (`xhigh`, `high`, `medium`, or `low`). The user can override the model if needed (see Model Options below).
2. Select the sandbox mode required for the task; default to `--sandbox read-only` unless edits or network access are necessary.
3. Assemble the command with the appropriate options:
   - `-m, --model <MODEL>`
   - `--config model_reasoning_effort="<xhigh|high|medium|low>"`
   - `--sandbox <read-only|workspace-write|danger-full-access>`
   - `--full-auto`
   - `-C, --cd <DIR>`
   - `--skip-git-repo-check`
4. Always include `--skip-git-repo-check`.
5. When continuing a previous session, use `codex exec --skip-git-repo-check resume --last` and pipe the prompt via stdin. When resuming, don't add configuration flags unless the user explicitly requests them (e.g. if they specify a model or reasoning effort when asking to resume). Resume syntax: `echo "your prompt here" | codex exec --skip-git-repo-check resume --last 2>/dev/null`. All flags must be placed between `exec` and `resume`.
6. **IMPORTANT**: By default, append `2>/dev/null` to all `codex exec` commands to suppress thinking tokens (stderr). Only show stderr if the user explicitly requests to see thinking tokens or if debugging is needed.
7. Run the command, capture stdout/stderr (filtered as appropriate), and summarize the outcome for the user.
8. **After Codex completes**, inform the user: "You can resume this Codex session at any time by saying 'codex resume' or asking me to continue with additional analysis or changes."
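Assembled per the steps above, a first-run invocation might look like the following sketch. The prompt, reasoning effort, and sandbox mode are placeholders chosen for illustration; the script only builds and prints the command string rather than executing it.

```shell
# Sketch of steps 1-6: assemble a first-run codex exec command.
# The prompt and chosen values below are placeholders.
model="gpt-5.2"
effort="medium"          # chosen by the user via AskUserQuestion
sandbox="read-only"      # default unless edits or network access are needed

cmd="codex exec -m $model \
  --config model_reasoning_effort=\"$effort\" \
  --sandbox $sandbox \
  --skip-git-repo-check \
  \"Review src/ for unused exports\" 2>/dev/null"

echo "$cmd"
```

Building the string first makes it easy to show the user the exact command before running it, which pairs naturally with the permission checks under Error Handling.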

### Quick Reference
| Use case | Sandbox mode | Key flags |
| --- | --- | --- |
| Read-only review or analysis | `read-only` | `--sandbox read-only 2>/dev/null` |
| Apply local edits | `workspace-write` | `--sandbox workspace-write --full-auto 2>/dev/null` |
| Permit network or broad access | `danger-full-access` | `--sandbox danger-full-access --full-auto 2>/dev/null` |
| Resume recent session | Inherited from original | `echo "prompt" \| codex exec --skip-git-repo-check resume --last 2>/dev/null` (no config flags unless the user requests them) |
| Run from another directory | Match task needs | `-C <DIR>` plus other flags `2>/dev/null` |
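For instance, the "apply local edits" row expands to a full command like the one below. The prompt is a placeholder; the string is printed rather than executed.

```shell
# "Apply local edits" row from the table, expanded into a full command.
# The prompt text is a placeholder.
edit_cmd='codex exec -m gpt-5.2 --sandbox workspace-write --full-auto --skip-git-repo-check "Rename util.ts to helpers.ts and fix imports" 2>/dev/null'
echo "$edit_cmd"
```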

## Model Options

| Model | Best for | Context window | Key features |
| --- | --- | --- | --- |
| `gpt-5.2-max` | **Max model**: Ultra-complex reasoning, deep problem analysis | 400K input / 128K output | 76.3% SWE-bench, adaptive reasoning, $1.25/$10.00 |
| `gpt-5.2` ⭐ | **Flagship model**: Software engineering, agentic coding workflows | 400K input / 128K output | 76.3% SWE-bench, adaptive reasoning, $1.25/$10.00 |
| `gpt-5.2-mini` | Cost-efficient coding (4x more usage allowance) | 400K input / 128K output | Near SOTA performance, $0.25/$2.00 |
| `gpt-5.1-thinking` | Ultra-complex reasoning, deep problem analysis | 400K input / 128K output | Adaptive thinking depth, runs 2x slower on hardest tasks |

**GPT-5.2 Advantages**: 76.3% SWE-bench (vs 72.8% GPT-5), 30% faster on average tasks, better tool handling, reduced hallucinations, improved code quality. Knowledge cutoff: September 30, 2024.

**Reasoning Effort Levels**:
- `xhigh` - Ultra-complex tasks (deep problem analysis, multi-step reasoning across a large codebase)
- `high` - Complex tasks (large refactors, architecture, security analysis, performance optimization)
- `medium` - Standard tasks (routine refactoring, code organization, feature additions, bug fixes)
- `low` - Simple tasks (quick fixes, code formatting, documentation)
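One way a wrapper might express this mapping is a small helper that turns a task label into an effort level. The task labels here are illustrative, not part of the Codex CLI itself:

```shell
# Illustrative mapping from task label to reasoning effort.
# The labels on the left are examples, not CLI-defined values.
pick_effort() {
  case "$1" in
    architecture|deep-analysis) echo "xhigh" ;;
    refactor|security|perf)     echo "high" ;;
    feature|bugfix)             echo "medium" ;;
    *)                          echo "low" ;;
  esac
}

pick_effort refactor   # prints "high"
```

The resulting value slots directly into `--config model_reasoning_effort="..."`.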

**Cached Input Discount**: 90% off ($0.125/M tokens) for repeated context, cache lasts up to 24 hours.

## Following Up
- After every `codex` command, immediately use `AskUserQuestion` to confirm next steps, collect clarifications, or decide whether to resume with `codex exec --skip-git-repo-check resume --last`.
- When resuming, pipe the new prompt via stdin: `echo "new prompt" | codex exec --skip-git-repo-check resume --last 2>/dev/null`. The resumed session automatically uses the same model, reasoning effort, and sandbox mode as the original session.
- Restate the chosen model, reasoning effort, and sandbox mode when proposing follow-up actions.
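When the user explicitly asks to override something on resume, the flags sit between `exec` and `resume`, as in this sketch (the model override and prompt are placeholders; the command is printed, not run):

```shell
# Resume the last session, overriding the model only because the user asked.
# Flags go between `exec` and `resume`; the prompt arrives via stdin.
resume_cmd='echo "also check test coverage" | codex exec -m gpt-5.2-mini --skip-git-repo-check resume --last 2>/dev/null'
echo "$resume_cmd"
```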

## Error Handling
- Stop and report failures whenever `codex --version` or a `codex exec` command exits non-zero; request direction before retrying.
- Before you use high-impact flags (`--full-auto`, `--sandbox danger-full-access`, `--skip-git-repo-check`) ask the user for permission using AskUserQuestion unless it was already given.
- When output includes warnings or partial results, summarize them and ask how to adjust using `AskUserQuestion`.

## CLI Version

Requires Codex CLI v0.57.0 or later for GPT-5.2 model support. The CLI defaults to `gpt-5.2` on all platforms. Check the installed version with `codex --version`.

Use `/model` slash command within a Codex session to switch models, or configure default in `~/.codex/config.toml`.
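A default model can be pinned in the config file along these lines. The key names below are assumptions inferred from the `--config model_reasoning_effort` flag above; verify them against your installed CLI's documentation before relying on them. The sketch writes to a temp file rather than touching the real `~/.codex/config.toml`:

```shell
# Sketch: pin a default model and effort in a config.toml-style file.
# Key names are assumptions inferred from the --config flag; verify
# against the installed CLI before editing ~/.codex/config.toml.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
model = "gpt-5.2"
model_reasoning_effort = "medium"
EOF
cat "$cfg"
```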

Overview

This skill provides a controlled, opinionated interface for running the Codex CLI (codex exec, codex resume) to perform code analysis, automated edits, and refactoring using GPT-5.2 by default. It encodes safe sandbox defaults, model and reasoning-effort selection, and command assembly patterns so agents run Codex reliably and securely. The skill emphasizes predictable output capture and clear follow-up prompts for iterative workflows.

How this skill works

The skill builds codex exec commands with required flags, defaulting to gpt-5.2 and --sandbox read-only unless edits or network access are requested. It appends 2>/dev/null to suppress thinking tokens by default, captures stdout/stderr, summarizes outcomes, and always offers a resume path. It asks the user for the reasoning-effort level and for explicit permission before using high-impact flags or dangerous sandboxes.

When to use it

  • When the user asks to run Codex CLI commands (codex exec, codex resume).
  • When requesting code review, automated refactoring, or large-scale edits powered by OpenAI Codex/GPT-5.2.
  • When you need an agent-managed, repeatable workflow for editing or analyzing a repository.
  • When you want safe defaults (read-only) but may escalate to workspace-write or danger-full-access with approval.
  • When resuming prior Codex sessions or piping follow-up prompts into an active session.

Best practices

  • Default to gpt-5.2 and ask the user for reasoning-effort (xhigh/high/medium/low) before execution.
  • Use --sandbox read-only for analysis; require explicit consent for --sandbox workspace-write or danger-full-access.
  • Always include --skip-git-repo-check and append 2>/dev/null to suppress thinking tokens unless debugging.
  • When resuming, pipe the prompt via stdin and do not add config flags unless the user requests changes.
  • After each run, summarize results clearly and immediately ask whether to resume, change model, or escalate sandbox or reasoning effort.

Example use cases

  • Run a repository security audit in read-only mode and summarize findings.
  • Apply a multi-file refactor with --sandbox workspace-write and --full-auto after user approval.
  • Resume a previously interrupted Codex session by piping a new prompt into codex exec resume --last.
  • Run an expensive architectural analysis with xhigh reasoning effort on gpt-5.2-max when requested.
  • Execute quick linting, formatting, or documentation updates with low reasoning effort and read-only verification.
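The first use case above (a read-only security audit) might be assembled like this; the prompt wording is illustrative and the command is printed for user confirmation rather than executed:

```shell
# Read-only security audit from the use cases above.
# The prompt wording is illustrative only.
audit_cmd='codex exec -m gpt-5.2 --config model_reasoning_effort="high" --sandbox read-only --skip-git-repo-check "Audit this repository for injection and auth issues; summarize findings" 2>/dev/null'
echo "$audit_cmd"
```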

FAQ

What model and reasoning effort should I choose?

Default to gpt-5.2 and ask for a reasoning-effort level; use xhigh for ultra-complex tasks, high for complex engineering, medium for standard work, and low for quick fixes.

Why append 2>/dev/null to commands?

Appending 2>/dev/null hides Codex thinking tokens (stderr) to reduce noise; only show stderr if the user requests debugging or explicit visibility.

How do I safely allow edits or network access?

Request user permission before using --sandbox workspace-write or --sandbox danger-full-access and confirm use of --full-auto; document the intended scope before running.