examples-auto-run skill

/.codex/skills/examples-auto-run

This skill runs multi-script example starts in auto mode with parallel execution, per-start logs, and start/stop helpers.

npx playbooks add skill openai/openai-agents-js --skill examples-auto-run

---
name: examples-auto-run
description: Run examples:start-all in auto mode with parallel execution, per-script logs, and start/stop helpers.
---

# examples-auto-run

## What it does

- Runs `pnpm build && pnpm -r build-check` first.
- Runs `pnpm examples:start-all` in auto-input mode (interactive prompts are auto-answered, HITL/MCP/apply-patch are auto-approved).
- Executes starts in parallel (default concurrency 4) and pipes each start’s stdout/stderr into its own log file under `.tmp/examples-start-logs/`.
- Provides start/stop/status/logs/tail helpers via `run.sh`.
- Jobs are not detached (no `disown`/`nohup`), so when the Codex session ends the child processes receive SIGHUP and exit; `stop` is also available to clean up manually.

## Usage

```bash
# Start (auto mode, concurrency=4 by default)
.codex/skills/examples-auto-run/scripts/run.sh start [extra args to examples:start-all]
# If you invoke the skill name alone ($examples-auto-run):
#   - when `.tmp/examples-rerun.txt` exists and is non-empty, it will run `rerun` automatically
#   - otherwise it runs the default `start` command.

# Examples:
.codex/skills/examples-auto-run/scripts/run.sh start --filter basic
.codex/skills/examples-auto-run/scripts/run.sh start --include-server --include-audio

# Check status
.codex/skills/examples-auto-run/scripts/run.sh status

# Stop running job (kills pid from .tmp/examples-auto-run.pid)
.codex/skills/examples-auto-run/scripts/run.sh stop

# List logs (per start script)
.codex/skills/examples-auto-run/scripts/run.sh logs

# Tail latest log
.codex/skills/examples-auto-run/scripts/run.sh tail
.codex/skills/examples-auto-run/scripts/run.sh tail basic__start_hello-world.log

# After a run, build a rerun list from the latest main log (auto-skip list is imported from `scripts/run-example-starts.mjs` and server/audio/external skips are honored)
.codex/skills/examples-auto-run/scripts/run.sh collect
# Rerun only the entries in .tmp/examples-rerun.txt
.codex/skills/examples-auto-run/scripts/run.sh rerun
# Show the current auto-skip list (env or defaults)
.codex/skills/examples-auto-run/scripts/run.sh start --print-auto-skip --dry-run
```
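The bare-invocation rule in the comments above can be sketched as a small shell check. This is a minimal illustration, not the skill's actual dispatch code: the function name `choose_command` is hypothetical, and it assumes "non-empty" means a file size greater than zero.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decide which run.sh subcommand a bare skill
# invocation maps to. Only the rerun-list path comes from the docs.
choose_command() {
  local rerun_file="${1:-.tmp/examples-rerun.txt}"
  # -s is true only when the file exists and its size is greater than zero.
  if [ -s "$rerun_file" ]; then
    echo rerun
  else
    echo start
  fi
}

# Possible usage (commented out so the sketch has no side effects):
# .codex/skills/examples-auto-run/scripts/run.sh "$(choose_command)"
```

Note that a file containing only a newline still counts as non-empty under `-s`; the real runner may treat that case differently.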

## Defaults (overridable via env)

- `EXAMPLES_INTERACTIVE_MODE=auto`
- `AUTO_APPROVE_MCP=1`, `APPLY_PATCH_AUTO_APPROVE=1`, `AUTO_APPROVE_HITL=1` (set in runner)
- `EXAMPLES_CONCURRENCY=4`
- `EXAMPLES_EXECA_TIMEOUT_MS=300000` (5m)  
  `financial-research-agent` and `computer-use` use 10m inside the script.
- Includes interactive; excludes server/audio/external by default:
  - `EXAMPLES_INCLUDE_INTERACTIVE=1`
  - `EXAMPLES_INCLUDE_SERVER=0`
  - `EXAMPLES_INCLUDE_AUDIO=0`
  - `EXAMPLES_INCLUDE_EXTERNAL=0`
  - This means `realtime-*` / `nextjs` (tagged as server/audio) are skipped unless you opt in with `--include-server` / `--include-audio` or the corresponding env flags.
- Auto-skip list: `EXAMPLES_AUTO_SKIP` (comma/space separated) overrides the built-in defaults used by both `run.sh` and `run-example-starts.mjs`. Defaults include `agent-patterns:start:llm-as-a-judge`, `agent-patterns:start:routing`, `customer-service:start`, `connectors:start`, `mcp:start:hosted-mcp-on-approval`, `mcp:start:hosted-mcp-human-in-the-loop`.
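As an illustration of the defaults above, the sketch below shows one way a comma/space separated `EXAMPLES_AUTO_SKIP` value could be split into entries, plus a commented example of overriding defaults per invocation. The helper name `split_auto_skip` is hypothetical; the real parsing lives in `run.sh` and `run-example-starts.mjs` and may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a comma/space separated auto-skip value
# into one entry per line. Assumes the value is a single line.
split_auto_skip() {
  local raw="${1:-}"
  local -a entries
  # Replace commas with spaces, then let read split on whitespace.
  read -r -a entries <<< "${raw//,/ }"
  if [ "${#entries[@]}" -gt 0 ]; then
    printf '%s\n' "${entries[@]}"
  fi
}

# Overriding defaults for a single run (env names and flags are from the list above):
# EXAMPLES_CONCURRENCY=2 EXAMPLES_EXECA_TIMEOUT_MS=600000 \
#   .codex/skills/examples-auto-run/scripts/run.sh start --filter basic
```

Calling `split_auto_skip "$EXAMPLES_AUTO_SKIP"` would then print each `package:script` entry on its own line, which is convenient for `grep -Fx` style membership checks.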

## Cancellation / cleanup

- Jobs are backgrounded but not disowned; if Codex suspends/ends the shell, the process group gets SIGHUP and stops.
- Manual cleanup: `run.sh stop` (removes stale pid if already exited).
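The cleanup behavior described above can be sketched as follows. This is an assumed shape for pid-file handling, not the actual `run.sh` implementation; the function name `stop_job` and its messages are hypothetical, and only the default pid path comes from the usage section.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of pid-file cleanup; run.sh itself may differ.
stop_job() {
  local pid_file="${1:-.tmp/examples-auto-run.pid}"
  if [ ! -f "$pid_file" ]; then
    echo "not running"
    return 0
  fi
  local pid
  pid="$(cat "$pid_file")"
  # kill -0 sends no signal; it only checks that the process exists
  # and that we are allowed to signal it.
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    kill "$pid"
    echo "stopped $pid"
  else
    echo "removed stale pid file"
  fi
  rm -f "$pid_file"
}
```

Removing the pid file in both branches keeps a later `status` check from reporting a job that has already exited.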

## Log locations

- `.tmp/examples-start-logs/<package>__<script>.log` (per start)
- Main runner log path is printed when `start` is invoked.
- Rerun list (generated by `collect`): `.tmp/examples-rerun.txt` (one `package:script` per line).
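To locate the log for a rerun-list entry, a mapping like the one below may help. It is an inference, not documented behavior: it assumes the entry splits at the first colon into package and script, and that remaining colons in the script name become underscores, which matches the `basic__start_hello-world.log` name shown in the usage section. The function name `log_path_for` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a `package:script` entry to its per-start log path.
# Assumption: colons inside the script name are replaced by underscores.
log_path_for() {
  local entry="$1"
  local package="${entry%%:*}"  # text before the first colon
  local script="${entry#*:}"    # text after the first colon
  printf '.tmp/examples-start-logs/%s__%s.log\n' "$package" "${script//:/_}"
}
```

If the computed path does not exist, `run.sh logs` lists the actual file names.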

## Notes

- Auto-skip is centralized (same defaults as above) and can be overridden via `EXAMPLES_AUTO_SKIP`. Auto-skip entries are excluded from rerun collection and will be removed from rerun execution automatically.
- Auto-input map covers common interactive prompts; HITL/MCP/apply-patch auto-approve via env is enabled by the runner.
- Shell tool approvals are auto-approved in auto mode (`SHELL_AUTO_APPROVE=1`).
- `rerun` runs entries sequentially, continues after failures, and rewrites `.tmp/examples-rerun.txt` with only the remaining failures. Auto-skip entries are not re-added.
- Behavioral validation is _not_ done in the runner, so **Codex must immediately perform it after every `start` or `rerun` invocation without waiting for the user to ask.** Required steps:
  1. Read the example source to infer intended flow from code/comments (tools invoked, expected outputs, guards, approvals).
  2. Read the matching log under `.tmp/examples-start-logs/`.
  3. Compare intent vs. log: confirm key actions/results happened; flag omissions or divergences.
  4. Do this for **all exit-0 entries**, not just samples.
  5. Summarize findings right after the run completes; when “OK”, note what was checked (e.g., “tools called + final message emitted”).
  6. When reporting, do not omit or ellipsize outputs that justify the validation; include the full relevant lines (keep it concise but untruncated).
- The runner prints a full table after the summary: one row per start script with `status`, `package:script`, `info` (reason/exit/skipped), and the log path. If the run stops before the table appears, point the analyzer at the latest `main_*.log` to reconstruct a table and validations.
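The `rerun` semantics noted above (sequential execution, continue after failures, rewrite the list with only the remaining failures) can be sketched like this. It is an illustration under stated assumptions, not the runner's code: `rerun_entries` and the `run_one` callback are hypothetical names, and auto-skip filtering is not modeled.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of rerun semantics. $2 names a command or function
# that runs one `package:script` entry and returns non-zero on failure.
rerun_entries() {
  local rerun_file="$1" run_one="$2"
  local -a remaining=()
  local entry
  while IFS= read -r entry || [ -n "$entry" ]; do
    [ -z "$entry" ] && continue
    # Run one entry at a time; a failure is recorded, not fatal.
    # stdin is redirected so the command cannot consume the list.
    if ! "$run_one" "$entry" </dev/null; then
      remaining+=("$entry")
    fi
  done < "$rerun_file"
  # Rewrite the list with only the entries that still fail.
  : > "$rerun_file"
  if [ "${#remaining[@]}" -gt 0 ]; then
    printf '%s\n' "${remaining[@]}" > "$rerun_file"
  fi
}
```

Because the list is rewritten only after the loop finishes, an interrupted run leaves the previous rerun list intact.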

Overview

This skill runs examples:start-all in auto mode with parallel execution, per-script logs, and start/stop helpers for a multi-agent TypeScript workspace. It automates build, interactive prompt answers, and common approvals so you can exercise many example starts with minimal manual interaction. Concurrency, timeouts, and inclusion filters are configurable via environment variables or command flags.

How this skill works

The runner first builds the repo (`pnpm build && pnpm -r build-check`), then invokes `pnpm examples:start-all` in auto-input mode. Starts execute in parallel (default concurrency 4), and each start’s stdout/stderr is piped to a per-script log file under `.tmp/examples-start-logs/`. A lightweight `run.sh` script provides start/stop/status/logs/tail/collect/rerun helpers and manages a main runner log and a pid file for cleanup.

When to use it

- Automated validation of many example start scripts after code changes or CI runs.
- Smoke-testing local multi-agent or voice-agent scenarios that require interactive prompts to be auto-answered.
- Collecting failing examples for iterative debugging using the collect → rerun flow.
- Running a high-throughput set of starts while keeping separate logs for each script.
- Quickly exercising agent examples that normally require human approvals (HITL/MCP/patch).

Best practices

- Run from a stable build state; the runner invokes `build-check`, but you should still keep dependencies up to date.
- Inspect `.tmp/examples-start-logs/<package>__<script>.log` for per-start evidence when diagnosing failures.
- Use `EXAMPLES_EXECA_TIMEOUT_MS` and `EXAMPLES_CONCURRENCY` to tune runtime for long-running examples.
- Opt into `--include-server` / `--include-audio` only when required; the defaults skip server/audio/external examples for speed.
- Use `collect` then `rerun` to isolate intermittent failures; `rerun` rewrites `.tmp/examples-rerun.txt` with the remaining failures.

Example use cases

- Run all interactive demos after a refactor to ensure no regressions in expected tool calls or final messages.
- Gather logs and rerun only failing example starts after CI exposes flaky entries.
- Automate routine approval flows (apply-patch, MCP, HITL) during mass example execution.
- Tail a specific start log while developing a single example in the context of the full suite.

FAQ

How do I stop a running job if the session ends?

Backgrounded jobs receive SIGHUP and exit when the shell session ends, so they normally stop on their own. For manual cleanup, use `run.sh stop`, which kills the tracked pid and removes a stale pid file if the job has already exited.

Where are logs and rerun lists stored?

Per-start logs are in .tmp/examples-start-logs/<package>__<script>.log. The main runner log path is printed on start. collect writes .tmp/examples-rerun.txt with one package:script per line.