
ci-fix skill

/ci-fix

This skill helps you diagnose and fix GitHub Actions CI failures using gh, enabling quick, minimal-diff patches and verified reruns.

npx playbooks add skill jmerta/codex-skills --skill ci-fix

SKILL.md
---
name: ci-fix
description: "Fix GitHub Actions CI failures using GitHub CLI (gh): inspect runs/logs, identify root cause, patch workflows/code, rerun jobs, and summarize verification. Use when GitHub Actions CI is failing or needs diagnosis."
---

# CI fix (GitHub Actions)

## Goal
- Get CI green quickly with minimal, reviewable diffs.
- Use `gh` to locate failing runs, inspect logs/artifacts, rerun jobs, and confirm the fix.

## Inputs to ask for (if missing)
- Repo (`OWNER/REPO`) and whether this is a PR or branch build.
- Failing run URL/ID (or PR number / branch name).
- What "green" means (required workflows? allowed flaky reruns?).
- Any constraints (no workflow edits, no permission changes, no force-push, etc.).

## Workflow (checklist)
1) Confirm `gh` context
   - Auth: `gh auth status`
   - Repo: `gh repo view --json nameWithOwner -q .nameWithOwner`
   - If needed, add `-R OWNER/REPO` to all commands.
   - If `gh` is not installed or not authenticated, tell the user and ask whether to install/authenticate or proceed by pasting logs/run URLs manually.
2) Find the failing run
   - If you have a run URL, extract the run ID: `.../actions/runs/<id>`.
   - Otherwise:
     - Recent failures: `gh run list --limit 20 --status failure`
     - Branch failures: `gh run list --branch <branch> --limit 20 --status failure`
     - Workflow failures: `gh run list -w <workflow> --limit 20 --status failure`
   - Open in browser: `gh run view <id> --web`
3) Pull the signal from logs
   - Job/step overview: `gh run view <id> --verbose`
   - Failed steps only: `gh run view <id> --log-failed`
   - Full log for a job: `gh run view <id> --log --job <job-id>`
   - Download artifacts: `gh run download <id> -D .artifacts/<id>`
4) Identify root cause (prefer the smallest fix)
   - Use `references/ci-failure-playbook.md` for common patterns and safe fixes.
   - Prefer: deterministic code/config fix > workflow plumbing fix > rerun flake.
5) Implement the fix (minimal diff)
   - Update code/tests/config and/or `.github/workflows/*.yml`.
   - Keep changes scoped to the failing job/step.
   - If changing triggers/permissions/secrets, call out risk and get explicit confirmation.
6) Verify in GitHub Actions
   - Rerun only failures: `gh run rerun <id> --failed`
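   - Rerun a specific job:
     - Look up the job's **databaseId** (not the job number shown in the browser URL): `gh run view <id> --json jobs --jq '.jobs[] | {name,databaseId,conclusion}'`
     - Rerun it: `gh run rerun --job <databaseId>`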
   - Watch until done: `gh run watch <id> --compact --exit-status`
   - Manually trigger (the workflow must define a `workflow_dispatch` trigger): `gh workflow run <workflow> --ref <branch>`
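
Steps 1 to 3 of the checklist can be sketched as a small shell helper. This is a minimal sketch, not part of the skill itself: the function names `latest_failed_run_id` and `show_latest_failure` are hypothetical, and it assumes `gh` is on `PATH` and the current directory is inside the target repository (otherwise add `-R OWNER/REPO` to each call).

```shell
# Hypothetical helpers; only the gh subcommands and flags come from the checklist.

# Print the ID of the most recent failed run on a branch (prints nothing if none).
latest_failed_run_id() {
  local branch=$1
  gh run list --branch "$branch" --status failure --limit 1 \
    --json databaseId --jq '.[0].databaseId // empty'
}

# Check auth, locate the newest failing run on a branch, and print its failed-step logs.
show_latest_failure() {
  local branch=$1
  local run_id
  gh auth status >/dev/null 2>&1 || {
    echo "gh is not authenticated" >&2
    return 1
  }
  run_id=$(latest_failed_run_id "$branch")
  if [ -z "$run_id" ]; then
    echo "no failed runs found on $branch" >&2
    return 1
  fi
  echo "failing run: $run_id"
  gh run view "$run_id" --log-failed
}
```

A call such as `show_latest_failure <branch>` prints the run ID first, so it can be reused for the rerun and watch commands in step 6.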

## Safety notes
- Avoid `pull_request_target` (and any change that runs untrusted fork code with secrets) unless the user explicitly requests it and understands the security tradeoffs.
- Keep workflow `permissions:` least-privilege; don’t broaden token access “just to make it pass”.
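
Before and after editing workflows, a quick text search can flag any use of `pull_request_target`. The helper below is a rough sketch under stated assumptions: `find_pull_request_target` is a hypothetical name, the default path `.github/workflows` is the conventional workflow directory, and because it is a plain `grep` it will also match comments, so every hit still needs a manual read.

```shell
# Hypothetical helper: list workflow lines that mention pull_request_target.
# Exits non-zero (grep's "no match" status) when nothing is found.
find_pull_request_target() {
  local dir=${1:-.github/workflows}
  grep -rn --include='*.yml' --include='*.yaml' 'pull_request_target' "$dir"
}
```

An empty result means no workflow file in that directory mentions the trigger by name; it says nothing about whether `permissions:` blocks are least-privilege, which still has to be reviewed by hand.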

## Deliverable (paste in chat / PR)
- **Summary:** ...
- **Failing run:** <link/id> (job/step)
- **Root cause:** ...
- **Fix:** ...
- **Verification:** commands + new run link/id
- **Notes/risks:** ...

Overview

This skill fixes failing GitHub Actions CI using the GitHub CLI (gh). It inspects failing runs and logs, identifies the root cause, applies the smallest safe patch to code or workflows, reruns jobs, and summarizes verification and risks. The goal is to get CI green quickly with reviewable diffs and no unnecessary permission changes.

How this skill works

The skill uses gh to locate failing runs (by run ID, PR, or branch), fetch logs and artifacts, and isolate the failing jobs and steps. It applies a diagnostic checklist to pick the smallest durable fix (code/test/config first, then workflow plumbing, then a rerun for flakes), makes targeted edits, and reruns the failed jobs with gh to verify the fix. Finally, it produces a concise deliverable summary for the PR or issue.
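
The verification step described above could look roughly like the following. It is a sketch only: `rerun_failed_and_wait` is a hypothetical name, and it relies on `gh run watch --exit-status` returning a non-zero status when the run fails.

```shell
# Hypothetical helper: rerun the failed jobs of a run, then block until it finishes.
rerun_failed_and_wait() {
  local run_id=$1
  gh run rerun "$run_id" --failed || return 1
  # --exit-status makes gh exit non-zero if the run ends in failure.
  if gh run watch "$run_id" --compact --exit-status; then
    echo "run $run_id: green"
  else
    echo "run $run_id: still failing" >&2
    return 1
  fi
}
```

The function's own exit status mirrors the run result, so it can gate a follow-up step such as writing the deliverable summary.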

When to use it

  • A GitHub Actions workflow or PR build is failing and you need a quick, reviewable fix.
  • You need to diagnose intermittent failures and determine if they’re flakiness or real bugs.
  • You want to rerun only the failed jobs, or one specific job, without rerunning the full pipeline.
  • You must patch workflow YAML or tests while keeping diffs minimal and permission changes explicit.

Best practices

  • Confirm gh is installed and authenticated; add -R OWNER/REPO when working outside repo context.
  • Prefer deterministic code/config fixes over workflow permission broadening or run-level hacks.
  • Limit changes to the failing job/step and document any security-sensitive edits.
  • Avoid pull_request_target or changes that expose secrets to untrusted fork code unless explicitly approved.
  • Rerun only failed jobs (--failed) or a specific job (--job <databaseId>) to minimize noise.

Example use cases

  • A unit test regressed on a branch — inspect failing job logs, fix test or code, rerun the job, confirm green.
  • A flaky integration test — identify transient failure, add retry or stabilize environment, rerun failed job.
  • Workflow permission error — adjust least-privilege permissions for a single job and document risk.
  • Missing artifact or cache issue — download artifacts, repair cache keys or restore missing files, rerun workflow.

FAQ

What commands find the failing runs?

Use gh run list with --status failure, optionally narrowed with --branch <branch> or -w <workflow>. If you have a run URL, extract the run ID from .../actions/runs/<id>.
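
Extracting the run ID from a URL needs nothing beyond shell parameter expansion. The sketch below uses a hypothetical function name, `run_id_from_url`, and assumes the URL contains the `/actions/runs/<id>` segment, optionally followed by further path segments, a query string, or a fragment.

```shell
# Hypothetical helper: print the numeric run ID found in a GitHub Actions run URL.
# Returns non-zero if the URL has no /actions/runs/<digits> segment.
run_id_from_url() {
  local url=$1
  local rest
  # Drop everything up to and including "/actions/runs/".
  rest=${url#*/actions/runs/}
  # If the marker is absent, the expansion leaves the string unchanged.
  if [ "$rest" = "$url" ]; then
    return 1
  fi
  # Keep only the first path segment, without any query string or fragment.
  rest=${rest%%/*}
  rest=${rest%%\?*}
  rest=${rest%%#*}
  # Accept digits only.
  case $rest in
    '' | *[!0-9]*) return 1 ;;
  esac
  printf '%s\n' "$rest"
}
```

The printed ID can be passed straight to `gh run view <id>`.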

How do I rerun only the failures?

Run gh run rerun <id> --failed. To rerun a specific job, get its databaseId from gh run view <id> --json jobs, then run gh run rerun --job <databaseId>. Alternatively, re-trigger the workflow for a branch/ref with gh workflow run <workflow> --ref <branch>.
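
The databaseId lookup and the job rerun can be combined into a single helper. This is a hedged sketch: `rerun_job_by_name` is a hypothetical name, it selects the first job whose name matches exactly, and it assumes job names contain no tab characters.

```shell
# Hypothetical helper: rerun one job of a run, selected by its display name.
rerun_job_by_name() {
  local run_id=$1
  local job_name=$2
  local job_id
  # Emit "databaseId<TAB>name" per job, then pick the first exact name match.
  job_id=$(gh run view "$run_id" --json jobs \
             --jq '.jobs[] | "\(.databaseId)\t\(.name)"' |
           awk -F '\t' -v name="$job_name" '$2 == name { print $1; exit }')
  if [ -z "$job_id" ]; then
    echo "no job named '$job_name' in run $run_id" >&2
    return 1
  fi
  # --job takes the databaseId; the run ID is not passed alongside it.
  gh run rerun --job "$job_id"
}
```

Matching on the job name keeps the call readable, at the cost of ambiguity when a matrix produces several similarly named jobs; in that case passing the databaseId directly is safer.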