gha skill

This skill analyzes GitHub Actions failures to identify root causes, assess flakiness, and surface actionable fixes.

npx playbooks add skill ykdojo/claude-code-tips --skill gha

SKILL.md

---
name: gha
description: Analyze GitHub Actions failures and identify root causes
argument-hint: <url>
---

Investigate this GitHub Actions URL: $ARGUMENTS

Use the gh CLI to analyze this workflow run. Your investigation should:

1. **Get basic info & identify actual failure**:
   - What workflow/job failed, when, and on which commit?
   - CRITICAL: Read the full logs carefully to find what SPECIFICALLY caused the exit code 1
   - Distinguish between warnings/non-fatal errors vs actual failures
   - Look for patterns like "failing:", "fatal:", or script logic that determines when to exit 1
   - If you see both "non-fatal" and "fatal" errors, focus on what actually caused the failure

2. **Check flakiness**: Check the past 10-20 runs of THE EXACT SAME failing job:
   - IMPORTANT: If a workflow has multiple jobs, you must check history for the SPECIFIC JOB that failed, not just the workflow
   - Use `gh run list --workflow=<workflow-name>` to get run IDs, then `gh run view <run-id> --json jobs` to check the specific job's status
   - Is this a one-time failure or recurring pattern for THIS SPECIFIC JOB?
   - What's the success rate for THIS JOB recently?
   - When did THIS JOB last pass?

3. **Identify breaking commit** (if there's a pattern of failures for the specific job):
   - Find the first run where THIS SPECIFIC JOB failed and the last run where it passed
   - Identify the commit that introduced the failure
   - Verify by checking: does THIS JOB fail in ALL runs after that commit? Does it pass in ALL runs before?
   - If verified, report the breaking commit with high confidence

4. **Root cause**: Based on logs, history, and any breaking commit, what's the likely cause?
   - Focus on what ACTUALLY caused the failure (not just any errors you see)
   - Verify your hypothesis against the logs and failure logic

5. **Check for existing fix PRs**: Search for open PRs that might already address this issue:
   - Use `gh pr list --state open --search "<keywords>"` with relevant error messages or file names
   - Check if any open PR modifies the failing file/workflow
   - If a fix PR exists, note it in your report and skip the recommendation section

Write a final report with:
- Summary of failure (what specifically triggered the exit code 1)
- Flakiness assessment (one-time vs recurring, success rate)
- Breaking commit (if identified and verified)
- Root cause analysis (based on the ACTUAL failure trigger)
- Existing fix PR (if found - include PR number and link)
- Recommendation (skip if fix PR already exists)

Overview

This skill analyzes GitHub Actions workflow failures and identifies what actually caused a job to exit with code 1. It uses the gh CLI to read run metadata, job-level logs, and recent run history, then produces a concise report that names the exact failure trigger and recommends a fix.

How this skill works

Given a GitHub Actions run URL, the skill fetches workflow and job details with gh, downloads and scans full logs to find the literal error that produced exit code 1, and distinguishes fatal errors from warnings. It then examines the last 10–20 runs of the same job to determine flakiness, identifies a breaking commit when present, and searches open PRs for potential fixes.
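The fatal-versus-warning distinction described above can be illustrated with a small log scanner. This is a minimal sketch, not the skill's actual implementation: the function name `classify_log_lines` and both pattern lists are hypothetical, and the patterns would need tuning to the wording a given workflow really prints. It assumes the log text has already been saved, for example from `gh run view <run-id> --log-failed`.

```python
import re

# Hypothetical patterns (assumptions, not part of the skill); matched against
# lowercased log lines.
NON_FATAL_PATTERNS = [r"non-fatal", r"\bwarning\b"]
FATAL_PATTERNS = [
    r"\bfatal:",
    r"\bfailing:",
    r"\bexit 1\b",
    r"process completed with exit code [1-9]",
]


def classify_log_lines(log_text):
    """Split log lines into likely-fatal and non-fatal candidates.

    Returns two lists of (line_number, line) tuples. Non-fatal patterns are
    checked first so that "non-fatal: ..." is not mistaken for "fatal: ...".
    """
    fatal, non_fatal = [], []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        if any(re.search(p, lowered) for p in NON_FATAL_PATTERNS):
            non_fatal.append((number, line))
        elif any(re.search(p, lowered) for p in FATAL_PATTERNS):
            fatal.append((number, line))
    return fatal, non_fatal
```

A pattern match only narrows the search. The lines it flags still have to be read in context to confirm which one actually made the step exit with code 1.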

When to use it

- A workflow run failed with exit code 1 and you need the precise failure trigger.
- You want to know whether a job failure is flaky or was introduced by a recent change.
- You need a verified breaking commit and reproduction window for debugging or bisecting.
- You are about to open a fix PR and want to check whether an existing PR already addresses the failure.
- You need to triage CI failures quickly during releases or after dependency updates.

Best practices

- Always inspect full job logs and search for 'fatal', 'error', 'failing:', or explicit 'exit 1' statements.
- Compare the same job across runs (not just workflow-level status) to avoid false flakiness signals.
- When identifying a breaking commit, verify that the job consistently fails after that commit and passes before it.
- Use targeted gh CLI queries: `gh run list --workflow=<workflow-name>` and `gh run view <run-id> --json jobs`.
- Search open PRs with error strings or modified workflow/file names before recommending fixes.
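The job-scoped history check recommended above could be scripted roughly as follows. This is a sketch under stated assumptions: the helper names (`gh_json`, `job_conclusion`, `job_history`) are hypothetical, the gh CLI is assumed to be installed, authenticated, and run from inside the repository, and `gh run list` is assumed to return the newest runs first.

```python
import json
import subprocess


def gh_json(args):
    """Run a gh command and parse its JSON output (needs an authenticated gh CLI)."""
    result = subprocess.run(
        ["gh", *args], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


def job_conclusion(run_view, job_name):
    """Return the named job's conclusion from `gh run view --json jobs` output.

    Returns None when the job did not run in that workflow run.
    """
    for job in run_view.get("jobs", []):
        if job.get("name") == job_name:
            return job.get("conclusion")
    return None


def job_history(workflow, job_name, limit=20):
    """Collect the conclusion of one specific job across recent workflow runs."""
    runs = gh_json([
        "run", "list", f"--workflow={workflow}",
        "--limit", str(limit),
        "--json", "databaseId,headSha,createdAt",
    ])
    history = []
    for run in runs:
        view = gh_json(["run", "view", str(run["databaseId"]), "--json", "jobs"])
        history.append({
            "run_id": run["databaseId"],
            "sha": run["headSha"],
            "created_at": run["createdAt"],
            "conclusion": job_conclusion(view, job_name),
        })
    return history
```

Looking up the job by name in each run, rather than reading the run's overall conclusion, is what keeps a failure in an unrelated job from being counted against the job under investigation.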

Example use cases

- A lint/test job stops with exit code 1 after a dependency update: identify the failing test or script assertion.
- A deployment step intermittently fails: determine whether the failure is flaky or reproducible.
- A new commit causes a specific job to fail: find the breaking commit and confirm scope.
- Triage CI failures across multiple branches to decide whether to revert or patch.
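For the breaking-commit use case, the verification rule (the job passes in every run before the commit and fails in every run after it) can be expressed as a short function. This is an illustrative sketch: `find_breaking_commit` and the shape of its input (an oldest-first list of dicts with `sha` and `conclusion` keys) are assumptions, not something the skill defines.

```python
def find_breaking_commit(history):
    """Locate the pass-to-fail boundary for one job.

    `history` is an oldest-first list of dicts with 'sha' and 'conclusion'.
    Runs where the job was skipped, cancelled, or absent are ignored.
    Returns None when there is no failure or no earlier passing run.
    """
    relevant = [h for h in history if h["conclusion"] in ("success", "failure")]
    first_fail = next(
        (i for i, h in enumerate(relevant) if h["conclusion"] == "failure"),
        None,
    )
    if first_fail is None or first_fail == 0:
        return None
    # Every run before first_fail passed by construction; the boundary is only
    # verified if every run from first_fail onward failed as well.
    verified = all(h["conclusion"] == "failure" for h in relevant[first_fail:])
    return {
        "last_good": relevant[first_fail - 1]["sha"],
        "first_bad": relevant[first_fail]["sha"],
        "verified": verified,
    }
```

When several commits landed between `last_good` and `first_bad`, the result narrows the search to that range rather than to a single commit. An unverified result points toward flakiness, not a clean regression.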

FAQ

Do you inspect all jobs in the workflow or only the failing one?

I focus on the specific job that failed. Workflow-level checks are used only to locate the job; history and logs are job-scoped.

How do you determine flakiness?

I check the past 10–20 runs of the exact same job and compute recent success rate, noting intermittent failures versus a steady failing pattern.
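As a rough illustration of that assessment, the success rate and failure pattern can be computed from a newest-first list of job conclusions. The function name `assess_flakiness` and the pattern labels and thresholds are hypothetical choices for this sketch, not rules the skill prescribes.

```python
def assess_flakiness(conclusions):
    """Summarize recent results for one job.

    `conclusions` is a newest-first list such as ["failure", "success", ...].
    Only "success" and "failure" are counted; other values are ignored.
    """
    counted = [c for c in conclusions if c in ("success", "failure")]
    if not counted:
        return {"success_rate": None, "pattern": "no data"}

    failures = counted.count("failure")
    success_rate = (len(counted) - failures) / len(counted)

    # Length of the unbroken run of failures at the newest end of the history.
    streak = 0
    for conclusion in counted:
        if conclusion != "failure":
            break
        streak += 1

    if failures == 0:
        pattern = "passing"
    elif streak == failures and streak >= 2:
        pattern = "steady failure"  # all failures are consecutive and recent
    elif failures == 1:
        pattern = "one-time"
    else:
        pattern = "intermittent"
    return {"success_rate": success_rate, "pattern": pattern}
```

A "steady failure" result suggests looking for a breaking commit, while "intermittent" suggests a flaky test or an unreliable external dependency.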