home / skills / ykdojo / claude-code-tips / gha
This skill analyzes GitHub Actions failures to identify root causes, assess flakiness, and surface actionable fixes.
npx playbooks add skill ykdojo/claude-code-tips --skill ghaReview the files below or copy the command above to add this skill to your agents.
---
name: gha
description: Analyze GitHub Actions failures and identify root causes
argument-hint: <url>
---
Investigate this GitHub Actions URL: $ARGUMENTS
Use the gh CLI to analyze this workflow run. Your investigation should:
1. **Get basic info & identify actual failure**:
- What workflow/job failed, when, and on which commit?
- CRITICAL: Read the full logs carefully to find what SPECIFICALLY caused the exit code 1
- Distinguish between warnings/non-fatal errors vs actual failures
- Look for patterns like "failing:", "fatal:", or script logic that determines when to exit 1
- If you see both "non-fatal" and "fatal" errors, focus on what actually caused the failure
2. **Check flakiness**: Check the past 10-20 runs of THE EXACT SAME failing job:
- IMPORTANT: If a workflow has multiple jobs, you must check history for the SPECIFIC JOB that failed, not just the workflow
- Use `gh run list --workflow=<workflow-name>` to get run IDs, then `gh run view <run-id> --json jobs` to check the specific job's status
- Is this a one-time failure or recurring pattern for THIS SPECIFIC JOB?
- What's the success rate for THIS JOB recently?
- When did THIS JOB last pass?
3. **Identify breaking commit** (if there's a pattern of failures for the specific job):
- Find the first run where THIS SPECIFIC JOB failed and the last run where it passed
- Identify the commit that introduced the failure
- Verify by checking: does THIS JOB fail in ALL runs after that commit? Does it pass in ALL runs before?
- If verified, report the breaking commit with high confidence
4. **Root cause**: Based on logs, history, and any breaking commit, what's the likely cause?
- Focus on what ACTUALLY caused the failure (not just any errors you see)
- Verify your hypothesis against the logs and failure logic
5. **Check for existing fix PRs**: Search for open PRs that might already address this issue:
- Use `gh pr list --state open --search "<keywords>"` with relevant error messages or file names
- Check if any open PR modifies the failing file/workflow
- If a fix PR exists, note it in your report and skip the recommendation section
Write a final report with:
- Summary of failure (what specifically triggered the exit code 1)
- Flakiness assessment (one-time vs recurring, success rate)
- Breaking commit (if identified and verified)
- Root cause analysis (based on the ACTUAL failure trigger)
- Existing fix PR (if found - include PR number and link)
- Recommendation (skip if fix PR already exists)
This skill analyzes GitHub Actions workflow failures and identifies root causes for exit code 1. It uses the gh CLI to read run metadata, job-level logs, and recent run history to produce a concise, actionable report highlighting the exact failure trigger and remediation steps.
Given a GitHub Actions run URL, the skill fetches workflow and job details with gh, downloads and scans full logs to find the literal error that produced exit code 1, and distinguishes fatal errors from warnings. It then examines the last 10–20 runs of the same job to determine flakiness, identifies a breaking commit when present, and searches open PRs for potential fixes.
Do you inspect all jobs in the workflow or only the failing one?
I focus on the specific job that failed. Workflow-level checks are used only to locate the job; history and logs are job-scoped.
How do you determine flakiness?
I check the past 10–20 runs of the exact same job and compute recent success rate, noting intermittent failures versus a steady failing pattern.