This skill performs multi-source production triage, auditing Sentry, Vercel logs, health endpoints, and CI/CD to guide rapid investigation and fixes.
Install with `npx playbooks add skill phrazzld/claude-config --skill triage`.
---
name: triage
description: |
Multi-source observability triage. Checks Sentry, Vercel logs, health endpoints, GitHub CI/CD.
Drives: investigate -> fix -> PR -> postmortem workflow.
Invoke for: production issues, error spikes, CI failures, user reports, incident response.
argument-hint: "[action: status | investigate ISSUE-ID | investigate-ci RUN-ID | fix | postmortem ISSUE-ID]"
effort: max
---
# /triage
Fix production issues: audit, investigate, fix, then write a postmortem.
**This is a fixer.** It uses `/check-production` as its primitive. To record issues without fixing them, use `/log-production-issues` instead.
## Usage
```bash
/triage # Audit and fix highest priority (default)
/triage investigate VOL-456 # Deep dive on specific Sentry issue
/triage investigate-ci 12345 # Deep dive on specific CI run failure
/triage fix # Create PR for current fix
/triage postmortem VOL-456 # Generate postmortem after merge
```
## Stage 1: Production Audit
**Command:** `/triage` or `/triage status`
Invoke the `/check-production` primitive, which runs these checks in parallel:
1. **Sentry** - Unresolved issues via triage scripts
2. **Vercel logs** - Recent errors in stream
3. **Health endpoints** - `/api/health` response
4. **GitHub CI/CD** - Failed workflow runs
**Output format:**
```
TRIAGE STATUS - 2026-01-23 15:30
================================
SENTRY (volume-fitness)
[P0] 3 unresolved issues
Top: VOL-456 "PaymentIntent failed" (Score: 147, 23 users)
GITHUB CI/CD
[P1] Main branch failing: "CI" workflow (run #1234)
Failed: Type check - 2h ago
[P2] 2 feature branches blocked
VERCEL LOGS
[OK] No errors in last 10 minutes
HEALTH ENDPOINTS
[OK] volume.fitness/api/health (200, 45ms)
RECOMMENDATION:
1. Investigate VOL-456 immediately - 23 users affected
Run: /triage investigate VOL-456
2. Fix main branch CI - blocking all deploys
Run: /triage investigate-ci 1234
```
If all clean: "All systems nominal. No action required."
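
The parallel checks in this stage could be orchestrated roughly as follows. This is a minimal sketch, not the actual `/check-production` implementation: `run_checks_parallel` is a hypothetical helper, and the per-check log directory is an assumption made for illustration. It starts each check as a background job, waits on each PID, and reports one status line per check.

```bash
# Hypothetical helper (not part of the skill): run each argument as a
# separate check in the background and report its exit status.
run_checks_parallel() {
  local logdir pids="" pid cmd n=0 failed=0
  logdir=$(mktemp -d)
  for cmd in "$@"; do
    n=$((n + 1))
    # Start the check in the background, logging to its own file.
    sh -c "$cmd" >"$logdir/$n.log" 2>&1 &
    pids="$pids$! "
  done
  n=0
  for cmd in "$@"; do
    n=$((n + 1))
    pid=${pids%% *}   # first PID in the queue
    pids=${pids#* }   # drop it from the queue
    # `wait PID` returns the exit status of that specific job.
    if wait "$pid"; then
      echo "[OK] $cmd"
    else
      echo "[FAIL] $cmd (log: $logdir/$n.log)"
      failed=$((failed + 1))
    fi
  done
  # Exit status is the number of failed checks (0 means all clean).
  return "$failed"
}

# Usage, assuming the script paths listed under "Via CLI Scripts":
#   run_checks_parallel \
#     ~/.claude/skills/triage/scripts/check_sentry.sh \
#     ~/.claude/skills/triage/scripts/check_vercel_logs.sh \
#     ~/.claude/skills/triage/scripts/check_health_endpoints.sh
```

Waiting on each PID individually, rather than a bare `wait`, is what lets the sketch attribute a failure to a specific source.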
## Stage 2: Investigate
### Delegation Pattern
For complex issues, delegate investigation to agentic tools (see `/delegate`):
- **Codex** - Code archaeology, stack trace analysis, debugging
- **Gemini** - Research current patterns, check for known issues
- **Thinktank** - Validate proposed fix before implementing
### Sentry Issues
**Command:** `/triage investigate ISSUE-ID`
Actions:
1. Fetch full issue context from Sentry
2. Create branch: `fix/ISSUE-ID-description`
3. Load affected files from stack trace
4. Check git history for related changes
5. Form root cause hypothesis (delegate to Codex for complex traces)
**Output:** Investigation summary with hypothesis and next steps.
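
Step 2 above names branches `fix/ISSUE-ID-description`. One way to derive that name from an issue ID and its title is sketched below. `sentry_branch_name` is a hypothetical helper, and the 40-character cap on the description is an arbitrary assumption, not something the skill specifies.

```bash
# Hypothetical helper: build "fix/ISSUE-ID-description" from an issue ID
# and a free-form issue title.
sentry_branch_name() {
  local issue_id="$1" title="$2" slug
  # Lowercase, collapse runs of non-alphanumerics into "-", trim edge
  # dashes, and cap the length (40 is an arbitrary choice).
  slug=$(printf '%s' "$title" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//' \
    | cut -c1-40)
  slug=${slug%-}  # drop a dash left dangling by the length cap
  printf 'fix/%s-%s\n' "$issue_id" "$slug"
}

# Usage:
#   git switch -c "$(sentry_branch_name VOL-456 "PaymentIntent failed")"
#   git log --since="2 weeks ago" --oneline -- path/from/stack/trace
```

For the example issue in the status output, this yields `fix/VOL-456-paymentintent-failed`.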
### CI/CD Failures
**Command:** `/triage investigate-ci RUN-ID`
Actions:
1. Fetch failed workflow run details
```bash
gh run view RUN-ID --log-failed
```
2. Identify failed step and error message
3. Create branch: `fix/ci-[workflow-name]-[date]`
4. Load affected files based on error
5. Check recent commits that may have caused regression
**Common CI failure patterns:**
| Failure Type | Typical Cause | Fix Approach |
|--------------|---------------|--------------|
| Type check | New code with type errors | Fix types locally, push |
| Lint | Style violations | Run `pnpm lint --fix` |
| Test | Broken/flaky tests | Run tests locally, fix or skip flaky |
| Build | Missing deps, config issues | Check package.json, build config |
| Deploy | Env vars, permissions | Check Vercel/platform settings |
**Output:** CI investigation summary with specific error and fix approach.
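
The failure table above can be turned into a small lookup. This is a sketch under assumptions: `suggest_ci_fix` is a hypothetical helper, and matching on substrings of the failed step name is a simplification that depends on how your workflow names its steps.

```bash
# Hypothetical helper: map a failed step name to the fix approach from
# the "Common CI failure patterns" table.
suggest_ci_fix() {
  local step
  step=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$step" in
    *type*)   echo "Fix types locally, push" ;;
    *lint*)   echo "Run pnpm lint --fix" ;;
    *test*)   echo "Run tests locally, fix or skip flaky" ;;
    *build*)  echo "Check package.json, build config" ;;
    *deploy*) echo "Check Vercel/platform settings" ;;
    *)        echo "No known pattern; read the failed step log" ;;
  esac
}

# Usage, assuming the failed job name is read from the run's JSON:
#   name=$(gh run view RUN-ID --json jobs \
#     --jq '.jobs[] | select(.conclusion == "failure") | .name' | head -n 1)
#   suggest_ci_fix "$name"
```

The patterns are checked in order, so a step named "Type check tests" would be treated as a type check failure.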
## Stage 3: Fix
**Command:** `/triage fix`
Prerequisites: On `fix/` branch with changes.
Actions:
1. Run tests to verify fix
2. Create PR with standard format
3. Link Sentry issue in PR description
**PR format:**
```markdown
## Summary
[Fix description]
## Sentry Issue
- ID: ISSUE-ID
- Users affected: N
- First seen: DATE
## Test Plan
- [ ] Test case 1
- [ ] Test case 2
```
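
The PR format above could be filled in and passed to the GitHub CLI along these lines. `render_pr_body` is a hypothetical helper and the PR title in the usage comment is a placeholder. The sketch only renders the template; it does not fetch the Sentry fields itself.

```bash
# Hypothetical helper: render the PR body template from four fields.
render_pr_body() {
  local summary="$1" issue_id="$2" users="$3" first_seen="$4"
  cat <<EOF
## Summary
$summary

## Sentry Issue
- ID: $issue_id
- Users affected: $users
- First seen: $first_seen

## Test Plan
- [ ] Test case 1
- [ ] Test case 2
EOF
}

# Usage: pipe the body to `gh pr create`, which reads stdin when
# --body-file is "-":
#   render_pr_body "Guard against null PaymentIntent" VOL-456 23 2026-01-23 \
#     | gh pr create --title "fix: VOL-456" --body-file -
```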
## Stage 4: Postmortem
**Command:** `/triage postmortem ISSUE-ID`
Prerequisites: Fix deployed (PR merged).
Actions:
1. Verify no new errors in Sentry
2. Generate postmortem document from template
3. Resolve Sentry issue
4. Create `docs/postmortems/YYYY-MM-DD-ISSUE-ID.md`
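
The file name in step 4 follows a fixed pattern, so it can be computed rather than typed. A minimal sketch: `postmortem_path` is a hypothetical helper, and using the current UTC date as the default is an assumption (the skill does not say which date the file name should carry).

```bash
# Hypothetical helper: build docs/postmortems/YYYY-MM-DD-ISSUE-ID.md.
postmortem_path() {
  local issue_id="$1"
  # The date argument is optional; default to today's date in UTC.
  local day="${2:-$(date -u +%Y-%m-%d)}"
  printf 'docs/postmortems/%s-%s.md\n' "$day" "$issue_id"
}

# Usage:
#   path=$(postmortem_path VOL-456)
#   mkdir -p "$(dirname "$path")"
```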
## Scripts
### Via Sentry MCP (Preferred)
When Sentry MCP is configured, use direct queries:
- "Show me unresolved errors in production"
- "What's the triage score for issue VOL-456?"
- "Get full context for the top error"
### Via CLI Scripts
```bash
# Multi-source orchestrator
~/.claude/skills/triage/scripts/check_all_sources.sh
# Individual checks
~/.claude/skills/triage/scripts/check_sentry.sh
~/.claude/skills/triage/scripts/check_vercel_logs.sh
~/.claude/skills/triage/scripts/check_health_endpoints.sh
# Sentry CLI directly
sentry-cli issues list --project="$SENTRY_PROJECT" --status=unresolved
sentry-cli issues describe ISSUE-ID
# Postmortem generator
~/.claude/skills/triage/scripts/generate_postmortem.sh ISSUE-ID
```
### Via GitHub CLI
```bash
# List failed runs on main branch
gh run list --branch main --status failure --limit 10
# List all recent failures
gh run list --status failure --limit 10
# View failed run details
gh run view RUN-ID
# View only failed step logs
gh run view RUN-ID --log-failed
# Re-run failed jobs (after fix pushed)
gh run rerun RUN-ID --failed
# Watch a run in progress
gh run watch RUN-ID
```
## Workflow
```
/triage
|
v
[Issues found?]
|
+-- Sentry issue --> /triage investigate ISSUE-ID
| |
| v
| [Fix locally]
| |
| v
| /triage fix (creates PR)
| |
| v
| [PR merged & deployed]
| |
| v
| /triage postmortem ISSUE-ID
|
+-- CI failure --> /triage investigate-ci RUN-ID
| |
| v
| [Fix locally, push]
| |
| v
| [CI re-runs automatically]
| |
| v
| [Verify CI green]
|
+-- No issues --> "All systems nominal"
```
## Environment Variables
```bash
# Required for Sentry
SENTRY_AUTH_TOKEN # or SENTRY_MASTER_TOKEN
SENTRY_ORG # Organization slug
# Auto-detected per project
SENTRY_PROJECT # From .sentryclirc or .env.local
# Optional for Vercel
VERCEL_TOKEN # For `vercel logs` access
```
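
A preflight check for these variables might look like the sketch below. `check_triage_env` is a hypothetical helper. It treats the two Sentry token names as interchangeable, as the comment in the list suggests, and it does not check `SENTRY_PROJECT` because that value is described as auto-detected.

```bash
# Hypothetical helper: verify the variables triage depends on.
check_triage_env() {
  local missing=0
  # Either token name is accepted.
  if [ -z "${SENTRY_AUTH_TOKEN:-}" ] && [ -z "${SENTRY_MASTER_TOKEN:-}" ]; then
    echo "missing: SENTRY_AUTH_TOKEN (or SENTRY_MASTER_TOKEN)" >&2
    missing=1
  fi
  if [ -z "${SENTRY_ORG:-}" ]; then
    echo "missing: SENTRY_ORG" >&2
    missing=1
  fi
  # VERCEL_TOKEN is optional, so only note its absence.
  if [ -z "${VERCEL_TOKEN:-}" ]; then
    echo "note: VERCEL_TOKEN not set" >&2
  fi
  return "$missing"
}

# Usage:
#   check_triage_env || exit 1
```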
## MCP Configuration (Recommended)
For AI-assisted triage, configure Sentry MCP:
```json
{
"mcpServers": {
"sentry": {
"url": "https://mcp.sentry.dev/mcp",
"transport": "http"
}
}
}
```
Or local with token:
```json
{
"mcpServers": {
"sentry": {
"command": "npx",
"args": ["-y", "@sentry/mcp-server"],
"env": {
"SENTRY_AUTH_TOKEN": "your-token",
"SENTRY_ORG": "your-org"
}
}
}
}
```
## Reuses
- `~/.claude/skills/sentry-observability/scripts/triage_score.sh`
- `~/.claude/skills/sentry-observability/scripts/issue_detail.sh`
- `~/.claude/skills/sentry-observability/scripts/resolve_issue.sh`
## Related
- `/check-production` - The primitive (audit only)
- `/log-production-issues` - Create GitHub issues from findings
- `/observability` - Full observability setup
- `/sentry-observability` - Sentry-specific operations
- `/verify-fix` - Verification checklist
- `/delegate` - Multi-AI orchestration pattern
This skill performs multi-source observability triage for production incidents, combining Sentry, Vercel logs, health endpoints, and GitHub CI/CD checks. It drives an investigate -> fix -> PR -> postmortem workflow so teams can move from detection to resolution and learning. Use it to prioritize issues, run targeted investigations, create fixes, and generate postmortems after deployment.
The skill runs a parallel production audit that queries Sentry for unresolved issues, tails Vercel logs for recent errors, checks health endpoints, and lists failed GitHub workflow runs. For a selected finding it creates a working branch, loads stack traces and related files, inspects recent commits, and produces an investigation summary with a root-cause hypothesis. When a fix is prepared it runs tests, opens a standardized PR, and after merge generates a postmortem and resolves the Sentry issue.
**What permissions are required?**
A Sentry auth token (`SENTRY_AUTH_TOKEN`) is required, a Vercel token (`VERCEL_TOKEN`) is optional and used for log access, and the GitHub CLI must be authorized to view and re-run workflow runs and to create PRs.
**How does it decide priority?**
Priority is driven by the Sentry triage score, the number of affected users, and CI failure severity. The audit recommends actions based on those signals.