This skill helps you orchestrate production incident response from triage through post-mortem, reducing downtime and preventing recurrence.
To add this skill to your agents, run `npx playbooks add skill eddiebe147/claude-settings --skill incident-responder`.
---
name: Incident Responder
slug: incident-responder
description: Manage production incidents with structured response, debugging, and post-mortem documentation
category: technical
complexity: advanced
version: "1.0.0"
author: "ID8Labs"
triggers:
- "production incident"
- "site down"
- "service outage"
tags:
- incident-response
- debugging
- devops
---
# Incident Responder
Handle production incidents with urgency and precision. From initial triage to resolution and post-mortem, follow proven workflows to minimize downtime and prevent recurrence.
## Core Workflows
### Workflow 1: Incident Triage
1. **Detection** - Confirm the incident and its scope
2. **Severity Assessment** - Classify impact level (SEV1-4)
3. **Communication** - Notify stakeholders
4. **Team Assembly** - Rally required responders
5. **Initial Diagnosis** - Identify likely cause
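
The severity assessment step above can be sketched as a small classifier. This is a minimal sketch that follows the SEV1 to SEV4 definitions given in the FAQ of this document. The `Impact` fields, the `classify_severity` function, and the `many_customers` threshold are hypothetical names and values chosen for illustration; they are not part of the skill.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1  # full production outage affecting many customers
    SEV2 = 2  # major degraded service
    SEV3 = 3  # limited impact
    SEV4 = 4  # minor issue or single-customer incident


@dataclass
class Impact:
    # Illustrative fields only; a real triage form may capture more.
    full_outage: bool
    major_degradation: bool
    customers_affected: int


def classify_severity(impact: Impact, many_customers: int = 100) -> Severity:
    """Map an impact report to a severity level.

    `many_customers` is an assumed cutoff for "many customers";
    tune it to your own customer base.
    """
    if impact.full_outage and impact.customers_affected >= many_customers:
        return Severity.SEV1
    if impact.full_outage or impact.major_degradation:
        return Severity.SEV2
    if impact.customers_affected > 1:
        return Severity.SEV3
    return Severity.SEV4
```

Treating a full outage that reaches only a few customers as SEV2 is a judgment call in this sketch; teams often prefer to over-classify early and downgrade once the scope is confirmed.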
### Workflow 2: Resolution
1. **Containment** - Stop the bleeding
2. **Root Cause** - Identify underlying issue
3. **Fix Implementation** - Deploy the solution
4. **Verification** - Confirm resolution
5. **Status Update** - Communicate resolution
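
The verification step above can be sketched as a check that must pass several times in a row before the incident is declared resolved. This is a minimal sketch under stated assumptions: `check` is a hypothetical callable that stands in for whatever health probe your service exposes, and the default counts and interval are arbitrary.

```python
import time
from typing import Callable


def verify_resolution(
    check: Callable[[], bool],
    required_successes: int = 3,
    max_attempts: int = 10,
    interval_seconds: float = 30.0,
) -> bool:
    """Return True once `check` passes `required_successes` times in a row."""
    streak = 0
    for attempt in range(max_attempts):
        if check():
            streak += 1
            if streak >= required_successes:
                return True
        else:
            streak = 0  # a single failure resets the streak
        # Wait between attempts, but not after the final one.
        if interval_seconds > 0 and attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    return False
```

Requiring consecutive successes guards against announcing a resolution while the service is still flapping.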
### Workflow 3: Post-Mortem
1. **Timeline** - Document what happened when
2. **Root Cause Analysis** - Run a 5 Whys analysis to trace the failure back to its underlying cause
3. **Action Items** - Identify preventive measures
4. **Documentation** - Write post-mortem report
5. **Review** - Share learnings with team
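
The post-mortem steps above can be captured in a small data structure that renders a shareable draft. This is a minimal sketch; the `PostMortem` class and its field names are hypothetical and do not reflect the skill's actual output format. Each entry in `whys` answers "why?" for the entry before it, so the last entry is treated as the root cause.

```python
from dataclasses import dataclass, field


@dataclass
class PostMortem:
    title: str
    timeline: list[tuple[str, str]] = field(default_factory=list)  # (timestamp, event)
    whys: list[str] = field(default_factory=list)  # each answer explains the previous one
    action_items: list[str] = field(default_factory=list)

    def root_cause(self) -> str:
        # The final "why" in the chain is taken as the root cause.
        return self.whys[-1] if self.whys else "TBD"

    def to_markdown(self) -> str:
        lines = [f"# Post-Mortem: {self.title}", "", "## Timeline"]
        lines += [f"- {ts}: {event}" for ts, event in self.timeline]
        lines += ["", "## Root Cause Analysis"]
        lines += [f"{i}. Why? {why}" for i, why in enumerate(self.whys, start=1)]
        lines += ["", f"Root cause: {self.root_cause()}", "", "## Action Items"]
        lines += [f"- [ ] {item}" for item in self.action_items]
        return "\n".join(lines)


# Made-up example data, for illustration only.
draft = PostMortem(
    title="Checkout outage",
    timeline=[("14:02", "Alert fired"), ("14:20", "Rollback deployed")],
    whys=["Checkout returned errors", "A config change removed a required key"],
    action_items=["Add config validation to the deploy pipeline"],
)
print(draft.to_markdown())
```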
## Quick Reference
| Action | Command |
|--------|---------|
| Start incident | "We have a production incident" |
| Triage | "What's the severity and impact?" |
| Post-mortem | "Create post-mortem for incident" |
This skill provides a structured incident response workflow to manage production outages from detection through post-mortem. It guides responders through triage, containment, resolution, and documentation to minimize downtime and prevent recurrence. The skill emphasizes clear communication, rapid diagnosis, and actionable follow-ups.
The skill inspects the incident context and walks teams through three core workflows: Triage, Resolution, and Post-Mortem. It prompts for a severity assessment, coordinates responders, suggests containment steps, and helps capture the timeline and root-cause analysis. Outputs include status messages, verification checks, and a ready-to-share post-mortem draft.
**Who should act as the incident commander?**
Choose one experienced engineer or team lead who can make quick decisions and coordinate communications until the incident is resolved.

**What severity levels should I use?**
Use SEV1 for a full production outage affecting many customers, SEV2 for a major service degradation, SEV3 for limited impact, and SEV4 for minor issues or single-customer incidents.

**How soon should a post-mortem be written?**
Start the post-mortem within 48 hours, while details are fresh. Complete and review it within one to two weeks, with action items assigned to owners.
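
The post-mortem timing above reduces to simple date arithmetic. This is a minimal sketch that assumes the clock starts when the incident is resolved, which the guidance does not state explicitly; `postmortem_deadlines` and its dictionary keys are hypothetical names.

```python
from datetime import datetime, timedelta


def postmortem_deadlines(resolved_at: datetime) -> dict[str, datetime]:
    """Derive post-mortem milestones from the resolution time (an assumed anchor)."""
    return {
        "start_by": resolved_at + timedelta(hours=48),     # begin drafting
        "complete_target": resolved_at + timedelta(weeks=1),  # early end of the window
        "complete_by": resolved_at + timedelta(weeks=2),      # late end of the window
    }
```

Feeding these dates into a calendar or ticketing system is left out here, since the skill does not specify one.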