resilience-incident-generator skill

This skill generates incident response playbooks and runbooks aligned to NIST SP 800-61 for security incidents, outages, and disaster recovery.

npx playbooks add skill williamzujkowski/cognitive-toolworks --skill resilience-incident-generator

Review the files below or copy the command above to add this skill to your agents.

SKILL.md

---
name: "Incident Response Playbook Generator"
slug: "resilience-incident-generator"
description: "Generate incident response playbooks for security incidents, outages, and disaster recovery with NIST SP 800-61 compliance and escalation paths."
capabilities:
  - Security incident response playbook generation
  - Production outage runbook creation
  - Disaster recovery scenario planning
  - Escalation matrix design
  - Post-mortem template generation
  - NIST SP 800-61 lifecycle compliance
  - On-call rotation and paging integration
  - Communication plan templates
inputs:
  - incident_type: "security | outage | disaster-recovery | data-breach | ransomware | ddos | service-degradation (string)"
  - severity_level: "P0 (critical) | P1 (high) | P2 (medium) | P3 (low) (string, default: P1)"
  - service_context: "service name, architecture, dependencies (object, optional)"
  - compliance_requirements: "NIST, SOC2, HIPAA, PCI-DSS (array, optional)"
  - tier: "T1 (template) | T2 (detailed playbook) (string, default: T1)"
outputs:
  - playbook: "NIST SP 800-61 structured playbook with phases"
  - escalation_matrix: "contact list with escalation thresholds"
  - runbook: "step-by-step remediation procedures"
  - post_mortem_template: "structured incident report template"
  - communication_plan: "stakeholder notification templates"
keywords:
  - incident-response
  - disaster-recovery
  - playbook
  - runbook
  - nist-800-61
  - security-incident
  - outage
  - escalation
  - post-mortem
  - on-call
version: "1.0.0"
owner: "cognitive-toolworks"
license: "MIT"
security: "Public; no secrets or PII; safe for open repositories"
links:
  - https://csrc.nist.gov/publications/detail/sp/800-61/rev-2/final
  - https://www.atlassian.com/incident-management/incident-response
  - https://response.pagerduty.com/
  - https://www.sans.org/white-papers/33901/
  - https://cloud.google.com/architecture/incident-response
  - https://incidentresponse.com/playbooks/
---

## Purpose & When-To-Use

**Trigger conditions:**
- Security incident detected (malware, data breach, unauthorized access, ransomware, DDoS)
- Production outage or service degradation impacting customers
- Disaster recovery event (data center failure, regional outage, natural disaster)
- Post-incident review requiring playbook formalization
- Compliance requirement to document incident response procedures (SOC2, FedRAMP, HIPAA, PCI-DSS)
- New service launch requiring incident runbooks
- On-call rotation setup needing escalation paths

**Not for:**
- Real-time incident coordination (use incident management platforms)
- Automated incident detection (use monitoring/alerting systems)
- Forensic analysis execution (provides methodology only)
- Legal incident disclosure decisions (consult legal counsel)

---

## Pre-Checks

**Time normalization:**
- Compute `NOW_ET` using NIST/time.gov semantics (America/New_York, ISO-8601): 2025-10-25T21:30:36-04:00
- Use `NOW_ET` for all citation access dates
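
A minimal sketch of computing `NOW_ET` with the Python standard library. It reads the local system clock rather than querying time.gov, and the helper name `now_et` is an assumption for illustration, not part of the skill.

```python
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")


def now_et(moment: Optional[datetime] = None) -> str:
    """Return an ISO-8601 timestamp in America/New_York at second precision.

    `moment` must be timezone-aware; when omitted, the current time is used.
    """
    if moment is None:
        moment = datetime.now(tz=EASTERN)
    return moment.astimezone(EASTERN).replace(microsecond=0).isoformat()


# A fixed UTC instant converts to the Eastern timestamp used in this document.
fixed = datetime(2025, 10, 26, 1, 30, 36, tzinfo=timezone.utc)
print(now_et(fixed))  # 2025-10-25T21:30:36-04:00
```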

**Input validation:**
- `incident_type` must be: security, outage, disaster-recovery, data-breach, ransomware, ddos, service-degradation
- `severity_level` must be: P0, P1, P2, or P3
- `service_context` (if provided) must include: service_name, team_owner, dependencies
- `compliance_requirements` must be valid framework identifiers
- `tier` must be: T1 or T2
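
The input checks above can be sketched as a single function that returns a list of problems. The function name and the exact set of accepted framework identifiers are assumptions; the skill only says the identifiers must be valid.

```python
ALLOWED_INCIDENT_TYPES = {
    "security", "outage", "disaster-recovery", "data-breach",
    "ransomware", "ddos", "service-degradation",
}
ALLOWED_SEVERITIES = {"P0", "P1", "P2", "P3"}
ALLOWED_TIERS = {"T1", "T2"}
# Assumption: the frameworks named in this document are the valid identifiers.
ALLOWED_FRAMEWORKS = {"NIST", "SOC2", "HIPAA", "PCI-DSS", "FedRAMP"}
REQUIRED_CONTEXT_KEYS = {"service_name", "team_owner", "dependencies"}


def validate_inputs(params: dict) -> list:
    """Return human-readable validation errors; an empty list means valid."""
    errors = []
    incident_type = params.get("incident_type")
    if incident_type not in ALLOWED_INCIDENT_TYPES:
        errors.append(f"invalid incident_type: {incident_type!r}")
    # Defaults follow the front matter: severity_level=P1, tier=T1.
    severity = params.get("severity_level", "P1")
    if severity not in ALLOWED_SEVERITIES:
        errors.append(f"invalid severity_level: {severity!r}")
    tier = params.get("tier", "T1")
    if tier not in ALLOWED_TIERS:
        errors.append(f"invalid tier: {tier!r}")
    context = params.get("service_context")
    if context is not None:
        missing = REQUIRED_CONTEXT_KEYS - set(context)
        if missing:
            errors.append(f"service_context missing keys: {sorted(missing)}")
    unknown = set(params.get("compliance_requirements", [])) - ALLOWED_FRAMEWORKS
    if unknown:
        errors.append(f"unknown compliance_requirements: {sorted(unknown)}")
    return errors
```

Returning every error at once, rather than raising on the first, lets a caller request all corrections in one round trip.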

**Source freshness:**
- NIST SP 800-61 Rev 2 (accessed 2025-10-25T21:30:36-04:00): https://csrc.nist.gov/publications/detail/sp/800-61/rev-2/final - Computer Security Incident Handling Guide
- Atlassian Incident Management (accessed 2025-10-25T21:30:36-04:00): https://www.atlassian.com/incident-management/incident-response
- PagerDuty Incident Response (accessed 2025-10-25T21:30:36-04:00): https://response.pagerduty.com/
- SANS Incident Handler's Handbook (accessed 2025-10-25T21:30:36-04:00): https://www.sans.org/white-papers/33901/

**Dependency validation:**
- Security incidents leverage: security-assessment-framework (for threat context)
- No hard dependencies for T1 (template generation)

---

## Procedure

### T1: Playbook Template (≤2k tokens)

**Fast path for 80% of standard playbook needs:**

1. **Incident classification:**
   - Map `incident_type` to NIST SP 800-61 category
   - Assign severity level based on `severity_level` input
   - Identify compliance requirements (if any)

2. **Generate playbook structure:**
   - **Phase 1: Preparation** - Pre-incident setup (tools, contacts, access)
   - **Phase 2: Detection & Analysis** - Incident identification and scoping
   - **Phase 3: Containment** - Short-term and long-term containment steps
   - **Phase 4: Eradication** - Root cause removal
   - **Phase 5: Recovery** - Service restoration and validation
   - **Phase 6: Post-Incident Activity** - Lessons learned and documentation

3. **Create escalation matrix:**
   - Define escalation thresholds by severity (P0 ≤15min, P1 ≤30min, P2 ≤2hr, P3 ≤1day)
   - Template contact roles: Incident Commander, Tech Lead, Communications Lead, Executive Sponsor
   - Include paging instructions (PagerDuty, Opsgenie, custom)

4. **Output deliverables:**
   - Playbook markdown document (NIST SP 800-61 aligned)
   - Escalation matrix CSV/JSON
   - Post-mortem template with 5 Whys framework

**Token budget:** T1 ≤2k tokens (template only, no deep context)
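
As an illustration of step 3, the sketch below builds an escalation matrix from the severity thresholds and template roles listed above and serializes it to CSV. Applying one threshold and one contact method to every role is a simplifying assumption, and the function names are hypothetical.

```python
import csv
import io

# Thresholds and roles are taken from step 3 of the T1 procedure.
ESCALATION_THRESHOLDS = {"P0": "15 min", "P1": "30 min", "P2": "2 hr", "P3": "1 day"}
ROLES = ["Incident Commander", "Tech Lead", "Communications Lead", "Executive Sponsor"]
# Column names follow the escalation_matrix fields in the Output Contract.
FIELDS = ["role", "contact_method", "escalation_threshold"]


def build_escalation_matrix(severity: str, contact_method: str = "PagerDuty") -> list:
    """Return one row per template role for the given severity."""
    threshold = ESCALATION_THRESHOLDS[severity]
    return [
        {"role": role, "contact_method": contact_method, "escalation_threshold": threshold}
        for role in ROLES
    ]


def to_csv(rows: list) -> str:
    """Serialize matrix rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(build_escalation_matrix("P0")))
```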

---

### T2: Detailed Playbook with Service Context (≤6k tokens)

**Extended path for service-specific, compliance-driven playbooks:**

1. **Enhanced incident analysis (extends T1):**
   - Analyze `service_context` to identify critical dependencies
   - Map service architecture to failure modes (single points of failure, cascading failures)
   - Identify compliance-specific requirements (HIPAA breach notification timelines, PCI-DSS forensic preservation)

2. **Service-specific runbook generation:**
   - Create detailed remediation steps for common failure scenarios
   - Include rollback procedures and health check validation
   - Add monitoring query examples (Prometheus, Datadog, CloudWatch)
   - Document safe restart procedures and dependency startup order

3. **Compliance integration:**
   - NIST SP 800-61: Map playbook phases to incident handling lifecycle
   - SOC2 CC7.3: Document incident response communications
   - HIPAA: Add breach notification timelines (60-day requirement)
   - PCI-DSS 12.10: Include forensic evidence preservation steps
   - FedRAMP: Reference IR-4 and IR-6 controls from NIST SP 800-53

4. **Communication plan generation:**
   - Internal stakeholder notification templates (engineering, support, executives)
   - External communication templates (customer status page, regulatory notifications)
   - Severity-based communication cadence (P0: every 30min, P1: hourly, P2: daily)

5. **Post-mortem template customization:**
   - Include service-specific incident timeline
   - Root cause analysis framework (5 Whys, Fishbone diagram)
   - Action items with owners and due dates
   - Metrics: MTTD (Mean Time to Detect), MTTR (Mean Time to Resolve), customer impact

6. **Decision rules for escalation:**
   - Auto-escalate if incident duration exceeds: P0=30min, P1=2hr, P2=8hr
   - Auto-escalate if customer impact exceeds: P0=any, P1=10%, P2=25%
   - Invoke disaster recovery if: data center failure, regional outage, ransomware with data encryption

**Token budget:** T2 ≤6k tokens (includes service context, compliance, and communication plans)
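
The escalation decision rules in step 6 can be expressed as two small predicates. This is a sketch under stated assumptions: P3 has no listed limits, so it never auto-escalates here, and the disaster-recovery trigger labels are hypothetical identifiers for the three conditions named above.

```python
from datetime import timedelta

# Duration limits from step 6; P3 has no listed limit.
DURATION_LIMITS = {
    "P0": timedelta(minutes=30),
    "P1": timedelta(hours=2),
    "P2": timedelta(hours=8),
}
# Customer-impact limits in percent; "any" impact for P0 is modeled as > 0.
IMPACT_LIMITS = {"P0": 0.0, "P1": 10.0, "P2": 25.0}
# Hypothetical labels for the three disaster-recovery triggers.
DR_TRIGGERS = {"data-center-failure", "regional-outage", "ransomware-with-encryption"}


def should_auto_escalate(severity: str, elapsed: timedelta, customers_affected_pct: float) -> bool:
    """Return True when either the duration or the impact limit is exceeded."""
    duration_limit = DURATION_LIMITS.get(severity)
    if duration_limit is not None and elapsed > duration_limit:
        return True
    impact_limit = IMPACT_LIMITS.get(severity)
    if impact_limit is not None and customers_affected_pct > impact_limit:
        return True
    return False


def should_invoke_dr(event: str) -> bool:
    """Return True when the event is one of the disaster-recovery triggers."""
    return event in DR_TRIGGERS


print(should_auto_escalate("P1", timedelta(hours=3), 0.0))  # True
```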

---

## Decision Rules

**Incident type routing:**
- `security | data-breach | ransomware` → Include forensic preservation steps, consider invoking security-assessment-framework
- `outage | service-degradation` → Focus on MTTR reduction, rollback procedures, health checks
- `disaster-recovery` → Invoke DR site failover procedures, RTO/RPO validation
- `ddos` → Include traffic analysis, rate limiting, upstream provider coordination

**Severity thresholds (auto-escalation triggers):**
- **P0 (critical):** Customer-facing impact, data breach, ransomware → Escalate to VP/C-level within 15 minutes
- **P1 (high):** Partial service degradation, security incident contained → Escalate to Director within 30 minutes
- **P2 (medium):** Internal systems impacted, no customer impact → Escalate to Manager within 2 hours
- **P3 (low):** Minor issues, no service impact → Standard on-call escalation

**Compliance-driven requirements:**
- HIPAA data breach → Invoke 60-day breach notification requirement, add HHS reporting steps
- PCI-DSS incident → Add forensic investigation and PCI QSA notification
- SOC2 incident → Document communications per CC7.3 requirement
- FedRAMP incident → Report to Agency within 1 hour for P0 incidents per IR-6(1)

**Abort conditions:**
- If `incident_type` is unknown/invalid → Request clarification
- If `service_context` missing for T2 → Downgrade to T1 or request architecture details
- If compliance requirements conflict → Flag for manual review and legal consultation
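
The abort conditions can be sketched as a function that returns a decision instead of a playbook. The `Decision` type, the action names, and the `compliance_conflict` flag are hypothetical; how a conflict between frameworks is detected is not specified here, so the caller supplies it.

```python
from dataclasses import dataclass
from typing import Optional

KNOWN_INCIDENT_TYPES = {
    "security", "outage", "disaster-recovery", "data-breach",
    "ransomware", "ddos", "service-degradation",
}


@dataclass
class Decision:
    action: str  # "proceed", "request_clarification", "manual_review", or "downgrade"
    tier: str
    reason: str = ""


def apply_abort_conditions(
    incident_type: str,
    tier: str,
    service_context: Optional[dict] = None,
    compliance_conflict: bool = False,
) -> Decision:
    """Apply the three abort conditions in order and return the outcome."""
    if incident_type not in KNOWN_INCIDENT_TYPES:
        return Decision("request_clarification", tier, "unknown incident_type")
    if compliance_conflict:
        return Decision("manual_review", tier, "conflicting compliance requirements")
    if tier == "T2" and not service_context:
        # This sketch downgrades; requesting architecture details is the alternative.
        return Decision("downgrade", "T1", "service_context missing for T2")
    return Decision("proceed", tier)


print(apply_abort_conditions("outage", "T2"))
```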

---

## Output Contract

**Required fields (all tiers):**

```yaml
playbook:
  incident_type: string
  severity: "P0" | "P1" | "P2" | "P3"
  nist_phases:
    - phase: "Preparation" | "Detection & Analysis" | "Containment" | "Eradication" | "Recovery" | "Post-Incident"
      steps: array[string]
      duration_estimate: string
      success_criteria: string
  escalation_matrix:
    - role: string
      contact_method: string
      escalation_threshold: string
  post_mortem_template:
    incident_summary: string
    timeline: array[{timestamp, event, actor}]
    root_cause: string
    impact: {customers_affected, duration, revenue_impact}
    action_items: array[{owner, description, due_date, priority}]

runbook: # T2 only
  service_name: string
  failure_modes: array[{scenario, symptoms, remediation_steps}]
  rollback_procedure: array[string]
  health_checks: array[{name, command, expected_result}]
  dependencies: array[{service, startup_order, health_endpoint}]

communication_plan: # T2 only
  internal_stakeholders: array[{role, notification_threshold, channel}]
  external_communication: array[{audience, template, approval_required}]
  status_page_updates: {cadence, template}
```

**Format:** JSON or YAML (consumer specifies)

**Guarantees:**
- All playbooks follow NIST SP 800-61 Rev 2 incident handling lifecycle
- Escalation thresholds are severity-appropriate and time-bounded
- Post-mortem templates include 5 Whys or equivalent root cause analysis
- Compliance requirements mapped to specific playbook steps

---

## Examples

**Input:**
```json
{
  "incident_type": "data-breach",
  "severity_level": "P0",
  "compliance_requirements": ["HIPAA", "SOC2"],
  "tier": "T1"
}
```

**Output (abbreviated):**
```yaml
playbook:
  incident_type: data-breach
  severity: P0
  nist_phases:
    - phase: Containment
      steps:
        - Isolate affected systems from network
        - Preserve forensic evidence (logs, memory dumps)
        - Revoke compromised credentials
      duration_estimate: 30-60 minutes
    - phase: Post-Incident
      steps:
        - HIPAA breach notification to HHS within 60 days
        - SOC2 CC7.3 communication documentation
  escalation_matrix:
    - {role: CISO, contact_method: PagerDuty, escalation_threshold: "15 min"}
    - {role: Legal, contact_method: Email, escalation_threshold: "30 min"}
```

---

## Quality Gates

**Token budgets:**
- **T1 ≤2k tokens:** Template-based playbook generation, no deep service context
- **T2 ≤6k tokens:** Service-specific runbooks with compliance integration
- **T3:** Not implemented (incident response is sufficiently covered by T1/T2 tiers)

**Safety:**
- No embedded credentials or API keys in playbooks
- No PII in example scenarios
- Compliance requirements are technical controls only (not legal advice)

**Auditability:**
- All NIST SP 800-61 citations include access date = NOW_ET
- Compliance mappings traceable to source frameworks
- Escalation thresholds based on published industry practice guides (PagerDuty, Atlassian)

**Determinism:**
- Same inputs → same playbook structure
- Escalation thresholds are severity-based and predictable
- NIST phases always in lifecycle order: Preparation → Detection → Containment → Eradication → Recovery → Post-Incident

**Validation:**
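- Playbook must include all six lifecycle phases used by this skill (NIST SP 800-61 Rev 2 groups Containment, Eradication, and Recovery into a single phase; this skill lists them separately)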
- Escalation matrix must define contact methods and thresholds
- Post-mortem template must include timeline, root cause, and action items
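
A sketch of how these validation gates might be checked against a generated playbook held as a Python dict. It assumes the six phase names and field names from the Output Contract; `validate_playbook` is a hypothetical helper, not part of the skill.

```python
# Phase names in lifecycle order, as enumerated in the Output Contract.
REQUIRED_PHASES = [
    "Preparation", "Detection & Analysis", "Containment",
    "Eradication", "Recovery", "Post-Incident",
]


def validate_playbook(playbook: dict) -> list:
    """Return a list of gate failures; an empty list means all gates pass."""
    problems = []
    phases = [entry.get("phase") for entry in playbook.get("nist_phases", [])]
    missing = [name for name in REQUIRED_PHASES if name not in phases]
    if missing:
        problems.append(f"missing phases: {missing}")
    present = [name for name in phases if name in REQUIRED_PHASES]
    if present != sorted(present, key=REQUIRED_PHASES.index):
        problems.append("phases are not in lifecycle order")

    matrix = playbook.get("escalation_matrix", [])
    if not matrix:
        problems.append("escalation_matrix is empty")
    for index, row in enumerate(matrix):
        for field in ("contact_method", "escalation_threshold"):
            if not row.get(field):
                problems.append(f"escalation_matrix[{index}] missing {field}")

    post_mortem = playbook.get("post_mortem_template", {})
    for field in ("timeline", "root_cause", "action_items"):
        if field not in post_mortem:
            problems.append(f"post_mortem_template missing {field}")
    return problems
```

Post-mortem fields are checked for presence only, since a template legitimately ships with empty values.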

---

## Resources

**Primary sources (NIST SP 800-61 compliance):**
- NIST SP 800-61 Rev 2: Computer Security Incident Handling Guide (accessed 2025-10-25T21:30:36-04:00): https://csrc.nist.gov/publications/detail/sp/800-61/rev-2/final
- NIST SP 800-53 Rev 5: IR-4 (Incident Handling), IR-6 (Incident Reporting) (accessed 2025-10-25T21:30:36-04:00): https://csrc.nist.gov/publications/detail/sp/800-53/rev-5/final

**Industry best practices:**
- Atlassian Incident Management Handbook (accessed 2025-10-25T21:30:36-04:00): https://www.atlassian.com/incident-management/incident-response
- PagerDuty Incident Response Documentation (accessed 2025-10-25T21:30:36-04:00): https://response.pagerduty.com/
- Google SRE Book: Managing Incidents (accessed 2025-10-25T21:30:36-04:00): https://sre.google/sre-book/managing-incidents/
- SANS Incident Handler's Handbook (accessed 2025-10-25T21:30:36-04:00): https://www.sans.org/white-papers/33901/

**Compliance frameworks:**
- HIPAA Breach Notification Rule (accessed 2025-10-25T21:30:36-04:00): https://www.hhs.gov/hipaa/for-professionals/breach-notification/index.html
- PCI DSS v4.0 Requirement 12.10 (accessed 2025-10-25T21:30:36-04:00): https://www.pcisecuritystandards.org/document_library
- SOC2 Trust Services Criteria CC7.3 (accessed 2025-10-25T21:30:36-04:00): https://www.aicpa.org/interestareas/frc/assuranceadvisoryservices/aicpasoc2report.html

**Templates and tools:**
- See `/skills/resilience-incident-generator/resources/` for:
  - `playbook-template.md` - NIST SP 800-61 aligned playbook structure
  - `escalation-matrix.csv` - Contact escalation template
  - `post-mortem-template.md` - 5 Whys root cause analysis template
  - `runbook-template.md` - Service-specific runbook structure

Overview

This skill generates incident response playbooks and runbooks aligned to NIST SP 800-61 Rev 2, with built-in escalation matrices and compliance mappings. It produces T1 templates for fast, general-purpose playbooks and T2 detailed, service-specific runbooks including communication plans and forensic preservation steps. Outputs include playbook documents, escalation CSV/JSON, and post-mortem templates.

How this skill works

Given validated inputs (incident_type, severity_level, tier, optional service_context and compliance_requirements), the skill classifies the incident, maps it to the NIST incident handling lifecycle, and emits the requested deliverables. T1 returns a compact template with six NIST phases and an escalation matrix. T2 consumes service context to generate detailed remediation steps, rollback procedures, monitoring queries, and compliance-specific actions. All playbooks follow deterministic rules and include time-bounded escalation thresholds.

When to use it

- A security incident is detected (malware, ransomware, data breach).
- Production outage or service degradation impacting customers.
- Disaster recovery events like data center or regional failures.
- Formalizing runbooks after a post-incident review.
- Meeting compliance documentation requirements (SOC2, HIPAA, PCI-DSS, FedRAMP).

Best practices

- Validate inputs: incident_type and severity_level must match allowed values before generation.
- Choose T1 for fast templates, T2 when you can supply service_context and compliance constraints.
- Always review forensic preservation steps for security incidents and consult legal for disclosure decisions.
- Integrate generated escalation matrix with your paging provider (PagerDuty, Opsgenie) and test contact thresholds.
- Keep artifacts free of credentials and PII; treat compliance mappings as operational, not legal, guidance.

Example use cases

- Generate a P0 data-breach playbook with HIPAA breach-notification steps and a post-mortem template.
- Produce a T2 runbook for a web service with dependency startup order, health checks, and rollback steps.
- Create an outage playbook with MTTR-focused containment and Prometheus/Datadog query examples.
- Produce an escalation matrix CSV to import into an on-call system and a communication cadence for executives and customers.

FAQ

What frameworks does the playbook follow?

Playbooks align to NIST SP 800-61 Rev 2 and map to applicable controls in NIST SP 800-53; compliance mappings reference HIPAA, PCI-DSS, SOC2, and FedRAMP as operational guidance.

When should I use T2 instead of T1?

Use T2 when you can provide service_context (architecture, dependencies) or need compliance-integrated, service-specific remediation and communication plans.