triage-workflow skill

/41-incident-management/triage-workflow

This skill provides rapid triage guidance and decision support. It points to the main Incident Triage skill for the full workflows and the tools used to escalate efficiently.

npx playbooks add skill amnadtaowsoam/cerebraskills --skill triage-workflow

SKILL.md
---
name: Triage Workflow
description: See the main Incident Triage skill for comprehensive triage workflows and procedures.
---

# Triage Workflow

This skill is covered in detail in the main **Incident Triage** skill.

Please refer to: `41-incident-management/incident-triage/SKILL.md`

That skill covers:
- Rapid assessment procedures
- Triage objectives and decision trees
- Information gathering techniques
- Quick diagnosis methods
- Escalation vs. resolution decisions
- Triage tools (PagerDuty, Opsgenie, Incident.io)
- Real triage scenarios with walkthroughs

---

## Related Skills
- `41-incident-management/incident-triage` (Main skill with triage workflows)
- `41-incident-management/severity-levels`
- `41-incident-management/escalation-paths`

Overview

This skill provides a focused triage workflow to rapidly assess and act on incidents using established incident management practices. It summarizes objectives, decision paths, and essential tools to make fast, reliable triage decisions. It acts as a compact companion to the main Incident Triage skill for quick reference during live incidents.

How this skill works

The workflow guides responders through rapid assessment, information collection, and a decision tree that leads to escalation or immediate mitigation. It highlights quick diagnosis techniques and recommends when to engage on-call tools (PagerDuty, Opsgenie, Incident.io). The process emphasizes time-boxed checks, targeted telemetry queries, and clear communication steps to reduce mean time to acknowledge (MTTA) and mean time to resolve (MTTR).

When to use it

  • At first alert to determine incident severity and scope within minutes
  • When deciding whether to escalate to on-call engineers or resolve at first responder level
  • During active incidents to keep triage consistent across multiple responders
  • When training new responders on practical triage steps and priorities
  • For post-incident reviews to validate decisions made during triage

Best practices

  • Time-box initial assessment (e.g., 5–10 minutes) to avoid analysis paralysis
  • Collect targeted telemetry and recent deploy/change data before general exploration
  • Use a simple decision tree: validate alert -> assess impact -> escalate or mitigate
  • Clearly record actions and communications in the incident channel for handoffs
  • Prefer safe, reversible mitigations during initial triage to preserve data
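
The three-step decision tree above (validate alert -> assess impact -> escalate or mitigate) can be sketched as a small function. This is a minimal illustration, not part of the skill itself: the `Alert` fields, the `Action` names, and the rule that an incident with no reversible mitigation gets escalated are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Hypothetical outcome names, chosen for this sketch.
    CLOSE_AS_FALSE_POSITIVE = "close_as_false_positive"
    ESCALATE = "escalate"
    MITIGATE = "mitigate"


@dataclass
class Alert:
    # Hypothetical fields a responder would fill in during triage.
    confirmed_by_metrics: bool        # does telemetry back up the alert?
    critical_service_affected: bool   # is a critical user-facing service hit?
    reversible_fix_available: bool    # is a safe, reversible mitigation at hand?


def triage(alert: Alert) -> Action:
    """Walk the tree: validate alert -> assess impact -> escalate or mitigate."""
    # Step 1: validate the alert before spending time on it.
    if not alert.confirmed_by_metrics:
        return Action.CLOSE_AS_FALSE_POSITIVE
    # Step 2: assess impact; critical user-facing impact goes straight to escalation.
    if alert.critical_service_affected:
        return Action.ESCALATE
    # Step 3: mitigate only with a safe, reversible fix (assumption: otherwise escalate).
    if alert.reversible_fix_available:
        return Action.MITIGATE
    return Action.ESCALATE
```

For example, `triage(Alert(True, False, True))` returns `Action.MITIGATE`, while an alert that telemetry does not confirm is closed as a false positive regardless of the other fields.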

Example use cases

  • Service latency spike: follow the quick diagnosis checklist, check recent deployments, and decide between rollback and throttling
  • Partial outage: identify affected subsystems and escalate to domain owners while implementing temporary routing
  • False positive alert: confirm against metrics that the alert is spurious, then silence or tune the noisy alert to reduce alert fatigue
  • Security alert: rapidly gather indicators of compromise and escalate to the security response team

FAQ

How long should initial triage take?

Aim for a 5–10 minute initial assessment to determine impact and next steps; extend only if new critical information appears.
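
One way to enforce such a time box is to run the initial checks against a fixed budget and skip whatever does not fit. The sketch below is an assumption-laden illustration: `TimeBox`, `run_checks`, and the idea of representing checks as named callables are hypothetical and do not come from the skill.

```python
import time


class TimeBox:
    """Tracks how much of a fixed triage budget is left (hypothetical helper)."""

    def __init__(self, budget_seconds, clock=time.monotonic):
        self._clock = clock
        self._deadline = clock() + budget_seconds

    def remaining(self):
        # Seconds left in the budget, never negative.
        return max(0.0, self._deadline - self._clock())

    def expired(self):
        return self.remaining() == 0.0

    def extend(self, extra_seconds):
        # Use only when new critical information justifies more time.
        self._deadline += extra_seconds


def run_checks(checks, box):
    """Run (name, callable) checks in order; skip any that start after expiry."""
    results, skipped = {}, []
    for name, check in checks:
        if box.expired():
            skipped.append(name)
            continue
        results[name] = check()
    return results, skipped
```

A responder script might create `TimeBox(10 * 60)` at first alert and pass it to `run_checks`; the names of any skipped checks can then go into the incident channel as open questions for whoever takes the handoff.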

When should I escalate instead of resolving immediately?

Escalate when the incident affects critical user-facing services, requires privileged access, or needs a fix outside your domain expertise.
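
These three escalation criteria can be written down as an explicit check that also returns the reasons, which makes the decision easy to record in the incident channel. This is a sketch under stated assumptions: `IncidentFacts` and both function names are hypothetical, and the boolean inputs are judgments the responder supplies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IncidentFacts:
    # Hypothetical fields mirroring the three criteria in the answer above.
    affects_critical_user_facing_service: bool
    requires_privileged_access: bool
    outside_responder_expertise: bool


def escalation_reasons(facts: IncidentFacts) -> List[str]:
    """Return every criterion that applies; an empty list means resolve locally."""
    reasons = []
    if facts.affects_critical_user_facing_service:
        reasons.append("affects a critical user-facing service")
    if facts.requires_privileged_access:
        reasons.append("requires privileged access")
    if facts.outside_responder_expertise:
        reasons.append("fix is outside the responder's domain expertise")
    return reasons


def should_escalate(facts: IncidentFacts) -> bool:
    # Any single criterion is enough to escalate.
    return bool(escalation_reasons(facts))
```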