home / skills / sounder25 / google-antigravity-skills-library / 20_failure_postmortem

20_failure_postmortem skill

/20_failure_postmortem

This skill records and analyzes failures to identify root causes and preventive rules, ensuring durable postmortems and reduced recurrence.

npx playbooks add skill sounder25/google-antigravity-skills-library --skill 20_failure_postmortem

Review the files below or copy the command above to add this skill to your agents.

Files (5)
SKILL.md
2.0 KB
---
name: Failure Postmortem
description: Structured logging and analysis of execution failures to prevent recurrence.
version: 1.0.0
author: Antigravity Skills Library
created: 2026-01-16
leverage_score: 5/5
---

# SKILL-020: Failure Postmortem

## Overview

Systematically captures failure data. When a tool fails or a plan is abandoned, this skill logs the event to a persistent registry (`POSTMORTEMS.md`), forcing an synchronous analysis of *why* it happened (Root Cause) and *how* to prevent it (Preventive Rule).

## Trigger Phrases

- `log failure`
- `postmortem`
- `why did this fail`
- `record error`

## Inputs

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `--Command` | string | Yes | - | The command or action that failed |
| `--Error` | string | Yes | - | The error message or exception trace |
| `--Context` | string | No | `General` | What was the agent trying to do? |

## Outputs

### 1. Log Entry

Appends to `.gemini/antigravity/.forensics/POSTMORTEMS.md`:

```markdown
## [2026-01-16 10:00:00] Failure Analysis
**Context:** Deployment
**Command:** `npm build`
**Error:** `Heap out of memory`

### 🕵️ Analysis
1. **Root Cause:** Node processes default to 512MB RAM.
2. **Preventive Rule:** Always set NODE_OPTIONS="--max-old-space-size=4096" for builds.
```

## Preconditions

1. `.forensics` directory exists (auto-created).
2. PowerShell 5.1+ or Core 7+.

## Safety/QA Checks

1. **Persist**: Ensures the log file is always writable.
2. **No Data Loss**: Appends only, never overwrites previous logs.

## Stop Conditions

| Condition | Action |
|-----------|--------|
| Disk Full | Fail gracefully (stdout only) |

## Implementation

See `scripts/log_failure.ps1`.

## Integration with Other Skills

1. **Agent Catch Block**: On exception -> Call `log_failure.ps1`.
2. **Agent Step**: Read the output path.
3. **Agent Step**: "Thinking" block -> Fill in the "Analysis" section mentally or in next turn.

Overview

This skill captures and persists structured records of execution failures so teams and agents can learn and prevent recurrence. It enforces synchronous analysis: each logged failure includes a root cause and a concrete preventive rule. Logs are appended to a persistent registry to guarantee an auditable trail of incidents and remedies.

How this skill works

When a command or plan fails, the skill accepts the command, error text, and optional context and appends a formatted failure analysis to a persistent POSTMORTEMS.md file under the agent workspace. It auto-creates the forensics directory if missing, verifies file writability, and guarantees append-only behavior to avoid data loss. The entry schema includes timestamp, context, command, error, root cause, and a preventive rule for future runs.

When to use it

  • Automated agents experience an unexpected error or throw an exception
  • A plan or tool run is abandoned and you need a record for retrospection
  • After safety or QA checks identify a reproducible failure mode
  • When building operational playbooks or runbooks from real incidents
  • During adversarial reviews to capture edge-case failures

Best practices

  • Include a clear, minimal command string and the full error trace for reproducibility
  • Capture the agent intent or context to link failures to high-level goals
  • Write the root cause as a specific, testable statement (not vague descriptions)
  • Propose preventive rules that are actionable and enforceable by other skills or CI gates
  • Run periodic reviews of POSTMORTEMS.md and convert recurring entries into automated safety checks

Example use cases

  • Agent catches an out-of-memory build error and logs the command, trace, and a rule to increase Node memory limits
  • A test harness aborts a deployment; postmortem records the configuration mismatch and adds validation before deploy
  • Adversarial review finds a data leak edge case; entry prescribes sanitization and an automated lint check
  • Step-level impasse detection logs manual intervention reasons and suggests automation or guardrails

FAQ

Where is the log stored?

Entries are appended to .gemini/antigravity/.forensics/POSTMORTEMS.md inside the agent workspace.

What happens if disk is full?

The skill fails gracefully and emits the error to stdout without overwriting existing logs.