
self-repair skill


This skill helps you implement self-healing by detecting missing state, repairing it, logging actions, and continuing without crashes.

npx playbooks add skill simhacker/moollm --skill self-repair


---
name: self-repair
description: "Missing state triggers repair, not failure."
license: MIT
tier: 1
allowed-tools:
  - read_file
  - write_file
related: [robust-first, bootstrap, honest-forget, session-log, coherence-engine, debugging, postel]
tags: [moollm, healing, recovery, resilience, consistency]
---

# Self Repair

> **Missing state triggers repair, not failure.**

Checklist-based self-healing demons.

> [!IMPORTANT]
> **NEVER-CRASH** — The core principle. Repair instead of fail. Always.

## The Principle

When something's wrong:
1. **Detect** — Checklist finds missing/invalid state
2. **Repair** — Demon creates/fixes what's needed
3. **Log** — Document what was repaired
4. **Continue** — Never crash, always converge
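
The four steps can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the function name `run_checklist`, the checklist shape (relative path mapped to default text), and the default file contents are all assumptions made for the example.

```python
import logging
import tempfile
from pathlib import Path

log = logging.getLogger("self-repair")


def run_checklist(root: Path, checklist: dict[str, str]) -> list[str]:
    """Create any missing file named in the checklist; return what was repaired.

    `checklist` maps a relative path to hypothetical default content.
    """
    repaired = []
    for rel_path, default_text in checklist.items():
        target = root / rel_path
        if target.exists():
            # Detect: state is present, nothing to repair.
            continue
        try:
            # Repair: create the missing file with its default content.
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(default_text, encoding="utf-8")
            # Log: record what was repaired.
            log.info("repaired missing file: %s", rel_path)
            repaired.append(rel_path)
        except OSError as exc:
            # Continue: a failed repair is logged, and the loop moves on.
            log.warning("could not repair %s: %s", rel_path, exc)
    return repaired


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        checklist = {"hot.yml": "items: []\n", "cold.yml": "items: []\n"}
        print(run_checklist(Path(tmp), checklist))  # first pass repairs both
        print(run_checklist(Path(tmp), checklist))  # second pass finds nothing
```

Running the checklist a second time repairs nothing, which is the convergence property the principle asks for.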

## Contents

| File | Purpose |
|------|---------|
| [SKILL.md](./SKILL.md) | Full protocol documentation |
| [CHECKLIST.yml.tmpl](./CHECKLIST.yml.tmpl) | Checklist template |

## Repair Demons

| Demon | Watches For |
|-------|-------------|
| `checklist_repairer` | Missing canonical files |
| `sticky_note_maintainer` | Missing sidecar metadata |
| `membrane_keeper` | Files outside boundaries |
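
As an illustration of the kind of check a boundary-watching demon might perform, the sketch below lists files that sit outside a set of allowed directories. How `membrane_keeper` actually defines its boundaries is not specified here, so `find_strays` and the `allowed_dirs` parameter are hypothetical names, and the sketch only detects strays without moving them.

```python
import tempfile
from pathlib import Path


def find_strays(root: Path, allowed_dirs: list[str]) -> list[Path]:
    """Return files under `root` that are not inside any allowed directory.

    Paths are returned relative to `root`. `allowed_dirs` is an assumed
    way of expressing the boundary for this example.
    """
    allowed = [(root / d).resolve() for d in allowed_dirs]
    strays = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        resolved = path.resolve()
        # A file is inside the boundary if an allowed dir is one of its parents.
        if not any(a in resolved.parents for a in allowed):
            strays.append(path.relative_to(root))
    return strays


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "working-set").mkdir()
        (root / "working-set" / "a.txt").write_text("a", encoding="utf-8")
        (root / "stray.txt").write_text("b", encoding="utf-8")
        print(find_strays(root, ["working-set"]))
```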

## The Intertwingularity

Self-repair is the immune system. It monitors everything.

```mermaid
graph LR
    SR[🔧 self-repair] -->|monitors| SL[📜 session-log]
    SR -->|monitors| WS[working-set.yml]
    SR -->|creates| HOT[hot.yml / cold.yml]
    SR -->|repairs| FILES[missing files]
    
    SR -->|part of| KERNEL[kernel/self-healing]
```

---

## Dovetails With

### Sister Skills
| Skill | Relationship |
|-------|--------------|
| [session-log/](../session-log/) | Self-repair monitors log integrity |
| [summarize/](../summarize/) | Triggered when context exceeds budget |
| [honest-forget/](../honest-forget/) | Graceful memory decay |

### Protocol Symbols
| Symbol | Link |
|--------|------|
| `NEVER-CRASH` | [PROTOCOLS.yml](../../PROTOCOLS.yml#NEVER-CRASH) |
| `REPAIR-DEMON` | [PROTOCOLS.yml](../../PROTOCOLS.yml#REPAIR-DEMON) |
| `ROBUST-FIRST` | [PROTOCOLS.yml](../../PROTOCOLS.yml#ROBUST-FIRST) |
| `BEST-EFFORT` | [PROTOCOLS.yml](../../PROTOCOLS.yml#BEST-EFFORT) |

### Kernel
- [kernel/self-healing-protocol.md](../../kernel/self-healing-protocol.md) — Full specification
- [schemas/agent-directory-schema.yml](../../schemas/agent-directory-schema.yml) — What gets repaired

### Navigation
| Direction | Destination |
|-----------|-------------|
| ⬆️ Up | [skills/](../) |
| ⬆️⬆️ Root | [Project Root](../../) |
| 📜 Sister | [session-log/](../session-log/) |

Overview

This skill implements checklist-driven self-healing agents that detect and repair missing or invalid state instead of failing. It enforces a NEVER-CRASH principle so the system always converges toward a valid state and continues operating. The goal is robust, observable repairs that are logged and repeatable.

How this skill works

The skill runs lightweight repair demons that continuously inspect canonical files, sidecar metadata, and working-set boundaries. When a checklist finds missing or inconsistent state, a repairer creates or fixes the required artifacts, logs the action, and moves on without terminating the process. Repairs follow predefined templates and kernel-level protocols to ensure safe, idempotent operations.
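
One way to picture "moves on without terminating the process" is a sweep that isolates each demon's failures. The sketch below assumes a demon can be modeled as a zero-argument callable returning descriptions of its repairs; that model and the `sweep` function are assumptions for illustration, not the kernel protocol.

```python
import logging
from typing import Callable

log = logging.getLogger("self-repair")

# Assumed model: a demon is a callable returning descriptions of repairs made.
Demon = Callable[[], list[str]]


def sweep(demons: dict[str, Demon]) -> dict[str, list[str]]:
    """Run every demon once; a failing demon is logged, never fatal."""
    results: dict[str, list[str]] = {}
    for name, demon in demons.items():
        try:
            results[name] = demon()
        except Exception:
            # Log the traceback and keep going so other demons still run.
            log.exception("demon %s failed; continuing sweep", name)
            results[name] = []
    return results
```

Catching `Exception` broadly is usually discouraged, but here it is the point: one broken repairer should not stop the others from running.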

When to use it

  • When missing configuration or state should be repaired automatically rather than causing failure
  • In long-running agents that must remain available despite transient corruption
  • Where observability of repairs is required for auditing or debugging
  • When multiple components can drift out of sync and need automatic reconciliation
  • To implement best-effort recovery in resource-constrained or distributed environments

Best practices

  • Define concise checklists that assert minimal canonical invariants
  • Make repairs idempotent and conservative to avoid unintended side effects
  • Log every repair with context so actions are auditable and reproducible
  • Limit repair scope to the smallest safe change that restores validity
  • Combine with monitoring and the summarize skill to avoid repair loops
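
To make "log every repair with context" concrete, the sketch below serializes one repair as a single JSON line. The field names (`demon`, `target`, `action`, `reason`) are hypothetical; the actual session-log format is not described here.

```python
import json
from datetime import datetime, timezone


def repair_record(demon: str, target: str, action: str, reason: str) -> str:
    """Serialize one repair as a single JSON line (field names are assumed)."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "demon": demon,    # which repairer acted
        "target": target,  # what was repaired
        "action": action,  # what was done, e.g. "created"
        "reason": reason,  # why the checklist flagged it
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(repair_record("checklist_repairer", "hot.yml",
                        "created", "missing canonical file"))
```

One record per line keeps the log append-only and easy to filter when auditing which demon touched which file.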

Example use cases

  • Auto-create missing canonical files (hot.yml / cold.yml) when a session starts
  • Restore sidecar metadata for content items detected as lacking descriptors
  • Move stray files back into working-set boundaries to preserve integrity
  • Trigger summarize or cleanup workflows when context budgets are exceeded
  • Run periodic sweeps that heal common, recoverable state issues without interrupting the session, logging each repair

FAQ

What does NEVER-CRASH mean in practice?

It means repairs are preferred over failing: detect the issue, attempt a bounded repair, log the outcome, and continue running rather than raising fatal errors.

How do you avoid repair thrashing or loops?

Make repairs idempotent, keep change scopes conservative, and add checkpoints or backoff logic; combine with the summarize skill and monitoring to detect oscillations.
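
The backoff idea can be sketched as a small per-target attempt budget. This is an illustrative assumption rather than part of the skill: the class name `RepairBudget`, its defaults, and the exponential schedule are all made up for the example.

```python
from collections import defaultdict
from typing import Optional


class RepairBudget:
    """Track repeat repairs per target and decide when to stop retrying."""

    def __init__(self, max_attempts: int = 3, base_delay: float = 1.0):
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.attempts: dict[str, int] = defaultdict(int)

    def next_delay(self, target: str) -> Optional[float]:
        """Return seconds to wait before the next repair, or None when spent."""
        n = self.attempts[target]
        if n >= self.max_attempts:
            return None  # budget spent: log and flag, do not keep repairing
        self.attempts[target] = n + 1
        return self.base_delay * (2 ** n)  # exponential backoff: 1, 2, 4, ...

    def reset(self, target: str) -> None:
        """Forget past attempts once the target has stayed healthy."""
        self.attempts.pop(target, None)
```

A `None` result does not mean crashing. It means the same target keeps breaking, so the demon should log the oscillation and leave the target alone until something resets the budget.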