
formal-verify skill


This skill continuously verifies architectural constraints and code quality across edits, surfacing violations and guiding fixes before merges.

npx playbooks add skill petekp/agent-skills --skill formal-verify


Files (22)
SKILL.md
---
name: formal-verify
description: >
  Continuous formal verification of architectural constraints and code quality.
  Use when asked to verify, audit, or validate codebase integrity. Runs
  automatically via hooks on every edit (structural) and pre-commit (full).
  Catches ownership violations, boundary crossings, state machine bugs,
  and code smells that grep ratchets miss. Triggers: "verify", "formal verify",
  "check architecture", "audit code quality", "run verification",
  "/verify", "/verify --bootstrap", "/verify --grade".
license: MIT
metadata:
  author: petekp
  version: "0.1.0"
---

# formal-verify

Use this skill when architectural intent matters more than "it compiles."

This skill runs a three-layer verification loop:

1. Layer 1: structural verification over extracted AST facts and declarative rules
2. Layer 2: behavioral verification over Z3Py protocol specs and TLA+/Apalache state-machine specs
3. Layer 3: elegance auditing over complexity, consistency, and craft heuristics

The layers are intentionally tiered:

- every edit: Layer 1 only, fast enough for continuous feedback
- slice checkpoint: Layers 1 and 2
- pre-commit and manual `/verify`: all three layers

## Quick Start

Bootstrap a target project with:

```bash
/verify --bootstrap
```

Bootstrap runs four phases:

1. Install dependencies and create `.verifier/`
2. Discover architectural rules from docs and code shape
3. Interview the user in plain English about ambiguities
4. Validate the initial rules against the current codebase

## Commands

- `/verify`
  Runs all layers in verbose mode and prints a unified report.
- `/verify --bootstrap`
  Installs dependencies, creates `.verifier/`, and scaffolds the first rule set.
- `/verify --evolve`
  Checks for drift between architectural docs and existing verification specs.
- `/verify --grade`
  Runs Layer 3 only and reports the current elegance grade.

## How Verification Runs

### Layer 1: Structural

The runner extracts facts from Rust and Swift source files, then checks
`structural.yaml` rules such as:

- only module X may cross boundary Y
- modules matching pattern Z must implement interface W
- no module may reference legacy identifiers

Structural checks are the default PostToolUse hook because they are the fastest.
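
The canonical rule schema lives in `@references/constraint-yaml-spec.md`. As an illustrative sketch only (the field names below are hypothetical, not the real schema), a boundary rule might look like:

```yaml
# Hypothetical structural.yaml sketch. Field names are illustrative;
# see constraint-yaml-spec.md for the actual schema and operators.
rules:
  - id: ffi-boundary
    description: Only the ffi module may cross the Rust/Swift boundary
    select: { kind: module, matches: "!ffi" }
    assert: { does_not_import: "swift_bridge::*" }
  - id: no-legacy-ids
    description: No module may reference legacy identifiers
    select: { kind: module }
    assert: { does_not_reference: "Legacy*" }
```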

### Layer 2: Behavioral

Behavioral verification covers state transitions and protocol contracts:

- TLA+/Apalache for temporal properties, liveness, and interleavings
- Z3Py spec files for contracts, invariants, and cross-boundary data guarantees

Use this layer at slice checkpoints, before risky merges, and whenever a change
touches coordination logic or cross-language contracts.
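
Conceptually, a behavioral check is an exhaustive search over a model of the system. A minimal stdlib sketch of the idea (the real layer uses Z3Py and TLA+/Apalache specs, not code like this, and the protocol below is invented for illustration) enumerates reachable states and asserts an invariant on each:

```python
from collections import deque

# Toy connection-protocol model: states and their allowed transitions.
# Illustrative only; the skill's actual specs live in .verifier/specs/.
TRANSITIONS = {
    "idle": {"connecting"},
    "connecting": {"connected", "idle"},  # connect succeeds or times out
    "connected": {"closing"},
    "closing": {"idle"},
}

def violates_invariant(state: str) -> bool:
    # Example invariant: the machine must never reach an unknown state.
    return state not in TRANSITIONS

def check_reachable(start: str = "idle") -> set[str]:
    """BFS over all reachable states, asserting the invariant on each."""
    seen, frontier = {start}, deque([start])
    while frontier:
        state = frontier.popleft()
        assert not violates_invariant(state), f"invariant violated in {state}"
        for nxt in TRANSITIONS[state]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

print(sorted(check_reachable()))  # prints every reachable state
```

A real model checker does the same walk symbolically, over far larger state spaces, and returns a counterexample trace rather than raising an assertion.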

### Layer 3: Elegance

Elegance auditing scores code for:

- complexity
- consistency
- craft

It produces a grade and line-level deductions so the agent can clean up code,
not just make it technically correct.
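
As a rough illustration of the deduction-and-grade idea (the real metrics and thresholds live in `elegance.yaml` and `@references/layer3-elegance.md`; the rules below are invented for the example):

```python
import re

# Illustrative elegance grader: count branch points as a crude complexity
# proxy, flag over-long lines, and deduct from a starting grade. The real
# Layer 3 metrics and thresholds are defined in elegance.yaml.
BRANCH_KEYWORDS = re.compile(r"\b(if|elif|else|for|while|match|case)\b")

def complexity_deductions(source: str, threshold: int = 5) -> list[str]:
    deductions = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Hypothetical rule: flag lines longer than 100 columns.
        if len(line) > 100:
            deductions.append(f"line {lineno}: exceeds 100 columns")
    branches = len(BRANCH_KEYWORDS.findall(source))
    if branches > threshold:
        deductions.append(f"{branches} branch points (threshold {threshold})")
    return deductions

def grade(deductions: list[str]) -> str:
    # Each deduction drops one letter grade, floored at F.
    return "ABCDF"[min(len(deductions), 4)]

snippet = "if x:\n    pass\nelse:\n    pass\n"
print(grade(complexity_deductions(snippet)))
```

The point is the output shape: a letter grade plus line-level deductions the agent can act on, not a bare pass/fail.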

## Violation Handling

When a violation is found, tailor the output to the audience:

- agent output: counterexample, diagnosis, concrete fix suggestion
- human output: counterexample and diagnosis only

If the agent fails to resolve the same violation three times, stop the fix loop
and escalate with:

- the original rule
- the counterexample
- the three attempted fixes
- what still appears to block a correct repair
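
The escalation payload can be thought of as a small structured report. A hypothetical shape (the skill defines no fixed schema; field names here are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Escalation:
    """Hypothetical escalation report; field names are illustrative only."""
    rule: str                     # the original rule that was violated
    counterexample: str           # the failing trace or extracted fact
    attempted_fixes: list[str] = field(default_factory=list)  # the three tries
    blocker: str = ""             # what still appears to block a repair

report = Escalation(
    rule="only module ffi may cross the Rust/Swift boundary",
    counterexample="ui::session imports swift_bridge::Channel",
    attempted_fixes=["moved import", "added wrapper", "inlined call"],
    blocker="the wrapper reintroduces the import transitively",
)
print(len(report.attempted_fixes))
```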

## Project Structure Created In The Target Repo

Bootstrap creates and maintains:

```text
.verifier/
├── structural.yaml
├── elegance.yaml
├── specs/
├── facts/
└── reports/
```

- `structural.yaml` stores declarative Layer 1 rules
- `elegance.yaml` stores thresholds and grade policy
- `specs/` stores Z3Py and TLA+ behavioral specs
- `facts/` caches extracted AST facts
- `reports/` stores the most recent verification outputs

`facts/` and `reports/` should be gitignored in the target project.
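
For example, the target repo's `.gitignore` could carry (paths assume `.verifier/` sits at the repo root):

```text
# verification caches and outputs; regenerated on every run
.verifier/facts/
.verifier/reports/
```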

## Operating Guidance

- Run `/verify` before claiming a migration is complete.
- Run `/verify --grade` when the code is correct but still feels rough.
- Prefer updating rules and specs over weakening them when the architecture
  evolves intentionally.
- Keep `SKILL.md` focused on orchestration; pull detailed mechanics from the
  references below.

## References

- `@references/layer1-structural.md`
  Fact extraction, Z3 encoding, reachability, and incremental invalidation.
- `@references/layer2-behavioral.md`
  When to use TLA+/Apalache versus Z3Py, plus spec execution contracts.
- `@references/layer3-elegance.md`
  Metric families, grading, thresholds, and the Layer 3 sub-module layout.
- `@references/constraint-yaml-spec.md`
  Structural rule schema, selectors, assertions, and fact pattern operators.
- `@references/bootstrap-process.md`
  The install, discover, interview, validate bootstrap workflow.
- `@references/agent-feedback-loop.md`
  Hook integration, violation injection, retries, and escalation policy.
- `@references/spec-authoring-guide.md`
  Translating plain-English architectural intent into formal specs.

Overview

This skill performs continuous formal verification of architectural constraints and code quality across a codebase. It provides fast structural checks on every edit, deeper behavioral verification at checkpoints, and elegance audits before commits to catch ownership violations, boundary crossings, state-machine bugs, and subtle code smells. Use it to enforce architectural intent, prevent regressions, and produce actionable, audience-tailored diagnostics.

How this skill works

The runner operates in three layered stages: Layer 1 extracts AST facts and applies declarative structural rules for fast, continuous feedback. Layer 2 executes behavioral specs written in Z3Py and TLA+/Apalache to validate protocols, invariants, and state-machine properties at slice checkpoints. Layer 3 applies elegance heuristics to score complexity, consistency, and craft and produce line-level deductions before pre-commit. Hooks trigger the appropriate layers automatically: every edit (Layer 1), slice checkpoint (Layers 1+2), and pre-commit or manual verify (all layers).

When to use it

  • On every edit to get immediate structural feedback
  • Before merging or at slice checkpoints to validate behavior and protocols
  • Pre-commit or release checks to run full verification and elegance grading
  • When auditing ownership, boundary rules, or cross-language contracts
  • When code compiles but architectural intent must be enforced

Best practices

  • Bootstrap the target project with /verify --bootstrap to create .verifier/ and initial rules
  • Keep facts/ and reports/ gitignored; evolve rules instead of weakening them when architecture changes
  • Run /verify --grade to identify stylistic or craft debts separate from correctness
  • Prefer precise declarative rules and formal specs over ad-hoc grep rules for more robust detection
  • Tailor agent vs human outputs: include fix suggestions for agents, diagnostics and counterexamples for humans

Example use cases

  • Prevent modules from crossing forbidden boundaries after a refactor
  • Validate state-machine invariants in concurrent coordination code with TLA+
  • Catch ownership or cross-language data-violation regressions between Rust and Swift
  • Automate pre-commit elegance grading to maintain consistent craftsmanship
  • Bootstrap a repo to discover architectural rules and interview the developer to resolve ambiguities

FAQ

What does /verify --bootstrap do?

It installs verifier dependencies, creates .verifier/, discovers initial rules from code and docs, interviews the user about ambiguities, and validates the rules against the current codebase.

When does each of the three layers run?

Layer 1 runs on every edit for fast structural checks. Layer 2 runs at slice checkpoints or in targeted runs. Layer 3 runs at pre-commit or during manual full verification. A manual /verify runs all three layers.

How are violations reported?

Violations include counterexamples and diagnoses; agent-targeted reports add concrete fix suggestions. If an automated fix fails three times, the skill escalates with the rule, counterexample, attempted fixes, and remaining blockers.