
This skill guides non-trivial builds by enforcing a spec → plan → execute → verify loop, producing spec.md, todo.md, and decisions.md before coding.

npx playbooks add skill shipshitdev/library --skill spec-first

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: spec-first
description: Use when building anything non-trivial. Enforces a spec → plan → execute → verify loop that prevents "looks right" failures. Creates spec.md, todo.md, and decisions.md before writing code.
---

# Spec-First Development

A structured workflow for LLM-assisted coding that delays implementation until decisions are explicit.

## When This Activates

- "Build X" or "Create Y" (new features/projects)
- "Implement..." (non-trivial functionality)
- "Add a feature that..." (multi-step work)
- Any request touching 3+ files or with unclear requirements

## When to Skip

- Single-file changes under 50 lines
- Typo fixes, log additions, config tweaks
- User explicitly says "just do it" or "quick fix"

## Core Principles

1. **Delay implementation until tradeoffs are explicit** — Use conversation to clarify constraints, compare options, surface risks. Only then write code.

2. **Treat the model like a junior engineer with infinite typing speed** — Provide structure: clear interfaces, small tasks, explicit acceptance criteria. Code is cheap; understanding and correctness are scarce.

3. **Specs beat prompts** — For anything non-trivial, create a durable artifact (spec file) that can be re-fed, diffed, and reused across sessions.

4. **Generated code is disposable; tests are not** — Assume rewrites. Design for easy replacement: small modules, minimal coupling, clean seams, strong tests.

5. **The model is over-confident; reality is the judge** — Everything important gets verified by execution: tests, linters, typecheckers, reproducible builds.
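In practice, "verified by execution" means chaining executable checks into one gate. A minimal sketch for an npm/TypeScript project (an assumption; substitute your stack's tools):

```
npm run lint && npx tsc --noEmit && npm test
```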

## The 6-Stage Workflow

### Stage A: Frame the Problem (conversation mode)

**Goal:** Decide before you implement.

Prompts that work:

- "List 3 viable approaches. Compare on: complexity, failure modes, testability, future change, time to first demo."
- "What assumptions are you making? Which ones are risky?"
- "Propose a minimal version that can be deleted later without regret."

**Output:** Decision notes for `.agents/DECISIONS/[feature-name].md`
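The skill doesn't prescribe a decisions template the way it does for spec.md, but a minimal sketch consistent with the Stage A prompts might look like:

```markdown
# [Feature Name] Decisions

## Options Considered
The 3 viable approaches, each with one-line tradeoffs.

## Decision
What was chosen and why, conditional on constraints.

## Rejected Alternatives
What was ruled out, and what new information would reopen it.

## Risky Assumptions
Assumptions that, if wrong, invalidate the decision.
```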

### Stage B: Write spec.md (freeze decisions)

**Goal:** Turn decisions into unambiguous requirements.

**File:** `.agents/SPECS/[feature-name].md`

```markdown
# [Feature Name] Spec

## Purpose
One paragraph: what this is for.

## Non-Goals
Explicitly state what you are NOT building.

## Interfaces
Inputs/outputs, data types, file formats, API endpoints, CLI commands.

## Key Decisions
Libraries, architecture, persistence choices, constraints.

## Edge Cases and Failure Modes
Timeouts, retries, partial failures, invalid input, concurrency, idempotency.

## Acceptance Criteria
Bullet list of testable statements. Avoid "should be fast."
Prefer: "processes 1k items under 2s on M1 Mac."

## Test Plan
Unit/integration boundaries, fixtures, golden files, what must be mocked.
```

### Stage C: Generate todo.md (planning mode)

**Goal:** Stepwise checklist where each step has a verification command.

**File:** `.agents/TODOS/[feature-name].md`

```markdown
# [Feature Name] TODO

- [ ] Add project scaffolding (build/run/test commands)
  Verify: `npm run build && npm test`

- [ ] Implement module X with interface Y
  Verify: `npm test -- --grep "module X"`

- [ ] Add tests for edge cases A/B/C
  Verify: `npm test -- --grep "edge cases"`

- [ ] Wire integration
  Verify: `npm run integration`

- [ ] Add docs
  Verify: `npm run docs && open docs/index.html`
```

Each item must be independently checkable. This prevents "looks right" progress.
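For contrast, here is an unverifiable item next to a checkable one (illustrative names and commands):

```markdown
<!-- Not checkable: no command, so "done" is a judgment call -->
- [ ] Make input parsing more robust

<!-- Checkable: pass/fail is decided by the command's exit code -->
- [ ] Reject malformed dates in the input parser
  Verify: `npm test -- --grep "malformed dates"`
```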

### Stage D: Execute Changes (implementation mode)

**Goal:** Small diffs, frequent verification, controlled context.

Rules:

- One logical change per step
- Keep focus on one interface at a time
- After each change: run the verification command and paste the actual output back
- Commit early and often
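A minimal per-step loop, assuming an npm project and a todo item for module X (substitute your own verify commands):

```
# Make one logical change, then run that item's verify command
npm test -- --grep "module X"

# Commit only if verification passed; keep the diff small
git add -p && git commit -m "feat: implement module X (verified)"
```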

For large codebases:

- Provide only the relevant files plus the spec/todo
- If you summarize the repo, do it once and keep the summary as a reusable artifact

### Stage E: Verify and Review (adversarial mode)

**Goal:** Force the model to try to break its own work.

Prompts:

- "Act as a hostile reviewer. Find correctness bugs, not style nits. List concrete failing scenarios."
- "Given these acceptance criteria, which are not actually satisfied? Be specific."
- "Propose 5 tests that would fail if the implementation is wrong."

### Stage F: Decide What Lasts

**Goal:** Keep the system easy to delete and rewrite.

Heuristics:

- Keep "policy" (business rules) separate from "mechanism" (I/O, DB, HTTP)
- Prefer shallow abstractions that can be removed without cascade
- Invest in tests and fixtures more than clever architecture
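As a sketch of the policy/mechanism split (illustrative names; assumes a runtime with the Fetch API, such as Node 18+):

```typescript
// Policy: a pure business rule. No I/O, trivially testable,
// and worth keeping even when everything around it is rewritten.
export function isEligibleForDiscount(order: { total: number; items: number }): boolean {
  return order.total >= 100 && order.items >= 2;
}

// Mechanism: I/O lives at the edge and calls into the policy.
// Disposable; swapping the HTTP layer never touches the rule.
export async function discountHandler(req: Request): Promise<Response> {
  const order = await req.json();
  return Response.json({ eligible: isEligibleForDiscount(order) });
}
```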

## The Three-File Convention

Keep these in the `.agents/` folder (not the project root):

```
.agents/
├── SPECS/
│   └── [feature-name].md    # what/why/constraints
├── TODOS/
│   └── [feature-name].md    # steps + verification commands
└── DECISIONS/
    └── [feature-name].md    # tradeoffs, rejected options, assumptions
```

**Naming:** Use the feature/task name as the filename (e.g., `user-auth.md`, `api-refactor.md`).

**Why the `.agents/` folder:**

- Keeps project root clean
- Groups all AI-assisted planning artifacts
- Works with task-prd-creator and ai-dev-loop skills
- Persists across sessions

## Agent Readiness Checklist (IMPACT)

Before running autonomous/agentic execution, verify:

| Dimension | Question | If No... |
|-----------|----------|----------|
| **Intent** | Do you have acceptance criteria and a test harness? | Don't run agent |
| **Memory** | Do you have durable artifacts (spec/todo) so it can resume? | It will thrash |
| **Planning** | Can it produce/update a plan with checkpoints? | It will improvise badly |
| **Authority** | Is what it can do restricted (edit, test, commit)? | Too risky |
| **Control Flow** | Does it decide next step based on tool output? | It's just generating blobs |
| **Tools** | Does it have minimum necessary tooling and nothing extra? | Attack surface too large |

Approve at meaningful checkpoints (end of todo item, after test suite passes), not every micro-step.

## Prompt Patterns

**Authoritarian (for correctness):**

```
Edit these files: [paths]
Interface: [exact signatures]
Acceptance criteria: [list]
Required tests: [list]
Don't change anything else.
```

**Options and tradeoffs (for design):**

```
Give me 3 options and a recommendation.
Make the recommendation conditional on constraints A/B/C.
```

**Context discipline (for large codebases):**

```
Only use the files I provided.
If you need more context, ask for a specific file and explain why.
```

**Make it provable:**

```
Add a test that fails on the buggy version and passes on the correct one.
```
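What that can look like in practice, with a hypothetical `parseDuration` function and Jest/Vitest-style assertions:

```typescript
// Hypothetical module under test: parseDuration("1h") returns seconds.
import { parseDuration } from "./duration";

// Fails on a buggy version (say, one that treats "1h" as 60)
// and passes on the correct implementation.
test('parses "1h" as 3600 seconds', () => {
  expect(parseDuration("1h")).toBe(3600);
});
```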

## Output Format

When this skill activates, produce:

```
SPEC-FIRST WORKFLOW

STAGE A - FRAMING:
[3 approaches with tradeoffs]
[Recommendation]

STAGE B - SPEC:
[Draft spec.md content]

STAGE C - TODO:
[Draft todo.md with verification commands]

Ready to proceed to Stage D (execution)?
```

Overview

This skill enforces a spec → plan → execute → verify loop for any non-trivial development task. It creates three durable artifacts (spec.md, todo.md, decisions.md) in a .agents folder and prevents premature implementation that only "looks right." Use it to turn vague requests into testable, reversible work.

How this skill works

Before writing code, the skill runs a six-stage workflow: frame the problem, freeze decisions in a spec, generate a stepwise todo with verification commands, execute small changes with frequent checks, run adversarial verification, and decide what code and artifacts should persist. It produces SPEC, TODO, and DECISIONS files and requires each todo item to include an explicit verification command to avoid "looks right" failures.

When to use it

  • Building a new feature or project that touches multiple files or components
  • Implementing non-trivial functionality or multi-step workflows
  • Requests that imply 3+ files, unclear requirements, or important tradeoffs
  • Preparing for agentic/autonomous runs that must be resumable and auditable
  • Refactors where behavior must be preserved and verified

Best practices

  • Delay implementation until tradeoffs and risky assumptions are explicit
  • Write a durable spec that lists purpose, non-goals, interfaces, and acceptance criteria
  • Create todos where each item has an independent verification command
  • Favor small, single-purpose commits and run verification after every step
  • Invest in tests and fixtures; treat generated code as disposable
  • Use adversarial review prompts to surface concrete failing scenarios

Example use cases

  • Add user authentication with acceptance criteria and test plan before any code
  • Refactor an API surface while preserving behavior via golden-file tests
  • Implement a background worker that must handle retries and idempotency
  • Prepare an autonomous agent to run a multi-step migration with checkpoints
  • Introduce a feature requiring decisions about libraries, persistence, and interfaces

FAQ

Can I skip the spec-first process for small tasks?

Yes. Skip it for trivial single-file edits under ~50 lines, typo fixes, or when the user explicitly requests a quick change.

Where should the spec/todo/decisions files live?

Store them under a .agents folder (e.g., .agents/SPECS/feature.md) so the project root stays clean and artifacts persist across sessions.