empirical-validation skill

This skill enforces empirical validation: every change must be backed by concrete, recorded evidence, and a task is marked complete only after its results have been verified.

npx playbooks add skill toonight/get-shit-done-for-antigravity --skill empirical-validation

SKILL.md

---
name: Empirical Validation
description: Requires proof before marking work complete — no "trust me, it works"
---

# Empirical Validation

## Core Principle

> **"The code looks correct" is NOT validation.**
> 
> Every change must be verified with empirical evidence before being marked complete.

## Validation Methods by Change Type

| Change Type | Required Validation | Tool |
|-------------|---------------------|------|
| **UI Changes** | Screenshot showing expected visual state | `browser_subagent` |
| **API Endpoints** | Command showing correct response | `run_command` |
| **Build/Config** | Successful build or test output | `run_command` |
| **Data Changes** | Query showing expected data state | `run_command` |
| **File Operations** | File listing or content verification | `run_command` |
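
The table above can also be read as a lookup from change type to required evidence. The following is a minimal sketch of that idea; the short keys (`ui`, `api`, and so on) and the function name `required_validation` are hypothetical and not part of the skill itself.

```python
# Hypothetical lookup built from the table above.
# Keys are illustrative short names; values are (required validation, tool).
REQUIRED_VALIDATION = {
    "ui": ("Screenshot showing expected visual state", "browser_subagent"),
    "api": ("Command showing correct response", "run_command"),
    "build": ("Successful build or test output", "run_command"),
    "data": ("Query showing expected data state", "run_command"),
    "file": ("File listing or content verification", "run_command"),
}


def required_validation(change_type):
    """Return (required validation, tool) for a change type, case-insensitively."""
    try:
        return REQUIRED_VALIDATION[change_type.lower()]
    except KeyError:
        # An unknown change type should fail loudly instead of skipping validation.
        raise ValueError(f"no validation rule for change type {change_type!r}") from None
```

Raising on an unknown change type is a deliberate choice in this sketch: a change that fits no row should prompt a decision about what evidence applies instead of silently passing.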

## Validation Protocol

### Before Marking Any Task "Done"

1. **Identify Verification Criteria**
   - What should be true after this change?
   - How can that be observed?

2. **Execute Verification**
   - Run the appropriate command or action
   - Capture the output/evidence

3. **Document Evidence**
   - Add to `.gsd/JOURNAL.md` under the task
   - Include actual output, not just "passed"

4. **Confirm Against Criteria**
   - Does evidence match expected outcome?
   - If not, task is NOT complete
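
The four steps above could be automated roughly as follows. This is a sketch under stated assumptions, not the skill's actual implementation: the function name `verify_and_record`, the substring check used as the criterion, and the journal entry layout are all hypothetical. Only the `.gsd/JOURNAL.md` path comes from the protocol.

```python
import subprocess
import textwrap
from datetime import datetime, timezone
from pathlib import Path


def verify_and_record(task, command, expected, journal=Path(".gsd/JOURNAL.md")):
    """Run a command, append its raw output to the journal, and return
    True only if it exited with 0 and the expected text appears in the output."""
    # Step 2: execute the verification and capture the evidence.
    result = subprocess.run(command, capture_output=True, text=True)
    output = result.stdout + result.stderr

    # Step 4: compare the evidence against the criterion chosen in step 1.
    passed = result.returncode == 0 and expected in output

    # Step 3: document the actual output, not just "passed".
    # The entry layout below is an assumption; adapt it to your journal.
    journal.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    entry = (
        f"\n### {task} ({stamp})\n\n"
        f"Command: {' '.join(command)}\n"
        f"Exit code: {result.returncode}\n"
        f"Result: {'PASS' if passed else 'FAIL'}\n\n"
        f"{textwrap.indent(output.rstrip(), '    ')}\n"
    )
    with journal.open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return passed
```

A call such as `verify_and_record("Fix build", ["npm", "run", "build"], "Successfully compiled")` would record the full build log under the task. On Windows the executable may need to be given as `npm.cmd`.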

## Examples

### API Endpoint Verification
```powershell
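# Good: Actual test showing response
# (-d alone sends form-encoded data, so declare the JSON body explicitly)
curl -X POST http://localhost:3000/api/login -H "Content-Type: application/json" -d '{"email":"[email protected]"}'
# Output: {"success":true,"token":"..."}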

# Bad: Just saying "endpoint works"
```

### UI Verification
```
# Good: Take screenshot with browser tool
- Navigate to /dashboard
- Capture screenshot
- Confirm: Header visible? Data loaded? Layout correct?

# Bad: "The component should render correctly"
```

### Build Verification
```powershell
# Good: Show build output
npm run build
# Output: Successfully compiled...

# Bad: "Build should work now"
```

## Forbidden Phrases

Never use these as justification for completion:
- "This should work"
- "The code looks correct"
- "I've made similar changes before"
- "Based on my understanding"
- "It follows the pattern"
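
A completion note can be screened for these phrases mechanically. The sketch below is illustrative only: the function name `find_forbidden_phrases` and the case-insensitive substring match are assumptions, and a clean result does not by itself prove that evidence was attached.

```python
# Phrases copied from the list above, lowercased for matching.
FORBIDDEN_PHRASES = (
    "this should work",
    "the code looks correct",
    "i've made similar changes before",
    "based on my understanding",
    "it follows the pattern",
)


def find_forbidden_phrases(note):
    """Return the forbidden phrases found in a completion note, in list order."""
    # Normalize case and curly apostrophes so "I’ve" matches "i've".
    lowered = note.lower().replace("\u2019", "'")
    return [phrase for phrase in FORBIDDEN_PHRASES if phrase in lowered]
```

An empty result means only that none of the listed phrases appear. The note still needs the captured output described in the protocol.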

## Integration

This skill integrates with:
- `/verify` — Primary workflow using this skill
- `/execute` — Must validate before marking tasks complete
- Rule 4 in `GEMINI.md` — Empirical Validation enforcement

## Failure Handling

If verification fails:

1. **Do NOT mark the task complete**
2. **Document** the failure in `.gsd/STATE.md`
3. **Create** a fix task if the cause is known
4. **Trigger** the Context Health Monitor after 3 or more failures
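
One way to sketch this failure path in code is shown below. Everything except the `.gsd/STATE.md` path and the threshold of three is an assumption: the function name `handle_failure`, the line format written to the state file, and in particular the choice to count failures per task, which the steps above do not specify.

```python
from pathlib import Path

FAILURE_THRESHOLD = 3  # "3+ failures" from the steps above


def handle_failure(task, reason, state=Path(".gsd/STATE.md"), known_cause=False):
    """Record a failed verification and return the follow-up actions to take."""
    # Step 2: document the failure (hypothetical one-line format).
    state.parent.mkdir(parents=True, exist_ok=True)
    marker = f"- FAILED verification: {task}:"
    with state.open("a", encoding="utf-8") as fh:
        fh.write(f"{marker} {reason}\n")

    # Count failures recorded for this task so far, including this one.
    # Counting per task is an assumption of this sketch.
    lines = state.read_text(encoding="utf-8").splitlines()
    failures = sum(1 for line in lines if line.startswith(marker))

    # Step 1: the task always stays open after a failed verification.
    actions = ["keep task open"]
    if known_cause:
        actions.append("create fix task")  # Step 3
    if failures >= FAILURE_THRESHOLD:
        actions.append("trigger Context Health Monitor")  # Step 4
    return actions
```

The function returns a list of actions instead of performing them, since how a fix task is created or how the Context Health Monitor is invoked is not described here.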

Overview

This skill enforces empirical validation: tasks are marked complete only when verifiable evidence is recorded. It replaces subjective sign-offs like "code looks correct" with concrete outputs such as screenshots, command responses, or build logs. The goal is repeatable, auditable verification attached to each change.

How this skill works

For each change type, the skill specifies the required proof: a UI screenshot, API command output, build output, a data query, or a file listing. You run the appropriate action, capture the output, and record the actual evidence in .gsd/JOURNAL.md. If the evidence does not meet the criteria defined beforehand, the task stays open and the failure-handling steps apply.

When to use it

- Before marking any task or pull request as done
- After UI changes that affect layout or visuals
- When adding or modifying API endpoints
- For build, configuration, or deployment changes
- When altering data or file contents

Best practices

- Define clear verification criteria before making the change
- Capture raw evidence: full command output or unedited screenshots
- Record evidence in .gsd/JOURNAL.md under the task it belongs to
- Avoid subjective phrases; include concrete success/failure signals
- If verification fails, document it in .gsd/STATE.md and create a fix task if the cause is known

Example use cases

- Take a screenshot of a dashboard after a UI tweak and attach it to the task
- Run a curl or PowerShell call against a new API route and paste the response output
- Run the build command and include the build log showing successful compilation
- Query the database or data store to prove a migration produced the expected rows
- List or show file contents after a script changed files to confirm operations

FAQ

What counts as sufficient evidence?

Sufficient evidence is the actual observable output that directly demonstrates the acceptance criteria: full command output, an unedited screenshot showing the expected UI state, or file/data listings that show the exact change.

What if the verification fails?

Do not mark the task complete. Document the failure in .gsd/STATE.md, create a fix task if you know the cause, and trigger the Context Health Monitor after three or more failures.