This skill enforces empirical validation by generating and presenting concrete evidence for changes, ensuring tasks are complete only after verifiable results.
`npx playbooks add skill toonight/get-shit-done-for-antigravity --skill empirical-validation`
---
name: Empirical Validation
description: Requires proof before marking work complete — no "trust me, it works"
---
# Empirical Validation
## Core Principle
> **"The code looks correct" is NOT validation.**
>
> Every change must be verified with empirical evidence before being marked complete.
## Validation Methods by Change Type
| Change Type | Required Validation | Tool |
|-------------|---------------------|------|
| **UI Changes** | Screenshot showing expected visual state | `browser_subagent` |
| **API Endpoints** | Command showing correct response | `run_command` |
| **Build/Config** | Successful build or test output | `run_command` |
| **Data Changes** | Query showing expected data state | `run_command` |
| **File Operations** | File listing or content verification | `run_command` |
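
As an illustration of the Data Changes row, the sketch below applies a change and then prints the query result that would serve as evidence. It uses Python's built-in `sqlite3` module with an in-memory database. The `users` table, its columns, and the `query_evidence` helper are hypothetical stand-ins for a real project's schema and tooling.

```python
import sqlite3


def query_evidence(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return the query and its rows as text suitable for pasting as evidence."""
    rows = conn.execute(sql, params).fetchall()
    lines = [f"Query: {sql} {params}"] + [repr(row) for row in rows]
    return "\n".join(lines)


# Hypothetical schema and data, created in memory so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, active INTEGER)")
conn.execute("INSERT INTO users (email, active) VALUES (?, ?)", ("user@example.com", 0))

# The change under test: activate the user.
conn.execute("UPDATE users SET active = 1 WHERE email = ?", ("user@example.com",))
conn.commit()

# The evidence: the rows as they actually are after the change.
evidence = query_evidence(
    conn, "SELECT id, email, active FROM users WHERE email = ?", ("user@example.com",)
)
print(evidence)
```

The point is that the recorded evidence is the row itself, so a reader can check the `active` column without trusting a summary.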
## Validation Protocol
### Before Marking Any Task "Done"
1. **Identify Verification Criteria**
- What should be true after this change?
- How can that be observed?
2. **Execute Verification**
- Run the appropriate command or action
- Capture the output/evidence
3. **Document Evidence**
- Add to `.gsd/JOURNAL.md` under the task
- Include actual output, not just "passed"
4. **Confirm Against Criteria**
- Does evidence match expected outcome?
- If not, task is NOT complete
## Examples
### API Endpoint Verification
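```powershell
# Good: Actual request showing the response
# Invoke-RestMethod is used because `curl` is an alias for Invoke-WebRequest in
# Windows PowerShell and rejects -X/-d; the email is a placeholder address.
Invoke-RestMethod -Method Post -Uri http://localhost:3000/api/login `
  -ContentType 'application/json' -Body '{"email":"user@example.com"}' |
  ConvertTo-Json -Compress
# Output: {"success":true,"token":"..."}
# Bad: Just saying "endpoint works"
```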
### UI Verification
```
# Good: Take screenshot with browser tool
- Navigate to /dashboard
- Capture screenshot
- Confirm: Header visible? Data loaded? Layout correct?
# Bad: "The component should render correctly"
```
### Build Verification
```powershell
# Good: Show build output
npm run build
# Output: Successfully compiled...
# Bad: "Build should work now"
```
## Forbidden Phrases
Never use these as justification for completion:
- "This should work"
- "The code looks correct"
- "I've made similar changes before"
- "Based on my understanding"
- "It follows the pattern"
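
A completion note could be screened for these phrases before a task is closed. The following is a minimal sketch: the `find_forbidden_phrases` function is a hypothetical helper, and plain case-insensitive substring matching is an assumption about how strict the check should be.

```python
FORBIDDEN_PHRASES = (
    "this should work",
    "the code looks correct",
    "i've made similar changes before",
    "based on my understanding",
    "it follows the pattern",
)


def find_forbidden_phrases(note: str) -> list[str]:
    """Return the forbidden phrases that appear in a completion note."""
    # Normalize case and curly apostrophes so "I’ve" and "I've" both match.
    normalized = note.lower().replace("\u2019", "'")
    return [phrase for phrase in FORBIDDEN_PHRASES if phrase in normalized]


print(find_forbidden_phrases("Refactored the parser. This should work."))
# -> ['this should work']
```

An empty result does not show that the work is validated. It only shows that the note avoids these phrasings, so recorded evidence is still required.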
## Integration
This skill integrates with:
- `/verify` — Primary workflow using this skill
- `/execute` — Must validate before marking tasks complete
- Rule 4 in `GEMINI.md` — Empirical Validation enforcement
## Failure Handling
If verification fails:
1. **Do NOT mark task complete**
2. **Document** the failure in `.gsd/STATE.md`
3. **Create** fix task if cause is known
4. **Trigger** the Context Health Monitor after 3 or more failures
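
Steps 2 and 4 could be sketched as below. The `record_failure` helper, the entry format, and the `VERIFICATION FAILED` marker are hypothetical. The sketch also assumes that the failure count means every failure recorded in the state file, which the skill does not specify. How the Context Health Monitor is actually invoked is outside this sketch, so the function only reports whether the threshold has been reached.

```python
from pathlib import Path

FAILURE_MARKER = "VERIFICATION FAILED"  # hypothetical marker, not defined by the skill
ESCALATION_THRESHOLD = 3                # "3+ failures" from step 4


def record_failure(task: str, evidence: str,
                   state_file: Path = Path(".gsd/STATE.md")) -> bool:
    """Append a failure entry to the state file.

    Returns True when the file now holds ESCALATION_THRESHOLD or more
    failure entries, i.e. when the Context Health Monitor should be triggered.
    """
    state_file.parent.mkdir(parents=True, exist_ok=True)
    with state_file.open("a", encoding="utf-8") as fh:
        fh.write(f"\n- {FAILURE_MARKER}: {task}\n  Evidence: {evidence}\n")

    # Count only entry lines, so the marker appearing inside evidence text
    # on an indented line is not counted as a separate failure.
    text = state_file.read_text(encoding="utf-8")
    failures = sum(
        1 for line in text.splitlines() if line.startswith(f"- {FAILURE_MARKER}:")
    )
    return failures >= ESCALATION_THRESHOLD
```

A caller would leave the task open in every case and escalate only when the function returns `True`.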
This skill enforces empirical validation: tasks are only marked complete when verifiable evidence is recorded. It removes subjective sign-offs like "code looks correct" and requires concrete outputs such as screenshots, command responses, or build logs. The goal is repeatable, auditable verification attached to each change.
For each change type it specifies the required proof method (UI screenshot, API command output, build output, data query, or file listing). You execute the appropriate action, capture the output, and document the actual evidence in the project journal. If the recorded evidence fails to meet the pre-defined acceptance criteria, the task stays open and failure-handling steps are triggered.
**What counts as sufficient evidence?**
Sufficient evidence is the actual observable output that directly demonstrates the acceptance criteria: full command output, an unedited screenshot showing the expected UI state, or file/data listings that show the exact change.

**What if the verification fails?**
Do not mark the task complete. Document the failure in `.gsd/STATE.md`, create a fix task if you know the cause, and trigger the Context Health Monitor after repeated failures.