This skill evaluates absolute certainty claims by running a gauntlet of tests and refining boundaries to reveal realistic limits.
To add this skill to your agents, run `npx playbooks add skill tkersey/dotfiles --skill prove-it`.
---
name: prove-it
description: Gauntlet for absolute claims (always/never/guaranteed/optimal); pressure-test, then refine with explicit boundaries. Use when users ask to prove or disprove strong certainty claims, request devil's-advocate challenge rounds, or want the $prove-it gauntlet to run in default autoloop/full-auto style.
---
# Prove It
## When to use
- The user asserts certainty: “always”, “never”, “guaranteed”, “optimal”, “cannot fail”, “no downside”, “100%”.
- The user asks for a devil’s advocate or proof.
- The claim feels too clean for the domain.
## Round cadence (mandatory)
- Definition: one "turn" means one assistant reply.
- Default: autoloop (no approvals). Run exactly one gauntlet round per assistant turn, publish results, then continue on the next turn until Oracle synthesis.
- In default mode, after each round, publish:
  - Round Ledger
  - Knowledge Delta
- If confidence remains low after Oracle synthesis, continue with additional rounds (11+) and publish an updated Oracle synthesis.
- Do not ask for permission to continue. In default mode, do not wait for "next" between rounds. Pause only when you must ask the user a question or the user says "stop".
- Step mode (explicit): if the user asks to "pause" / "step" / "one round at a time", run one round then wait for "next".
- Full auto mode (explicit): if the user asks for "full auto" / "fast mode", run rounds 1-10 + Oracle synthesis in one assistant turn while still reporting each round in order.
## Mode invocation
| Mode | Default? | How to invoke | Cadence |
|------|----------|---------------|---------|
| Autoloop | yes | (no phrase) | 1 round/turn; auto-continue until Oracle |
| Step mode | no | "step mode" / "pause each round" / "pause" / "step" / "one round at a time" | 1 round/turn; wait for "next" |
| Full auto | no | "full auto" / "fast mode" | rounds 1-10 + Oracle in one turn; publish Round Ledger + Knowledge Delta after each round |
## Quick start
1. Restate the claim and its scope.
2. Default to autoloop. If the user explicitly requests "step mode" or "full auto", use that instead.
3. Run round 1 and publish the Round Ledger + Knowledge Delta.
4. Continue automatically with one round per turn until round 10 (Oracle synthesis).
5. If confidence remains low, run additional rounds (11+) and publish an updated Oracle synthesis.
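For orientation, here is a sketch of how the first two autoloop turns might read; the claim and round picks are purely illustrative:
```
Turn 1
  Claim (restated): "The new cache layer always reduces p99 latency", scoped to the read API.
  Round 1: Counterexamples. Publish Round Ledger + Knowledge Delta.
Turn 2
  Round 2: Logic traps. Publish Round Ledger + Knowledge Delta.
  (one round per turn continues until round 10, Oracle synthesis)
```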
## Ten-round gauntlet
1. Counterexamples: smallest concrete break.
2. Logic traps: missing quantifiers/premises.
3. Boundary cases: zero/one/max/empty/extreme scale.
4. Adversarial inputs: worst-case distributions/abuse.
5. Alternative paradigms: different model flips the conclusion.
6. Operational constraints: latency/cost/compliance/availability.
7. Probabilistic uncertainty: variance, tail risk, sampling bias.
8. Comparative baselines: better than what, on which metric?
9. Meta-test: fastest disproof experiment.
10. Oracle synthesis: tightest surviving claim with boundaries. If confidence is still low, repeat rounds 1-9 as needed, then re-run Oracle synthesis.
## Round self-prompt bank (pick exactly 1)
Internal self-prompts for selecting round focus. Do not ask the user unless blocked.
- Counterexamples: What is the smallest input that breaks this?
- Logic traps: What unstated assumption must hold?
- Boundary cases: Which boundary is most likely in real use?
- Adversarial: What does worst-case input look like?
- Alternative paradigm: What objective makes the opposite true?
- Operational: Which dependency/policy is a hard stop?
- Uncertainty: What distribution shift flips the result?
- Baseline: Better than what, on which metric?
- Meta-test: What experiment would change your mind fastest?
- Oracle: What explicit boundaries keep this honest?
## Core artifacts
### Argument map
```
Claim:
Premises:
- P1:
- P2:
Hidden assumptions:
- A1:
Weak links:
- W1:
Disproof tests:
- T1:
Refined claim:
```
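A hypothetical filled-in map, using an invented cache-latency claim purely for illustration:
```
Claim: The new cache layer always reduces p99 latency.
Premises:
- P1: Cache hits are faster than origin reads.
- P2: The hit rate stays above 90% in production.
Hidden assumptions:
- A1: Invalidation traffic is negligible.
Weak links:
- W1: P2 is unmeasured for the current traffic mix.
Disproof tests:
- T1: Replay a cold-start hour and compare p99 against the no-cache baseline.
Refined claim: The cache layer reduces p99 latency while the cache is warm and the hit rate stays above 90%.
```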
### Round Ledger (update every round)
```
Round: <1-10 (or 11+)>
Focus:
Claim scope:
New evidence:
New counterexample:
Remaining gaps:
Next round:
```
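Example ledger for round 1 of the hypothetical cache claim above (entries are illustrative):
```
Round: 1
Focus: Counterexamples
Claim scope: p99 latency for read-heavy API traffic
New evidence: none yet
New counterexample: cold-start window where the hit rate drops below 40%
Remaining gaps: hit-rate distribution under the real traffic mix
Next round: Logic traps (which premise does the cold-start case break?)
```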
### Knowledge Delta (publish every round)
```
- New:
- Updated:
- Invalidated:
```
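Continuing the same hypothetical round, the delta stays short:
```
- New: cold-start windows can erase the latency benefit
- Updated: claim scope narrowed to warmed-cache steady state
- Invalidated: the "always" quantifier for p99 latency
```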
### Claim boundary table
```
| Boundary type | Valid when | Invalid when | Assumptions | Stressors |
|---------------|-----------|--------------|-------------|-----------|
| Scale | | | | |
| Data quality | | | | |
| Environment | | | | |
| Adversary | | | | |
```
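One illustrative filled row for the same hypothetical claim (values are invented):
```
| Boundary type | Valid when | Invalid when | Assumptions | Stressors |
|---------------|-----------|--------------|-------------|-----------|
| Scale | steady traffic, warm cache | cold start, eviction storms | hit rate > 90% | traffic spikes, mass invalidation |
```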
### Next-tests plan
```
| Test | Data needed | Success threshold | Stop condition |
|------|-------------|-------------------|----------------|
```
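Example row with invented thresholds:
```
| Test | Data needed | Success threshold | Stop condition |
|------|-------------|-------------------|----------------|
| Cold-start replay vs no-cache baseline | 1 hour of production traces | p99 improves by >= 10% | p99 regresses in any 5-minute window |
```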
## Domain packs
### Performance
Use when the claim is about speed, latency, throughput, or resources.
- Clarify: median vs tail latency vs throughput.
- Identify workload shape (spiky vs steady) and bottleneck resource.
### Product
Use when the claim is about user impact, adoption, or behavior.
- Clarify user segment and success metric.
- State the baseline/counterfactual.
- Name the likely unintended behavior/tradeoff.
## Oracle synthesis template (round 10 / as needed)
```
Original claim:
Refined claim:
Boundaries:
- Valid when:
- Invalid when:
Confidence trail:
- Evidence:
- Gaps:
Next tests:
- ...
```
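A condensed hypothetical synthesis for the cache claim used in the earlier examples:
```
Original claim: The new cache layer always reduces p99 latency.
Refined claim: The cache layer reduces p99 latency for read-heavy traffic once the cache is warm.
Boundaries:
- Valid when: hit rate > 90%, steady traffic, no mass invalidation
- Invalid when: cold start, eviction storms, write-heavy workloads
Confidence trail:
- Evidence: warmed-cache replay showed a consistent p99 improvement
- Gaps: no data for write-heavy or spiky traffic
Next tests:
- Cold-start replay against the no-cache baseline
```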
## Deliverable format (per turn)
- Round number + focus.
- Round Ledger + Knowledge Delta.
- At most one question for the user (only when blocked).
- In default autoloop, run one round in that turn and continue to the next round in the next turn.
- In step mode, run one round and wait for "next".
- In full auto (or "fast mode"), run rounds 1-10 + Oracle synthesis in one turn (repeat the above per round).
## Activation cues
- "always" / "never" / "guaranteed" / "optimal" / "cannot fail" / "no downside" / "100%"
- "prove it" / "devil's advocate" / "stress test" / "rigor"
## Overview
This skill is a gauntlet that pressure-tests absolute claims (always, never, guaranteed, optimal) and refines them into honest, bounded statements. It runs iterative rounds that surface counterexamples, hidden assumptions, and operational limits, then synthesizes a tightened claim with explicit boundaries. Use it to avoid overconfidence and produce testable, defensible conclusions.

On each round the assistant picks a focused self-prompt (e.g., counterexample, boundary case, adversarial input) and updates an Argument map, a Round Ledger, and a Knowledge Delta. Rounds proceed in a defined cadence: autoloop (default) runs one round per assistant turn until Oracle synthesis; step mode pauses between rounds; full auto runs rounds 1-10 plus Oracle in one turn. The final Oracle synthesis produces a refined claim, valid/invalid conditions, confidence trail, and next tests.
## FAQ
**How many rounds will you run?**
Default autoloop runs one round per assistant turn through the ten-round gauntlet and then Oracle synthesis; if confidence stays low, additional rounds (11+) run until confidence is sufficient.

**Can I pause between rounds?**
Yes: invoke step mode by asking for "step mode", "pause", or "one round at a time", and the skill will wait for your "next" before proceeding.