showdown skill

This skill helps you orchestrate parallel plan-based code generation showdowns using hosted LLMs to compare implementations and select a winner.

npx playbooks add skill 2389-research/claude-plugins --skill showdown

Review the files below or copy the command above to add this skill to your agents.

Files (1): SKILL.md (9.4 KB)
---
name: showdown
description: Same design, multiple parallel runners compete using a hosted LLM for code generation. Each runner creates its own plan and generates code via Cerebras; the best implementation wins. Part of the speed-run pipeline.
---

# Showdown

Same design, multiple runners compete. Each runner creates their own implementation plan from the shared design, then generates code via hosted LLM. Natural variation emerges from independent planning decisions.

**Announce:** "I'm using speed-run:showdown for parallel competition via hosted LLM."

**Key insight:** Don't share a pre-made implementation plan. Each runner generates their own plan from the design doc, ensuring genuine variation.

## Directory Structure

```
docs/plans/<feature>/
  design.md                    # Input: from brainstorming
  speed-run/
    showdown/
      runner-1/
        plan.md                # Runner 1's implementation plan
      runner-2/
        plan.md                # Runner 2's implementation plan
      runner-3/
        plan.md                # Runner 3's implementation plan
      result.md                # Showdown results and winner
```
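If the layout needs to exist before runners start, a minimal shell sketch (the feature name `auth-system` is purely illustrative):

```bash
# Hypothetical feature name; substitute the real one.
FEATURE=auth-system

# Pre-create the showdown tree for three runners.
mkdir -p docs/plans/$FEATURE/speed-run/showdown/runner-{1,2,3}
```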

## Skill Dependencies

| Reference | Primary (if installed) | Fallback |
|-----------|------------------------|----------|
| `writing-plans` | `superpowers:writing-plans` | Each runner writes their own plan |
| `executing-plans` | `superpowers:executing-plans` | Execute plan tasks sequentially |
| `parallel-agents` | `superpowers:dispatching-parallel-agents` | Dispatch multiple Task tools in single message |
| `git-worktrees` | `superpowers:using-git-worktrees` | `git worktree add .worktrees/<name> -b <branch>` |
| `tdd` | `superpowers:test-driven-development` | RED-GREEN-REFACTOR cycle |
| `verification` | `superpowers:verification-before-completion` | Run command, read output, THEN claim status |
| `fresh-eyes` | `fresh-eyes-review:skills` (2389) | 2-5 min review for security, logic, edge cases |
| `judge` | `speed-run:judge` | Scoring framework with checklists (MUST invoke in Phase 3, Step 4) |
| `scenario-testing` | `scenario-testing:skills` (2389) | `.scratch/` E2E scripts, real dependencies |
| `finish-branch` | `superpowers:finishing-a-development-branch` | Verify tests, present options, cleanup |

## Phase 1: Complexity Assessment

**Read design doc and assess:**
- Feature scope (components, integrations, data models)
- Risk areas (auth, payments, migrations, concurrency)
- Estimated implementation size

**Map to runner count:**

| Complexity | Scope | Risk signals | Runners |
|------------|-------|--------------| --------|
| Low | Small feature | None | 2 |
| Medium | Medium feature | Some | 3 |
| High | Large feature | Several | 4 |
| Very high | Major system | Critical areas | 5 |

**Announce:**
```
Complexity assessment: medium feature, touches auth
Spawning 3 parallel runners
Each will create their own implementation plan from the design.
All runners will use hosted LLM (Cerebras) for code generation.
```

## Phase 2: Parallel Execution

**Setup worktrees:**
```
.worktrees/speed-run-runner-1/
.worktrees/speed-run-runner-2/
.worktrees/speed-run-runner-3/

Branches:
<feature>/speed-run/runner-1
<feature>/speed-run/runner-2
<feature>/speed-run/runner-3
```
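One way to set these up with plain git, following the `git-worktrees` fallback above (the feature name is illustrative):

```bash
# Illustrative feature name; use the real one.
FEATURE=auth-system

# One worktree and branch per runner.
for i in 1 2 3; do
  git worktree add ".worktrees/speed-run-runner-$i" -b "$FEATURE/speed-run/runner-$i"
done
```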

**CRITICAL: Dispatch ALL runners in a SINGLE message**

Use `parallel-agents` pattern. Send ONE message with multiple Task tool calls:

```
<single message>
  Task(runner-1, run_in_background: true)
  Task(runner-2, run_in_background: true)
  Task(runner-3, run_in_background: true)
</single message>
```

**Runner prompt (each runner gets the same instructions with its own runner number):**

```
You are runner N of M in a speed-run showdown.
Other runners are implementing the same design in parallel.
Each runner creates their own implementation plan - your approach may differ from others.

**Your working directory:** /path/to/.worktrees/speed-run-runner-N
**Design doc:** docs/plans/<feature>/design.md
**Your plan location:** docs/plans/<feature>/speed-run/showdown/runner-N/plan.md

**Your workflow:**
1. Read the design doc thoroughly
2. Use writing-plans skill to create YOUR implementation plan
   - Save to: docs/plans/<feature>/speed-run/showdown/runner-N/plan.md
   - Make your own architectural decisions
   - Don't try to guess what other runners will do
3. For each implementation task, use hosted LLM for first-pass code generation:
   - Write a contract prompt (DATA CONTRACT + API CONTRACT + ALGORITHM + RULES)
   - Call: mcp__speed-run__generate_and_write_files
   - Run tests
   - Fix failures with Claude Edit tool (surgical 1-4 line fixes)
   - Re-test until passing
4. Follow TDD for each task
5. Use verification before claiming done

**Code generation rules:**
- Use mcp__speed-run__generate_and_write_files for algorithmic code
- Use mcp__speed-run__generate for text/docs generation
- Use Claude direct ONLY for surgical fixes and multi-file coordination
- Write contract prompts with exact data models, routes, and algorithm steps

**Report when done:**
- Plan created: yes/no
- All tasks completed: yes/no
- Test results (output)
- Files changed count
- Hosted LLM calls made
- Fix cycles needed
- Any issues encountered
```
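As an illustration of step 3, a contract prompt might look like the sketch below; the entity, route, and rules are hypothetical, and only the DATA CONTRACT + API CONTRACT + ALGORITHM + RULES shape is prescribed:

```text
DATA CONTRACT:
  User { id: uuid, email: string, password_hash: string, created_at: timestamp }

API CONTRACT:
  POST /auth/login -> 200 { token } | 401 { error: "invalid_credentials" }

ALGORITHM:
  1. Look up user by email (case-insensitive)
  2. Verify password against password_hash
  3. On success issue a signed token; on failure return 401

RULES:
  - Never log plaintext passwords
  - All responses are JSON
  - Match the project's existing error envelope
```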

**Monitor progress:**
```
Showdown status (design: auth-system):
- runner-1: planning... -> generating via Cerebras -> fixing 2/3 -> tests passing
- runner-2: planning... -> generating via Cerebras -> tests passing
- runner-3: planning... -> generating via Cerebras -> fixing 1/2 -> tests passing
```

## Phase 3: Judging

**Step 1: Gate check**
- All tests pass
- Design adherence - implemented what the design specified

**Step 2: Check for identical implementations**

Before fresh-eyes, diff the implementations:
```bash
diff -r .worktrees/speed-run-runner-1/src .worktrees/speed-run-runner-2/src
```

If implementations are >95% identical, note this - the planning step didn't create enough variation. Still proceed but flag in results.
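One rough way to quantify the overlap, assuming sources live under `src/` (the paths and line-count heuristic are illustrative, not mandated by this skill):

```bash
# Lines of diff between two runners' trees (smaller => more similar).
diff -r .worktrees/speed-run-runner-1/src .worktrees/speed-run-runner-2/src | wc -l

# Total lines in one tree, for rough context when judging "percent identical".
find .worktrees/speed-run-runner-1/src -type f -exec cat {} + | wc -l
```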

**Step 3: Fresh-eyes on survivors**
```
Starting fresh-eyes review of runner-1 (N files)...
Checking: security, logic errors, edge cases
Fresh-eyes complete: 1 minor issue
```

### Step 4: Invoke Judge Skill

**CRITICAL: Invoke `speed-run:judge` now.**

The judge skill contains the full scoring framework with checklists. Invoking it fresh ensures the scoring format is followed exactly.

```text
Invoke: speed-run:judge

Context to provide:
- Implementations to judge: runner-1, runner-2, runner-3
- Worktree locations: .worktrees/speed-run-runner-N/
- Test results from each runner
- Fresh-eyes findings from Step 3
- Speed-run metrics: hosted LLM calls, fix cycles, generation time per runner
```

The judge skill will:
1. Fill out the complete scoring worksheet for each runner
2. Fill out the Speed-Run Metrics table
3. Build the scorecard with integer scores (1-5, no half points)
4. Check hard gates (Fitness Δ≥2, any score=1)
5. Announce winner with rationale (including token efficiency)

**Do not summarize or abbreviate the scoring.** The judge skill output should be the full worksheet.

**Showdown-specific context:** In showdown, all runners target the same design, so Fitness should be similar. A Fitness gap (Δ≥2) indicates one runner deviated from or misunderstood the design - not a different approach choice.

## Phase 4: Completion

**Verification on winner:**
```
Running final verification on winner (runner-2):
- Tests: 22/22 passing
- Build: exit 0
- Design adherence: all requirements met

Verification complete. Winner confirmed.
```

**Winner:** Use `finish-branch`

**Losers:** Cleanup
```bash
git worktree remove .worktrees/speed-run-runner-1
git worktree remove .worktrees/speed-run-runner-3
git branch -D <feature>/speed-run/runner-1
git branch -D <feature>/speed-run/runner-3
```
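A quick post-cleanup check that nothing was left behind (the grep pattern is illustrative; adjust to your branch naming):

```bash
# Only the winner's worktree and branch should remain.
git worktree list
git branch | grep 'speed-run' || echo "no speed-run branches left"
```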

**Write result.md:**
```markdown
# Showdown Results: <feature>

## Design
docs/plans/<feature>/design.md

## Runners
| Runner | Plan Approach | Tests | Fresh-Eyes | Lines | LLM Calls | Fix Cycles | Result |
|--------|---------------|-------|------------|-------|-----------|------------|--------|
| runner-1 | Component-first | 24/24 | 1 minor | 680 | 4 | 2 | eliminated |
| runner-2 | Data-layer-first | 22/22 | 0 issues | 720 | 3 | 1 | WINNER |
| runner-3 | TDD-strict | 26/26 | 2 minor | 590 | 5 | 3 | eliminated |

## Plans Generated
- runner-1: docs/plans/<feature>/speed-run/showdown/runner-1/plan.md
- runner-2: docs/plans/<feature>/speed-run/showdown/runner-2/plan.md
- runner-3: docs/plans/<feature>/speed-run/showdown/runner-3/plan.md

## Winner Selection
Reason: Clean fresh-eyes review, solid data-layer-first architecture, fewest fix cycles

## Token Savings
Estimated savings vs Claude direct: ~60% on code generation
```

Save to: `docs/plans/<feature>/speed-run/showdown/result.md`

## Common Mistakes

**Sharing a pre-made implementation plan**
- Problem: All runners copy same code, no variation
- Fix: Each runner uses writing-plans to create THEIR OWN plan from design doc

**Dispatching runners in separate messages**
- Problem: Serial dispatch instead of parallel
- Fix: Send ALL Task tools in a SINGLE message

**Using Claude direct for all code generation**
- Problem: Defeats the purpose of speed-run (token savings)
- Fix: Runners MUST use hosted LLM for first-pass generation

**Skipping fresh-eyes**
- Problem: Judge has no quality signal
- Fix: Fresh-eyes on ALL survivors before comparing

**Not checking for identical implementations**
- Problem: Wasted compute on duplicates
- Fix: Diff implementations before fresh-eyes

**Forgetting cleanup**
- Problem: Orphaned worktrees and branches
- Fix: Always cleanup losers, write result.md

Overview

This skill runs multiple parallel code-generation runners that compete to implement the same design using a hosted LLM (Cerebras). Each runner writes its own implementation plan, generates code, runs tests, and the judge picks the best outcome. It’s built for speed-run pipelines where variation and token efficiency matter.

How this skill works

Given a design document, the skill spawns N isolated runners (worktrees/branches). Each runner reads the design, writes an independent implementation plan, and uses hosted LLM calls for first-pass code generation. After tests and surgical fixes, survivors are diffed, given a fresh-eyes review, and scored by the judge skill to select a winner and finish with cleanup.

When to use it

  • Accelerating feature implementation with parallel experimentation
  • When you want multiple independent approaches to the same design
  • When token efficiency from a hosted LLM is a priority
  • For medium-to-high complexity features where architecture variation matters
  • When you need an objective scoring and selection workflow

Best practices

  • Never share a pre-made implementation plan; require each runner to author its own plan from the design
  • Dispatch all runners in a single message using the parallel-agents pattern to ensure true parallelism
  • Use hosted LLM for first-pass generation; reserve Claude direct for surgical 1–4 line fixes and coordination
  • Run TDD per task, verify outputs, and record hosted LLM calls and fix cycles for metrics
  • Diff implementations before fresh-eyes to detect near-identical results and flag lack of variation

Example use cases

  • Implementing an auth flow with three different architectural approaches to compare tradeoffs
  • Rapid prototyping of a new endpoint where several data-model strategies are plausible
  • Evaluating token-cost vs quality by running the same tasks across different runner decisions
  • Stress-testing team conventions: observe how independent plans converge or diverge on a shared design

FAQ

How many runners should I spawn?

Map runner count to complexity: 2 for low, 3 for medium, 4 for high, 5 for very high complexity features.

What if implementations are almost identical?

Diff the worktrees; if they are more than 95% identical, flag it in the results and note that planning failed to create enough variation, but still proceed with judging and carry the lesson into future runs.