
e2e skill

/.claude/skills/e2e

This skill runs e2e tests, fixes flaky/outdated tests, and aligns test behavior with spec, without changing source code.

npx playbooks add skill 0xbigboss/claude-code --skill e2e

SKILL.md
---
name: e2e
description: Run e2e tests, fix flake and outdated tests, identify bugs against spec. Use when running e2e tests, debugging test failures, or fixing flaky tests. Never changes source code logic or API without spec backing.
---

# E2E Testing

## Principles (Always Active)

These apply whenever working with e2e tests, test failures, or test flakiness:

### Failure Taxonomy

Every e2e failure is exactly one of:

**A. Flaky** (test infrastructure issue)
- Race conditions, timing-dependent assertions
- Stale selectors after UI changes
- Missing waits, incorrect wait targets
- Network timing, mock setup ordering
- Symptom: passes on retry, fails intermittently

**B. Outdated** (test no longer matches implementation)
- Test asserts old behavior that was intentionally changed
- Selectors reference removed/renamed elements
- API contract changed, test wasn't updated
- Symptom: consistent failure, app works correctly

**C. Bug** (implementation doesn't match spec)
- Test correctly asserts spec'd behavior, code is wrong
- **Only classify as bug when a spec exists to validate against**
- If no spec exists, classify as "unverified failure" and report to the user
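
The taxonomy above can be sketched as a decision function. This is a minimal illustration, not part of the skill's API; the input fields are assumptions standing in for the evidence gathered while triaging.

```typescript
// Sketch of the failure taxonomy as a decision function.
// The FailureEvidence fields are illustrative assumptions.
type Category = "flaky" | "outdated" | "bug" | "unverified";

interface FailureEvidence {
  passesOnRetry: boolean;   // intermittent => test infrastructure issue
  specExists: boolean;      // is there a spec to validate against?
  codeMatchesSpec: boolean; // only meaningful when specExists is true
}

function classifyFailure(e: FailureEvidence): Category {
  if (e.passesOnRetry) return "flaky";             // A: flaky
  if (!e.specExists) return "unverified";          // no spec: report to the user
  return e.codeMatchesSpec ? "outdated" : "bug";   // B vs C, decided by the spec
}
```

Note that the spec, not the test, is the tiebreaker between "outdated" and "bug": a consistent failure with spec-correct code means the test is stale.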

### Fix Rules by Category

**Flaky fixes:**
- Replace `waitForTimeout` with auto-waiting locators
- Replace brittle CSS selectors with `getByRole`/`getByLabel`/`getByTestId`
- Fix race conditions with `expect()` web-first assertions
- Fix mock/route setup ordering (before navigation)
- **Never add arbitrary delays** - fix the underlying wait
- **Never weaken assertions** to make flaky tests pass
- **Never add retry loops around assertions** - use the framework's built-in retry
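
Auto-waiting locators and web-first assertions beat fixed timeouts because they poll a condition and resolve as soon as it holds, instead of sleeping a guessed duration. A minimal sketch of that mechanism (not Playwright's implementation) shows why:

```typescript
// Minimal polling wait: retries a predicate until it passes or the
// deadline expires. A fixed sleep either wastes time (condition was
// already true) or fires too early (condition not yet true).
async function waitUntil(
  predicate: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await predicate()) return;
    if (Date.now() > deadline) {
      throw new Error(`condition not met within ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Web-first assertions like `expect(locator).toHaveText(...)` do the equivalent internally, which is why hand-rolled retry loops around assertions only duplicate and obscure this behavior.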

**Outdated fixes:**
- Update test assertions to match current (correct) behavior
- Update selectors to match current DOM/API
- **Never change source code** - the implementation is correct, the test is stale

**Bug fixes:**
- Quote the spec section that defines expected behavior
- Fix the source code to match the spec
- **Unit tests MUST exist** before the fix is complete
  - If unit tests exist, run them to confirm
  - If unit tests don't exist, write them first (TDD)
- **Never change e2e assertions** to match buggy code
- **Never change API contracts or interfaces** without spec backing
- If no spec exists, ask the user: bug or outdated test?
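
The unit-test gate can look like the following sketch. The function, the spec quote, and its values are hypothetical, chosen only to show the shape of a spec-backed fix: quote the spec, write the failing tests first, then make the implementation green.

```typescript
// Hypothetical TDD gate for a spec-backed bug fix.
// validateAmount and the quoted spec text are illustrative, not real code.

// SPEC.md#transfers (hypothetical): "Transfer amounts must be positive
// and must not exceed the available balance."
function validateAmount(amount: number, balance: number): boolean {
  return amount > 0 && amount <= balance;
}

// Unit tests written first (red), then the implementation made green:
const cases: Array<[number, number, boolean]> = [
  [50, 100, true],   // normal transfer
  [0, 100, false],   // zero is not positive
  [-5, 100, false],  // negative rejected
  [150, 100, false], // exceeds available balance
];
for (const [amount, balance, expected] of cases) {
  if (validateAmount(amount, balance) !== expected) {
    throw new Error(`validateAmount(${amount}, ${balance}) !== ${expected}`);
  }
}
```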

### Source Code Boundary

E2e test fixes must not change:
- Application logic or business rules
- API contracts, request/response shapes
- Database schemas or migrations
- Configuration defaults

The only exception: bug fixes where a spec explicitly defines the correct behavior and unit tests cover the fix.

## Workflow (When Explicitly Running E2E)

### Step 1: Discover Test Infrastructure

1. Find e2e config: `playwright.config.ts`, `vitest.config.ts`, or project-specific setup
2. Read `package.json` for the canonical e2e command
3. Check if dev server or Tilt environment is required and running
4. Find spec files: `*.spec.md`, `docs/*.spec.md` - source of truth for bug decisions

### Step 2: Run Tests

Run with a minimal reporter to avoid context overflow:

```bash
# Playwright
yarn playwright test --reporter=line

# Or project-specific
yarn test:e2e
```

If a filter is specified, apply it:
```bash
yarn playwright test --reporter=line -g "transfer"
yarn test:e2e -- --grep "transfer"
```

Parse failures into:

| Test | File | Error | Category |
|---|---|---|---|
| `login flow` | `auth.spec.ts:42` | timeout waiting for selector | TBD |
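
Extracting those rows from the line reporter's failure summary can be sketched with a regex. The exact output format varies by Playwright version; the pattern below matches the common `  N) [project] › file.spec.ts:LINE:COL › title` shape and is an assumption, not a stable contract:

```typescript
// Extract (test title, file:line) pairs from Playwright line-reporter
// failure summaries, e.g. "  1) [chromium] › auth.spec.ts:42:5 › login flow".
// The format is version-dependent; this regex is an illustrative assumption.
interface ParsedFailure {
  test: string;
  file: string;
  category: "TBD";
}

function parseFailures(output: string): ParsedFailure[] {
  const re = /^\s*\d+\)\s+(?:\[[^\]]+\]\s+›\s+)?(\S+\.spec\.ts:\d+):\d+\s+›\s+(.+?)\s*$/gm;
  const failures: ParsedFailure[] = [];
  for (const m of output.matchAll(re)) {
    failures.push({ test: m[2], file: m[1], category: "TBD" });
  }
  return failures;
}
```

Each parsed row starts as `TBD` and gets its category assigned in Step 3.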

### Step 3: Categorize

For each failure:
1. Read the test file
2. Read the source code it exercises
3. Check for a corresponding spec file
4. Assign category: flaky, outdated, bug, or unverified

### Step 4: Fix by Category

Apply fixes following the Principles above, in order:
1. **Flaky** - fix test infrastructure issues first (unblocks other tests)
2. **Outdated** - update stale assertions
3. **Bug** - fix with spec + unit test gate

### Step 5: Re-run and Report

After all fixes, re-run the suite:

```
## E2E Results

**Run**: `yarn test:e2e` on <date>
**Result**: X/Y passed

### Fixed
- FLAKY: `auth.spec.ts:42` - replaced waitForTimeout with getByRole wait
- OUTDATED: `profile.spec.ts:88` - updated selector after header redesign
- BUG: `transfer.spec.ts:120` - fixed amount validation per SPEC.md#transfers

### Remaining Failures
- UNVERIFIED: `settings.spec.ts:55` - no spec, needs user decision

### Unit Tests Added
- `src/transfer.test.ts` - amount validation edge cases (covers BUG fix)
```

Overview

This skill runs end-to-end (e2e) tests, diagnoses failures, and applies targeted fixes for flakes, outdated tests, or implementation bugs. It follows strict rules: never change application logic or an API without spec backing, and always gate bug fixes with unit tests. The goal is reliable, spec-correct e2e coverage that never masks problems with weakened assertions or arbitrary delays.

How this skill works

The skill discovers the project's e2e configuration and canonical test command, verifies that any required dev server or Tilt environment is running, and runs the tests with a minimal reporter. When failures occur, it parses each one, reads the test and the relevant source code, searches for a corresponding spec, and classifies the failure as flaky, outdated, bug, or unverified. Fixes are applied according to the category rules: stabilize waits and selectors for flakes, update assertions for outdated tests, and fix the implementation only when a spec exists and unit tests are in place.

When to use it

  • Running the e2e suite to validate user flows before a release.
  • Debugging intermittent or consistently failing e2e tests on CI or locally.
  • Fixing flaky tests that cause CI instability without changing product behavior.
  • Resolving test failures that may indicate a spec-backed implementation bug.
  • Updating tests after a UI or API change to keep coverage accurate.

Best practices

  • Always classify failures using the Failure Taxonomy: flaky, outdated, bug, or unverified.
  • Fix flakiness by improving waits and selectors (use getByRole/getByLabel/getByTestId), not by adding timeouts or weakening assertions.
  • Never change application logic or API contracts without quoting a spec and adding unit tests first.
  • For bug fixes, write or run unit tests before modifying source; unit tests must pass locally.
  • Run the suite with a minimal reporter and focus fixes in order: flaky → outdated → bug, then re-run and report.

Example use cases

  • Replace a fragile waitForTimeout and CSS selector with auto-waiting locators and getByRole to fix an intermittent login test.
  • Update assertions and selectors in a profile test after a header redesign when the app behavior is correct.
  • Identify a failing transfer validation as a bug by locating SPEC.md, quoting the spec, adding unit tests for edge cases, and fixing the validation logic.
  • Report an unverified failure when no spec exists and ask product/QA whether the test or implementation should change.
  • Run a targeted subset of tests (`--grep`) to iterate quickly on a single failing flow instead of the whole suite.

FAQ

What if a failing test could be either a bug or an outdated test?

Look for a spec or authoritative documentation and compare the implementation against it. If the spec backs the test's assertion, treat the failure as a bug and follow the bug-fix rules; if the spec backs the current implementation, the test is outdated and its assertions should be updated. If no spec exists, mark the failure unverified and ask product/QA before changing anything.

Is adding retries around assertions allowed to stabilize flakes?

No. Use the test framework's built-in retry and fix the underlying wait/selectors rather than adding retry loops or arbitrary delays.