
manual-testing skill


This skill guides you through automated verification of current work, surfacing what Claude can test and prompting you for manual checks when needed.

This is most likely a fork of the manual-testing skill from petekp.
npx playbooks add skill petekp/agent-skills --skill manual-testing

Review the files below or copy the command above to add this skill to your agents.

Files (1): SKILL.md (4.3 KB)
---
name: manual-testing
description: Guide users step-by-step through manually testing whatever is currently being worked on. Use when asked to "test this", "verify it works", "let's test", "manual testing", "QA this", "check if it works", or after implementing a feature that needs verification before proceeding.
---

# Manual Testing

Verify current work through automated testing first, falling back to user verification only when necessary.

## Core Principle

**Automate everything possible.** Only ask the user to manually verify what Claude cannot verify through tools.

## Workflow

### 1. Analyze Current Context

Examine recent work to identify what needs testing:
- Review recent file changes and conversation history
- Identify the feature, fix, or change to verify
- Determine testable behaviors and expected outcomes

### 2. Classify Each Verification Step

For each thing to verify, determine if Claude can test it automatically:

**Claude CAN verify (do these automatically; a few are sketched after these lists):**
- Code compiles/builds: `npm run build`, `cargo build`, `go build`, etc.
- Tests pass: `npm test`, `pytest`, `cargo test`, etc.
- Linting/type checking: `eslint`, `tsc --noEmit`, `mypy`, etc.
- API responses: `curl`, `httpie`, or scripted requests
- File contents: Read files, grep for expected patterns
- CLI tool output: Run commands and check output
- Server starts: Start server, check for errors, verify endpoints respond
- Database state: Query databases, check records exist
- Log output: Tail logs, grep for expected/unexpected messages
- Process behavior: Check exit codes, stdout/stderr content
- File existence/permissions: `ls`, `stat`, `test -f`
- JSON/config validity: Parse and validate structure
- Port availability: `lsof`, `netstat`, `curl localhost`
- Git state: Check diffs, commits, branch state

**Claude CANNOT verify (ask user):**
- Visual appearance (colors, layout, spacing, alignment)
- Animations and transitions
- User experience feel (responsiveness, intuitiveness)
- Cross-browser rendering
- Mobile device behavior
- Physical hardware interaction
- Third-party service UIs (OAuth flows, payment forms)
- Accessibility with actual screen readers
- Performance perception (feels fast/slow)
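
A few of the checks from the first list reduce to one-liners. A minimal sketch, assuming a Node project with a `config.json` and a dev server expected on port 3000 (paths and ports are illustrative):

```bash
# JSON/config validity: file exists and parses
test -f config.json \
  && node -e "JSON.parse(require('fs').readFileSync('config.json', 'utf8'))" \
  && echo "config.json is valid JSON"

# Port availability: is anything already listening on 3000?
lsof -i :3000 || echo "port 3000 is free"

# Git state: confirm the working tree contains only the expected changes
git status --porcelain
git diff --stat HEAD
```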

### 3. Execute Automated Verifications

Run all automatable checks first. Be thorough:

```bash
# Example: Testing a web feature
npm run build          # Compiles?
npm run lint           # No lint errors?
npm test               # Tests pass?
npm run dev &          # Server starts?
sleep 3
curl localhost:3000/api/endpoint  # API responds correctly?
```

Report results as you go. If an automated check fails, stop and address it before asking the user to verify anything.
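
One way to enforce the stop-on-failure rule is to chain the checks in a script that exits at the first failing command. A sketch, assuming the same npm scripts as above:

```bash
#!/usr/bin/env bash
set -euo pipefail   # abort immediately if any check fails

npm run build
npm run lint
npm test

# This point is only reached if every automated check passed;
# whatever remains is handed to the user as manual verification steps.
echo "All automated checks passed - continue to manual verification."
```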

### 4. User Verification (Only When Necessary)

For steps Claude cannot automate, present them sequentially with selectable outcomes:

```
Step N of M: [Brief description]

**Action:** [Specific instruction - what to do]

**Expected:** [What should happen if working correctly]
```
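
For example, a filled-in step for a UI change might read (feature and details are hypothetical):

```
Step 1 of 2: Check the updated submit button

**Action:** Open http://localhost:3000/signup and locate the submit button at the bottom of the form.

**Expected:** The button is blue, spans the full width of the form, and the "Create account" label is readable.
```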

Then use AskUserQuestion with predicted outcomes:
- 2-4 most likely outcomes as selectable options
- First option: expected/success outcome
- Remaining options: common failure modes
- A free-text "Other" option is provided automatically

**Example:**
```json
{
  "questions": [{
    "question": "How does the button look?",
    "header": "Visual check",
    "options": [
      {"label": "Looks correct", "description": "Blue button, proper spacing, readable text"},
      {"label": "Wrong color/style", "description": "Button exists but styling is off"},
      {"label": "Layout broken", "description": "Elements overlapping or misaligned"},
      {"label": "Not visible", "description": "Button missing or hidden"}
    ],
    "multiSelect": false
  }]
}
```

### 5. Handle Results

**Automated test fails:** Stop and fix before proceeding.

**User reports issue:** Note it, then ask whether they want to investigate now or continue testing.

### 6. Summarize

After all steps complete:
- List what was verified automatically (with pass/fail)
- List what user verified (with results)
- Summarize any issues found
- Recommend next actions
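
A summary might look like this (results are illustrative):

```
Verification summary

Automated:
- npm run build: pass
- npm run lint: pass
- npm test: pass
- curl localhost:3000/api/endpoint: pass (expected response)

User-verified:
- Button styling: looks correct
- Mobile layout: elements overlap on small screens

Issues found:
- Mobile layout overlap (user-reported)

Recommended next actions:
- Fix the responsive layout, then re-run the mobile layout check
```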

## Guidelines

- Run automated checks in parallel when possible (see the sketch after this list)
- Be creative with verification—most things can be tested programmatically
- If unsure whether something can be automated, try it first
- Keep user verification steps minimal and focused on truly visual/experiential checks
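
As a sketch of the first guideline, independent checks can run as background jobs and be collected with `wait` (same hypothetical npm scripts as earlier):

```bash
# Run independent checks concurrently, then fail if either of them failed
npm run lint & LINT_PID=$!
npm test & TEST_PID=$!

wait "$LINT_PID" || { echo "lint failed"; exit 1; }
wait "$TEST_PID" || { echo "tests failed"; exit 1; }

echo "lint and tests passed"
```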

Overview

This skill guides users step-by-step through manually testing whatever is currently being worked on, prioritizing automated verification and falling back to user checks only when necessary. It helps identify what can be tested automatically, runs those checks, and produces minimal, focused manual verification steps for anything Claude cannot verify. The goal is fast, reliable validation and clear next actions when issues appear.

How this skill works

The skill inspects recent changes, build/test outputs, logs, server status, file contents, and conversation context to identify testable behaviors. It runs all automatable checks (build, lint, unit/integration tests, API requests, DB queries, port checks) and reports results. For visual or experiential items it cannot automate, it generates concise sequential user verification steps with expected outcomes and selectable response options. Finally it summarizes automated passes/failures and user-verified results with recommended next steps.

When to use it

  • After implementing a feature that needs verification before merging or deploying
  • When a reviewer says “test this” or “verify it works”
  • Before asking QA or product for acceptance testing
  • After fixing a bug to confirm the fix in context
  • When unsure whether behavior is covered by automated tests

Best practices

  • Automate everything possible; only ask the user for checks Claude cannot perform
  • Run automated checks first and stop to fix failures before manual steps
  • Keep manual steps short, specific, and single-purpose
  • Provide clear expected results and 2–3 common failure options plus an Other field
  • Report pass/fail per verification and recommend concrete next actions

Example use cases

  • New UI component: run build/lint/tests, then ask user to confirm visual styling and interaction
  • API endpoint change: run tests and curl requests, then ask user to exercise the client UI if needed
  • Bug fix: run unit/integration tests and logs, then ask user to reproduce the failing scenario if automated checks pass
  • Deployment smoke test: verify server starts, endpoints respond, and ask user to validate critical UI paths (sketched after this list)
  • Accessibility check: run automated validators, then provide screen-reader/manual steps for user verification
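
For the deployment smoke-test case, the automated half might look like this (base URL and endpoints are placeholders):

```bash
# Minimal smoke test: server is up and key endpoints answer with HTTP 200
BASE_URL="http://localhost:3000"   # substitute the deployed URL

for path in /healthz /api/endpoint; do
  code=$(curl -s -o /dev/null -w "%{http_code}" "$BASE_URL$path")
  echo "$path -> $code"
  [ "$code" = "200" ] || { echo "smoke test failed at $path"; exit 1; }
done
```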

FAQ

What exactly will you automate?

I run builds, tests, linters/type checks, server-start checks, API requests, file/DB/log inspections, and common CLI checks. Anything that can be validated from command output or by parsing files, I will attempt automatically.

When will I be asked to do manual checks?

Only for visual, experiential, or external-service behaviors Claude cannot verify: appearance, animations, mobile/browser-specific rendering, screen-reader behavior, hardware interactions, and third-party UI flows.