This skill enables declarative, YAML-driven testing with agent-based execution to safely validate MCP servers, APIs, and browser automation.
```bash
npx playbooks add skill jezweb/claude-skills --skill testing-patterns
```
---
name: testing-patterns
description: |
  Agent-based declarative testing with YAML test specs. Tests run in sub-agents to preserve
  main context while executing many tests. Supports MCP servers, APIs, and browser automation.
  Use when: testing MCP servers, running integration tests, validating tool behavior after changes,
  or creating regression test suites. Keywords: yaml tests, agent testing, mcp test, integration tests.
license: MIT
---
# Testing Patterns
A pragmatic approach to testing that emphasises:
- **Live testing** over mocks
- **Agent execution** to preserve context
- **YAML specs** as documentation and tests
- **Persistent results** committed to git
## Philosophy
This is **not traditional TDD**. Instead:
1. **Test in production/staging** with good logging
2. **Use agents** to run tests (keeps main context clean)
3. **Define tests declaratively** in YAML (human-readable, version-controlled)
4. **Focus on integration** (real servers, real data)
### Why Agent-Based Testing?
Running 50 tests in the main conversation would consume your entire context window. By delegating to a sub-agent:
- Main context stays clean for development
- Agent can run many tests without context pressure
- Results come back as a summary
- Failed tests get detailed investigation
## Commands
| Command | Purpose |
|---------|---------|
| `/create-tests` | Discover project, generate test specs + testing agent |
| `/run-tests` | Execute tests via agent(s), report results |
| `/coverage` | Generate coverage report and identify uncovered code paths |
**Quick workflow:**
```
/create-tests → Generates tests/*.yaml + .claude/agents/test-runner.md
/run-tests → Spawns agent, runs all tests, saves results
/run-tests api → Run only specs matching "api"
/run-tests --failed → Re-run only failed tests
/coverage → Run tests with coverage, analyse gaps
/coverage --threshold 80 → Fail if below 80%
```
## Getting Started in a New Project
This skill provides the **pattern and format**. Claude designs the actual tests based on your project context.
**What happens when you ask "Create tests for this project":**
1. **Discovery** - Claude examines the project:
- What MCP servers are configured?
- What APIs or tools exist?
- What does the code do?
2. **Test Design** - Claude creates project-specific tests:
- Test cases for the actual tools/endpoints
- Expected values based on real behavior
- Edge cases relevant to this domain
3. **Structure** - Using patterns from this skill:
- YAML specs in `tests/` directory
- Optional testing agent in `.claude/agents/`
- Results saved to `tests/results/`
**Example:**
```
You: "Create tests for this MCP server"
Claude: [Discovers this is a Google Calendar MCP]
        [Sees tools: calendar_events, calendar_create, calendar_delete]
        [Designs test cases:]

tests/calendar-events.yaml:
  - list_upcoming_events (expect: array, count_gte 0)
  - search_by_keyword (expect: contains search term)
  - invalid_date_range (expect: error status)

tests/calendar-mutations.yaml:
  - create_event (expect: success, returns event_id)
  - delete_nonexistent (expect: error, contains "not found")
```
**The skill teaches Claude:**
- How to structure YAML test specs
- What validation rules are available
- How to create testing agents
- When to use parallel execution
**Your project provides:**
- What to actually test
- Expected values and behaviors
- Domain-specific edge cases
## YAML Test Spec Format
```yaml
name: Feature Tests
description: What these tests validate
# Optional: defaults applied to all tests
defaults:
  tool: my_tool_name
  timeout: 5000

tests:
  - name: test_case_name
    description: Human-readable purpose
    tool: tool_name  # Override default if needed
    params:
      action: search
      query: "test input"
    expect:
      contains: "expected substring"
      not_contains: "should not appear"
      status: success
```
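As a concrete illustration of how `defaults` merge into individual tests, here is a minimal loader sketch in Python. It assumes PyYAML is installed; `load_spec` is a hypothetical helper, not part of the skill, and `tests/search.yaml` is the example spec from later in this document.

```python
# Minimal sketch: load a spec file and merge `defaults` into each test.
# Assumes PyYAML (pip install pyyaml); test-level keys override defaults.
import yaml

def load_spec(path: str) -> list[dict]:
    with open(path) as f:
        spec = yaml.safe_load(f)
    defaults = spec.get("defaults", {})
    return [{**defaults, **test} for test in spec.get("tests", [])]

for test in load_spec("tests/search.yaml"):
    print(test["name"], "->", test.get("tool"), test.get("params", {}))
```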
### Validation Rules
| Rule | Description | Example |
|------|-------------|---------|
| `contains` | Response contains string | `contains: "from:john"` |
| `not_contains` | Response doesn't contain | `not_contains: "error"` |
| `matches` | Regex pattern match | `matches: "after:\\d{4}"` |
| `json_path` | Check value at JSON path | `json_path: "$.results[0].name"` |
| `equals` | Exact value match | `equals: "success"` |
| `status` | Check success/error | `status: success` |
| `count_gte` | Array length >= N | `count_gte: 1` |
| `count_eq` | Array length == N | `count_eq: 5` |
| `type` | Value type check | `type: array` |
See `references/validation-rules.md` for complete documentation.
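For illustration only, a hand-rolled validator covering a few of these rules might look like the sketch below. It is not the skill's implementation, and a real `json_path` check would need a proper JSONPath library, so that rule is omitted here.

```python
# Simplified validation sketch for a few rules from the table above.
# Returns failure messages; an empty list means PASS.
import re

def validate(response: str, expect: dict, data=None) -> list[str]:
    failures = []
    if "contains" in expect and expect["contains"] not in response:
        failures.append(f"missing substring {expect['contains']!r}")
    if "not_contains" in expect and expect["not_contains"] in response:
        failures.append(f"forbidden substring {expect['not_contains']!r} present")
    if "matches" in expect and not re.search(expect["matches"], response):
        failures.append(f"regex {expect['matches']!r} did not match")
    if "equals" in expect and response != expect["equals"]:
        failures.append(f"expected {expect['equals']!r}, got {response!r}")
    if "count_gte" in expect and len(data or []) < expect["count_gte"]:
        failures.append(f"expected >= {expect['count_gte']} items")
    if "type" in expect:
        kinds = {"array": list, "object": dict, "string": str, "number": (int, float)}
        if not isinstance(data, kinds[expect["type"]]):
            failures.append(f"expected type {expect['type']}")
    return failures

print(validate("3 results found", {"contains": "results", "matches": r"\d+"}))  # []
```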
## Creating a Testing Agent
Testing agents inherit MCP tools from the session. Create an agent that:
1. Reads YAML test specs
2. Executes tool calls with params
3. Validates responses against expectations
4. Reports results
### Agent Template
**CRITICAL**: Do NOT specify a `tools` field if you need MCP access. When you specify ANY tools, it becomes an allowlist and `"*"` is interpreted literally (not as a wildcard). Omit `tools` entirely to inherit ALL tools from the parent session.
```yaml
---
name: my-tester
description: |
  Tests [domain] functionality. Reads YAML test specs and validates responses.
  Use when: testing after changes, running regression tests.
# tools field OMITTED - inherits ALL tools from parent (including MCP)
model: sonnet
---
# [Domain] Tester
## How It Works
1. Find test specs: `tests/*.yaml`
2. Parse and execute each test
3. Validate responses
4. Report pass/fail summary
## Test Spec Location
tests/
├── feature-a.yaml
├── feature-b.yaml
└── results/
    └── YYYY-MM-DD-HHMMSS.md
## Execution
For each test:
1. Call tool with params
2. Capture response
3. Apply validation rules
4. Record PASS/FAIL
## Reporting
Save results to `tests/results/YYYY-MM-DD-HHMMSS.md`
```
See `templates/test-agent.md` for complete template.
## Results Format
Test results are saved as markdown for git history:
```markdown
# Test Results: feature-name
**Date**: 2026-02-02 14:30
**Commit**: abc1234
**Summary**: 8/9 passed (89%)
## Results
- test_basic_search - PASSED (0.3s)
- test_with_filter - PASSED (0.4s)
- test_edge_case - FAILED
## Failed Test Details
### test_edge_case
- **Expected**: Contains "expected value"
- **Actual**: Response was empty
- **Params**: `{ action: search, query: "" }`
```
Save to: `tests/results/YYYY-MM-DD-HHMMSS.md`
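A sketch of how a runner could derive that path and the header metadata (assumes a git repo is available for the commit hash; `save_results` is a hypothetical helper, not part of the skill):

```python
# Sketch: write a timestamped results file in the format shown above.
import subprocess
from datetime import datetime
from pathlib import Path

def save_results(suite: str, body: str) -> Path:
    now = datetime.now()
    commit = subprocess.run(["git", "rev-parse", "--short", "HEAD"],
                            capture_output=True, text=True).stdout.strip()
    path = Path("tests/results") / now.strftime("%Y-%m-%d-%H%M%S.md")
    path.parent.mkdir(parents=True, exist_ok=True)
    header = (f"# Test Results: {suite}\n"
              f"**Date**: {now:%Y-%m-%d %H:%M}\n"
              f"**Commit**: {commit}\n\n")
    path.write_text(header + body)
    return path
```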
## Workflow
### 1. Create Test Specs
```yaml
# tests/search.yaml
name: Search Tests
defaults:
  tool: my_search_tool

tests:
  - name: basic_search
    params: { query: "hello" }
    expect: { status: success, count_gte: 0 }

  - name: filtered_search
    params: { query: "hello", filter: "recent" }
    expect: { contains: "results" }
```
### 2. Create Testing Agent
Copy `templates/test-agent.md` and customise for your domain.
### 3. Run Tests
```
"Run the search tests"
"Test the API after my changes"
"Run regression tests for gmail-mcp"
```
### 4. Review Results
Results saved to `tests/results/`. Commit them for history:
```bash
git add tests/results/
git commit -m "Test results: 8/9 passed"
```
## Parallel Test Execution
Run multiple test agents simultaneously to speed up large test suites:
```
"Run these test suites in parallel:
- Agent 1: tests/auth/*.yaml
- Agent 2: tests/search/*.yaml
- Agent 3: tests/api/*.yaml"
```
Each agent:
- Has its own context (won't bloat main conversation)
- Can run 10-50 tests independently
- Returns a summary when done
- Inherits MCP tools from parent session
**Why parallel agents?**
- 50 tests in main context = context exhaustion
- 50 tests across 5 agents = clean context + faster execution
- Each agent reports pass/fail summary, not every test detail
**Batching strategy:**
- Group tests by feature area or MCP server
- 10-20 tests per agent is ideal
- Too few = overhead of spawning not worth it
- Too many = agent context fills up
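One possible way to compute those batches before spawning agents, sketched under the assumption that each spec file declares its tests under a top-level `tests:` key:

```python
# Sketch: group spec files into batches of roughly 15 tests per agent.
from pathlib import Path
import yaml

def batch_specs(spec_dir: str = "tests", max_tests: int = 15) -> list[list[str]]:
    batches, current, count = [], [], 0
    for path in sorted(Path(spec_dir).rglob("*.yaml")):
        n = len(yaml.safe_load(path.read_text()).get("tests", []))
        if current and count + n > max_tests:
            batches.append(current)
            current, count = [], 0
        current.append(str(path))
        count += n
    if current:
        batches.append(current)
    return batches  # spawn one test agent per batch
```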
## MCP Testing
For MCP servers, the testing agent inherits configured MCPs:
```bash
# Configure MCP first
claude mcp add --transport http gmail https://gmail.mcp.example.com/mcp
# Then test
"Run tests for gmail MCP"
```
Example MCP test spec:
```yaml
name: Gmail Search Tests
defaults:
  tool: gmail_messages

tests:
  - name: search_from_person
    params: { action: search, searchQuery: "from John" }
    expect: { contains: "from:john" }

  - name: search_with_date
    params: { action: search, searchQuery: "emails from January 2026" }
    expect: { matches: "after:2026" }
```
## API Testing
For REST APIs, use Bash tool:
```yaml
name: API Tests
defaults:
  timeout: 5000

tests:
  - name: health_check
    command: curl -s https://api.example.com/health
    expect: { contains: "ok" }

  - name: get_user
    command: curl -s https://api.example.com/users/1
    expect:
      json_path: "$.name"
      type: string
```
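These specs use a `command` field instead of `tool`, so a runner can shell out directly. A self-contained sketch (assuming PyYAML, that `timeout` values are milliseconds, and checking only the `contains` rule for brevity):

```python
# Sketch: execute command-based API tests and check a `contains` rule.
import subprocess
import yaml

spec = yaml.safe_load(open("tests/api.yaml"))
defaults = spec.get("defaults", {})
for test in spec["tests"]:
    merged = {**defaults, **test}
    result = subprocess.run(merged["command"], shell=True, text=True,
                            capture_output=True,
                            timeout=merged.get("timeout", 5000) / 1000)
    expect = merged.get("expect", {})
    ok = expect.get("contains", "") in result.stdout
    print(f"{merged['name']}: {'PASS' if ok else 'FAIL'}")
```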
## Browser Testing
For browser automation, use Playwright tools:
```yaml
name: UI Tests
tests:
  - name: login_page_loads
    steps:
      - navigate: https://app.example.com/login
      - snapshot: true
    expect: { contains: "Sign In" }

  - name: form_submission
    steps:
      - navigate: https://app.example.com/form
      - type: { ref: "#email", text: "[email protected]" }
      - click: { ref: "button[type=submit]" }
    expect: { contains: "Success" }
```
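If you want to reproduce the `form_submission` steps outside the agent, a minimal sketch with Playwright's Python API (`pip install playwright`, then `playwright install`); the step names map onto `goto`, `fill`, and `click`:

```python
# Sketch: run the form_submission steps directly with Playwright.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://app.example.com/form")    # navigate
    page.fill("#email", "[email protected]")        # type
    page.click("button[type=submit]")            # click
    assert "Success" in page.content()           # expect: contains "Success"
    browser.close()
```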
## Tips
1. **Start with smoke tests**: Basic connectivity and auth
2. **Test edge cases**: Empty results, errors, special characters
3. **Use descriptive names**: `search_with_date_filter` not `test1`
4. **Group related tests**: One file per feature area
5. **Add after bugs**: Every fixed bug gets a regression test
6. **Commit results**: Create history of test runs
## What This Is NOT
- Not a Jest/Vitest replacement (use those for unit tests)
- Not enforcing TDD (use what works for you)
- Not a test runner library (the agent IS the runner)
- Not about mocking (we test real systems)
## When to Use
| Scenario | Use This | Use Traditional Testing |
|----------|----------|------------------------|
| MCP server validation | Yes | No |
| API integration | Yes | Complement with unit tests |
| Browser workflows | Yes | Complement with component tests |
| Unit testing | No | Yes (Jest/Vitest) |
| Component testing | No | Yes (Testing Library) |
| Type checking | No | Yes (TypeScript) |
## Related Resources
- `templates/test-spec.yaml` - Generic test spec template
- `templates/test-agent.md` - Testing agent template
- `references/validation-rules.md` - Complete validation rule reference
## Summary
This skill provides an agent-based, declarative testing pattern using YAML test specs to run integration and end-to-end tests without polluting the main conversation context. Tests execute inside sub-agents that inherit MCP tools and browser/CLI access, producing persistent, git-committed results and coverage reports. It favours live testing of real servers, APIs, and browser flows over mocks to catch integration regressions early.

Human-readable test specs live under `tests/*.yaml`. A testing agent reads those specs, executes tool calls or browser steps, validates responses with built-in rules (`contains`, `matches`, `json_path`, `equals`, `type`, counts), and writes results to `tests/results/YYYY-MM-DD-HHMMSS.md`. Sub-agents are spawned per suite or batch so the main session stays clean; summaries come back to the main conversation, and failed tests include detailed diagnostics for investigation.
## FAQ
**Do I need to list tools in the testing agent config?**
No. Omit the `tools` field so the testing agent inherits all MCP tools from the parent session. Specifying `tools` creates an allowlist and can block needed access.

**Can I run only failed tests or a subset?**
Yes. Use the `/run-tests` flags (e.g. `--failed`) or pass a pattern such as an API name to run matching specs only.

**How are results stored and reviewed?**
Results are saved as markdown under `tests/results/` with timestamp and commit metadata. Each run includes a summary and failed-test details for quick triage.