
pythonista-testing skill

/pythonista/skills/pythonista-testing

This skill guides test-driven debugging in Python, enforcing best practices for fixtures, mocks, and invariant-based tests to improve code quality.

npx playbooks add skill gigaverse-app/skillet --skill pythonista-testing

Review the files below or copy the command above to add this skill to your agents.

Files (7)
SKILL.md
---
name: pythonista-testing
description: Use when writing or modifying tests, fixing bugs with TDD, reviewing test code. Triggers on "test", "tests", "testing", "TDD", "test-driven", "pytest", "add tests", "write tests", "unit test", "integration test", "test coverage", "bug fix", "fix bug", "mock", "fixture", "assert", "conftest", "parametrize", "async test", or when editing files in tests/ directory.
---

# Python Testing Best Practices

## Core Philosophy

**Write invariant-based tests that verify what SHOULD be true, not bug-affirming tests that prove bugs existed.**

## Test-Driven Development (TDD)

**ALWAYS use TDD when fixing bugs:**

1. Find existing tests for the broken functionality
2. Run them to confirm they pass (i.e., they currently miss the bug)
3. Strengthen the tests until they fail, exposing the bug
4. Fix the code to make tests pass
5. Verify all tests pass
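
A minimal sketch of steps 3-4, assuming a hypothetical `slugify` helper in `src/text.py` that forgets to lowercase its input:

```python
# tests/test_text.py - module, path, and function names are illustrative only
from src.text import slugify

def test_slugify_is_lowercase():
    """INVARIANT: slugs are always lowercase, whatever the input casing."""
    assert slugify("Hello World") == "hello-world"  # fails until the bug is fixed
```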

## Quick Start

```bash
# Look for existing fixtures in conftest.py
# (agents can use Claude's Grep tool: pattern="@pytest.fixture" path="tests/conftest.py")
grep -rn "@pytest.fixture" tests/

# Look at sibling test files for patterns
ls tests/test_<module_name>/

# Run tests with coverage
pytest --cov=src --cov-report=term-missing
```
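
If `conftest.py` has no suitable fixture, add one there rather than duplicating setup in each test. A minimal, self-contained sketch:

```python
# tests/conftest.py
from datetime import UTC, datetime

import pytest

@pytest.fixture
def fixed_timestamp() -> datetime:
    """Deterministic timestamp so tests never depend on wall-clock time."""
    return datetime(2024, 1, 1, tzinfo=UTC)
```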

## Critical Rules

### Mocking - ALWAYS use patch.object

```python
# CORRECT - refactor-safe
from unittest.mock import patch

@patch.object(MyClass, 'method_name')
def test_with_mock(mock_method):
    ...

# WRONG - string path goes stale on rename/refactor
@patch('module.path.MyClass.method_name')
def test_with_mock(mock_method):
    ...
```
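
For extra refactor safety, `autospec=True` makes the mock reject calls that don't match the real method's signature. A self-contained sketch:

```python
from unittest.mock import patch

class MyClass:
    def method_name(self, arg: str) -> str:
        return arg.upper()

def test_with_autospec():
    with patch.object(MyClass, "method_name", autospec=True) as mock_method:
        obj = MyClass()
        obj.method_name("arg")
        # autospec'd instance methods record self, so assert against the instance too
        mock_method.assert_called_once_with(obj, "arg")
```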

### Mock Dependencies, NOT the System Under Test

```python
# CORRECT - Mock dependencies, test real code
generator = NewsPostGenerator()
generator._queries_chain = AsyncMock()       # Dependency - mock it
generator._search_engine = AsyncMock()       # Dependency - mock it
await generator.generate_news_post(...)      # SUT - actually runs

# WRONG - Tests nothing
generator = AsyncMock(spec=NewsPostGenerator)
```

### Test Data - ALWAYS use Pydantic models

```python
# CORRECT - validation, type safety
def create_test_result(channel_id: str) -> VideoModerationResult:
    return VideoModerationResult(
        channel_id=channel_id,
        user_id="test_user",
        timestamp=datetime.now(UTC),
        details=VideoModerationDetails(is_appropriate=True)
    )

# WRONG - no validation, won't catch schema changes
def create_test_data():
    return {"channel_id": "test", "user_id": "user123"}
```

### Constants - NEVER use naked literals

```python
# CORRECT - relationships explicit
DEFAULT_RECHECK_INTERVAL = 60
MODERATION_DURATION = 30  # example value
STALE_AGE = DEFAULT_RECHECK_INTERVAL + MODERATION_DURATION + 10
timestamp = datetime.now(UTC) - timedelta(seconds=STALE_AGE)

# WRONG - magic numbers
timestamp = datetime.now(UTC) - timedelta(seconds=120)  # Why 120?
```

### Invariant Testing

```python
# CORRECT - Test what SHOULD be true
def test_selector_populated_with_all_names():
    """INVARIANT: Selector contains all names from config."""
    config = make_config_with_items(["item1", "item2", "item3"])
    page = setup_page_with_config(config)
    assert page.item_selector.options == ["item1", "item2", "item3"]

# WRONG - Bug-affirming test
def test_bug_123_selector_empty():
    assert len(selector.options) > 0  # Proves bug, doesn't verify correctness
```
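
Invariants usually hold for any input, which `pytest.mark.parametrize` expresses compactly (reusing the hypothetical helpers above):

```python
import pytest

@pytest.mark.parametrize("names", [["item1"], ["item1", "item2"], []])
def test_selector_mirrors_config(names):
    """INVARIANT: selector options always equal the configured names."""
    page = setup_page_with_config(make_config_with_items(names))
    assert page.item_selector.options == names
```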

### E2E Testing - Call Production Code

```python
# CORRECT - Call actual production code
async def test_flow_e2e():
    await service.process_request(request_input)
    published_event = mock_queue.publish.call_args.kwargs["output"]
    assert published_event.data is not None  # Fails if code forgot data

# WRONG - Manually construct the event (WE built this state, not production code!)
published_event = Event(data=request_input.data)  # hand-built; can't catch missing data
assert published_event.data is not None           # always passes, so it tests nothing
```

### Access Mock Args Explicitly

```python
# CORRECT - Clear and explicit
flow_input = call_args.args[0]
delay = call_args.kwargs["delay"]

# WRONG - Cryptic
flow_input = call_args[0][0]
```

## Testing Checklist

Before committing:
- [ ] All imports at top of file
- [ ] Using `patch.object`, not `patch`
- [ ] Mocking dependencies, not SUT
- [ ] No mocking of model classes
- [ ] Test data uses Pydantic models
- [ ] Checked conftest.py for existing fixtures
- [ ] No naked literals - all values are constants
- [ ] Mock args accessed with `.args[N]` / `.kwargs["name"]`
- [ ] E2E tests call actual production code
- [ ] Tests verify invariants, not bug existence
- [ ] 100% coverage of new code

## Reference Files

For detailed patterns and examples:
- [references/mocking.md](references/mocking.md) - Mocking strategies and when to mock
- [references/test-data.md](references/test-data.md) - Test data creation patterns
- [references/e2e-testing.md](references/e2e-testing.md) - E2E and user journey patterns
- [references/intent-and-testability.md](references/intent-and-testability.md) - Pure functions, testability
- [references/concurrency.md](references/concurrency.md) - Async and concurrency testing
- [references/fixtures.md](references/fixtures.md) - Pytest fixtures and decorators

## Related Skills

- [/pythonista-debugging](../pythonista-debugging/SKILL.md) - Root cause analysis
- [/pythonista-typing](../pythonista-typing/SKILL.md) - Type safety in tests
- [/pythonista-async](../pythonista-async/SKILL.md) - Async testing patterns
- [/pythonista-reviewing](../pythonista-reviewing/SKILL.md) - Test code review

Overview

This skill helps developers write, modify, and review Python tests with a focus on TDD and robust test design. It codifies practical patterns: invariant-based assertions, safe mocking, use of Pydantic for test data, and strict checklist items to catch regressions. Use it to improve test coverage, fix bugs with a tests-first workflow, and validate async and E2E flows reliably.

How this skill works

The skill inspects test files and changes in a tests/ directory and responds when test-related keywords appear. It guides you to run existing tests, add failing tests to reproduce bugs, and then implement minimal fixes until all tests pass. It enforces patterns: patch.object for mocks, mocking dependencies (not the system under test), Pydantic models for test data, constants instead of naked literals, and explicit access to mock call args.

When to use it

  • Writing new unit or integration tests
  • Fixing bugs using test-driven development (TDD)
  • Reviewing or refactoring existing test code
  • Adding or updating fixtures and mocking strategies
  • Improving test coverage for changed code

Best practices

  • Always use TDD when fixing bugs: write failing test first, then fix code
  • Use patch.object rather than string-based patch to remain refactor-safe
  • Mock dependencies, never replace the system under test with a mock
  • Create test data using Pydantic models for validation and type safety
  • Replace naked literals with named constants to document intent
  • Access mock calls explicitly via .args[N] and .kwargs["name"] for clarity

Example use cases

  • You find a failing feature: add a targeted invariant test that reproduces the bug, then implement the fix until tests pass
  • Refactor a module: run tests, update mocks to patch.object, convert raw dict test data to Pydantic models
  • Add E2E coverage: call actual production code and assert outputs on mocked queues or services
  • Review a pull request: verify tests follow the checklist and do not mock the SUT or use magic numbers
  • Improve async behavior tests: use AsyncMock for dependencies and assert awaited interactions
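
A self-contained sketch of that last pattern (the service class is hypothetical; assumes pytest-asyncio is installed):

```python
from unittest.mock import AsyncMock

import pytest

class NotificationService:
    """Hypothetical SUT: real logic with one async dependency."""
    def __init__(self, sender) -> None:
        self._sender = sender

    async def notify(self, user_id: str) -> None:
        await self._sender.send(user_id=user_id, subject="hello")

@pytest.mark.asyncio
async def test_notify_awaits_sender():
    sender = AsyncMock()                   # dependency - mocked
    service = NotificationService(sender)  # SUT - actually runs
    await service.notify(user_id="u1")
    sender.send.assert_awaited_once_with(user_id="u1", subject="hello")
```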

FAQ

What if tests reference external services?

Mock external dependencies only; call production code. Use AsyncMock or dedicated test doubles for network/IO and assert interactions rather than replacing the SUT.
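
A minimal sketch of such a test double (the class name and canned response are illustrative):

```python
class FakeHTTPClient:
    """Hypothetical in-memory double for a network dependency."""
    def __init__(self) -> None:
        self.requests: list[str] = []

    async def get(self, url: str) -> dict:
        self.requests.append(url)   # record the interaction
        return {"status": "ok"}     # canned response, no real IO

# Inject FakeHTTPClient into the SUT, run production code,
# then assert on fake.requests to verify the interaction.
```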

How strict is the 100% coverage rule?

Aim for full coverage of new or changed code paths. Pragmatically prioritize meaningful invariant tests and critical logic branches over noisy tests that only inflate metrics.