
This skill helps you audit and benchmark Claude Code hooks for security, performance, and compliance using Python SDK and JSON-based hooks.

```bash
npx playbooks add skill athola/claude-night-market --skill hooks-eval
```

---
name: hooks-eval
description: 'Use this skill BEFORE deploying hooks to production. Use when auditing
  existing hooks for security vulnerabilities, benchmarking hook performance, implementing
  hooks using Python SDK, understanding hook callback signatures, validating hooks
  against compliance standards. Do not use when deciding hook placement - use hook-scope-guide
  instead. DO NOT use when: writing hook rules from scratch - use hook-authoring instead.
  DO NOT use when: validating plugin structure - use validate-plugin instead.'
category: hook-management
tags:
- hooks
- evaluation
- security
- performance
- claude-sdk
- agent-sdk
dependencies:
- hook-scope-guide
provides:
  infrastructure:
  - hook-evaluation
  - security-scanning
  - performance-analysis
  patterns:
  - hook-auditing
  - sdk-integration
  - compliance-checking
  sdk_features:
  - python-sdk-hooks
  - hook-callbacks
  - hook-matchers
estimated_tokens: 1200
---
## Table of Contents

- [Overview](#overview)
- [Key Capabilities](#key-capabilities)
- [Core Components](#core-components)
- [Quick Reference](#quick-reference)
- [Hook Event Types](#hook-event-types)
- [Hook Callback Signature](#hook-callback-signature)
- [Return Values](#return-values)
- [Quality Scoring (100 points)](#quality-scoring-100-points)
- [Detailed Resources](#detailed-resources)
- [Basic Evaluation Workflow](#basic-evaluation-workflow)
- [Integration with Other Tools](#integration-with-other-tools)
- [Related Skills](#related-skills)


# Hooks Evaluation Framework

## Overview

This skill provides a detailed framework for evaluating, auditing, and implementing Claude Code hooks across all scopes (plugin, project, global) and both JSON-based and programmatic (Python SDK) hooks.

### Key Capabilities

- **Security Analysis**: Vulnerability scanning, dangerous pattern detection, injection prevention
- **Performance Analysis**: Execution time benchmarking, resource usage, optimization
- **Compliance Checking**: Structure validation, documentation requirements, best practices
- **SDK Integration**: Python SDK hook types, callbacks, matchers, and patterns

### Core Components

| Component | Purpose |
|-----------|---------|
| **Hook Types Reference** | Complete SDK hook event types and signatures |
| **Evaluation Criteria** | Scoring system and quality gates |
| **Security Patterns** | Common vulnerabilities and mitigations |
| **Performance Benchmarks** | Thresholds and optimization guidance |

## Quick Reference

### Hook Event Types

```python
HookEvent = Literal[
    "PreToolUse",       # Before tool execution
    "PostToolUse",      # After tool execution
    "UserPromptSubmit", # When user submits prompt
    "Stop",             # When stopping execution
    "SubagentStop",     # When a subagent stops
    "TeammateIdle",     # When teammate agent becomes idle (2.1.33+)
    "TaskCompleted",    # When a task finishes execution (2.1.33+)
    "PreCompact"        # Before message compaction
]
```

**Note**: Python SDK does not support `SessionStart`, `SessionEnd`, or `Notification` hooks due to setup limitations. However, plugins can define `SessionStart` hooks via `hooks.json` using shell commands (e.g., leyline's `detect-git-platform.sh`).

### Plugin-Level hooks.json

Plugins can declare hooks via `"hooks": "./hooks/hooks.json"` in `plugin.json`. The evaluator validates:
- Referenced hooks.json exists and is valid JSON
- Shell commands referenced in hooks exist and are executable
- Hook matchers use valid event types
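
A minimal sketch of these checks in Python (illustrative only: the `hooks.json` shape and event list assumed here follow the patterns described in this skill, and the real evaluator's implementation may differ):

```python
import json
import os
import shutil

# Assumed set of valid event types; JSON-declared hooks may also use
# session events that the Python SDK does not support.
VALID_EVENTS = {
    "PreToolUse", "PostToolUse", "UserPromptSubmit", "Stop",
    "SubagentStop", "TeammateIdle", "TaskCompleted", "PreCompact",
    "SessionStart", "SessionEnd",
}

def validate_hooks_json(path: str) -> list[str]:
    """Return a list of problems found in a plugin-level hooks.json."""
    if not os.path.isfile(path):
        return [f"missing file: {path}"]
    try:
        with open(path) as f:
            config = json.load(f)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    problems = []
    base = os.path.dirname(path)
    for event, entries in config.get("hooks", {}).items():
        if event not in VALID_EVENTS:
            problems.append(f"unknown event type: {event}")
        for entry in entries:
            for hook in entry.get("hooks", []):
                cmd = hook.get("command", "")
                exe = cmd.split()[0] if cmd else ""
                # Resolve plugin-relative paths against the plugin directory
                candidate = os.path.join(base, exe) if exe.startswith(".") else exe
                if exe and not (shutil.which(exe) or os.access(candidate, os.X_OK)):
                    problems.append(f"command not executable: {cmd}")
    return problems
```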

### Hook Callback Signature

```python
async def my_hook(
    input_data: dict[str, Any],    # Hook-specific input
    tool_use_id: str | None,       # Tool ID (for tool hooks)
    context: HookContext           # Additional context
) -> dict[str, Any]:               # Return decision/messages
    ...
```

### Return Values

```python
return {
    "decision": "block",           # Optional: block the action
    "systemMessage": "...",        # Optional: add to transcript
    "hookSpecificOutput": {...}    # Optional: hook-specific data
}
```
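
A hypothetical structural check on a hook's return value, limited to the keys documented above (only `"block"` is listed as a decision here; other decision values, if any, are outside this sketch):

```python
from typing import Any

ALLOWED_KEYS = {"decision", "systemMessage", "hookSpecificOutput"}

def check_hook_return(value: Any) -> list[str]:
    """Flag structural problems in a hook's return value (illustrative only)."""
    if not isinstance(value, dict):
        return [f"expected dict, got {type(value).__name__}"]
    problems = [f"unexpected key: {k}" for k in value if k not in ALLOWED_KEYS]
    if value.get("decision") not in (None, "block"):
        problems.append(f"unsupported decision: {value['decision']!r}")
    return problems
```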

### Quality Scoring (100 points)

| Category | Points | Focus |
|----------|--------|-------|
| Security | 30 | Vulnerabilities, injection, validation |
| Performance | 25 | Execution time, memory, I/O |
| Compliance | 20 | Structure, documentation, error handling |
| Reliability | 15 | Timeouts, idempotency, degradation |
| Maintainability | 10 | Code structure, modularity |
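
One way the category points above could roll up into the 100-point total, with a quality gate applied (the 80-point threshold is an assumption for illustration, not the evaluator's actual default):

```python
# Maximum points per category, matching the scoring table above
MAX_POINTS = {
    "security": 30,
    "performance": 25,
    "compliance": 20,
    "reliability": 15,
    "maintainability": 10,
}

def total_score(scores: dict[str, float]) -> float:
    """Sum per-category scores, capping each at its category maximum."""
    return sum(min(scores.get(cat, 0), cap) for cat, cap in MAX_POINTS.items())

def passes_gate(scores: dict[str, float], threshold: float = 80.0) -> bool:
    # Hypothetical gate: require a minimum total score out of 100
    return total_score(scores) >= threshold
```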

## Detailed Resources

- **SDK Hook Types**: See `modules/sdk-hook-types.md` for complete Python SDK type definitions, patterns, and examples
- **Evaluation Criteria**: See `modules/evaluation-criteria.md` for detailed scoring rubric and quality gates
- **Security Patterns**: See `modules/sdk-hook-types.md` for vulnerability detection and mitigation
- **Performance Guide**: See `modules/evaluation-criteria.md` for benchmarking and optimization

## Basic Evaluation Workflow

```bash
# 1. Run detailed evaluation
/hooks-eval --detailed

# 2. Focus on security issues
/hooks-eval --security-only --format sarif

# 3. Benchmark performance
/hooks-eval --performance-baseline

# 4. Check compliance
/hooks-eval --compliance-report
```
**Verification:** Run `/hooks-eval --help` to confirm these flags are available in your environment.

## Integration with Other Tools

```bash
# Complete plugin evaluation pipeline
/hooks-eval --detailed          # Evaluate all hooks
/analyze-hook hooks/specific.py      # Deep-dive on one hook
/validate-plugin .                   # Validate overall structure
```
**Verification:** Confirm each command is available with its `--help` flag before chaining the pipeline.

## Related Skills

- `abstract:hook-scope-guide` - Decide where to place hooks (plugin/project/global)
- `abstract:hook-authoring` - Write hook rules and patterns
- `abstract:validate-plugin` - Validate complete plugin structure

## Troubleshooting

### Common Issues

**Hook not firing**
Verify the hook's matcher pattern matches the event, and check hook logs for errors.

**Syntax errors**
Validate JSON/Python syntax before deployment.

**Permission denied**
Check the hook file's permissions and ownership; shell hooks must be executable.

## Skill Summary

This skill provides a practical evaluation framework to audit, benchmark, and validate Claude Code hooks before they reach production. It focuses on security, performance, compliance, and Python SDK integration to ensure hooks behave safely and efficiently. Use it to catch vulnerabilities, measure runtime characteristics, and confirm callback signatures and return values.

## How This Skill Works

The evaluator inspects hook definitions (both JSON and Python SDK hooks), scans for dangerous patterns, benchmarks execution time and resource use, and scores hooks against a 100-point quality rubric. It validates hook event types, callback signatures, return structures, and plugin-level `hooks.json` entries, and can emit focused reports (security SARIF, performance baselines, compliance summaries).
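
The execution-time benchmarking can be sketched with a simple timing harness (illustrative only; the evaluator's real benchmarking code is not shown in this skill):

```python
import asyncio
import time
from typing import Any, Awaitable, Callable

HookFn = Callable[..., Awaitable[dict[str, Any]]]

async def benchmark_hook(hook: HookFn, input_data: dict[str, Any], runs: int = 50) -> float:
    """Return the mean execution time of a hook callback, in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        await hook(input_data, None, None)
    return (time.perf_counter() - start) / runs * 1000.0

# Example: time a trivial allow-everything hook
async def noop_hook(input_data, tool_use_id, context):
    return {}

mean_ms = asyncio.run(benchmark_hook(noop_hook, {"tool_name": "Bash"}))
```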

## When to Use It

- Before deploying hooks to production, to catch security and reliability issues
- When auditing existing hooks for vulnerabilities or injection risks
- To benchmark hook performance and detect slow or resource-heavy hooks
- While implementing hooks with the Python SDK, to validate signatures and return values
- To validate hooks against organizational compliance and documentation requirements

## Best Practices

- Run the detailed evaluation first, then targeted scans (security-only, performance) as separate steps
- Ensure hooks use supported event types and matchers; verify availability with the `--help` flag
- Keep hooks idempotent and short-running; set timeouts and degrade gracefully
- Validate that `hooks.json` exists and that the shell commands it references are executable
- Use `"decision": "block"` conservatively, and include informative `systemMessage` text for audit trails
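
Keeping hooks short-running can be enforced with a timeout wrapper; a sketch using `asyncio.wait_for`, where the timeout value and the fail-open fallback are assumptions for illustration:

```python
import asyncio
from typing import Any

async def run_with_timeout(hook, input_data: dict[str, Any], timeout_s: float = 5.0) -> dict[str, Any]:
    """Run a hook callback, degrading gracefully if it exceeds the timeout."""
    try:
        return await asyncio.wait_for(hook(input_data, None, None), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Fail open: don't block the action just because the hook was slow
        return {"systemMessage": "hook timed out; action allowed by default"}

async def slow_hook(input_data, tool_use_id, context):
    await asyncio.sleep(10)
    return {"decision": "block"}

result = asyncio.run(run_with_timeout(slow_hook, {}, timeout_s=0.05))
```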

## Example Use Cases

- Scan a plugin's `hooks.json` and Python hook files to produce a security report in SARIF
- Benchmark a set of hooks to establish a performance baseline before a high-traffic release
- Validate a newly written Python SDK hook's callback signature and return structure
- Run a compliance check to ensure hooks include documentation, error handling, and timeouts
- Integrate with a CI pipeline to fail builds when hooks fall below quality gates

## FAQ

**Can this skill test all hook event types?**

It covers the Python SDK-supported event types and validates JSON-declared hooks; some session or notification hooks may be limited by SDK capabilities.

**Does it modify hooks or just report findings?**

It performs non-destructive analysis and reporting; remediation suggestions are provided, but changes must be applied manually.