review-error-handling skill

This skill helps you review error handling and resilience in code, checking for robust exception management, sound retry policies, and graceful degradation.

npx playbooks add skill shotaiuchi/dotclaude --skill review-error-handling

Review the files below or copy the command above to add this skill to your agents.

Files (1)

SKILL.md

1.8 KB

---
name: review-error-handling
description: >-
  Error handling and resilience-focused code review. Apply when reviewing
  exception handling, error propagation, retry logic, fallback strategies,
  graceful degradation, and failure recovery paths.
user-invocable: false
---

# Error Handling Review

Review code from an error handling and resilience perspective.

## Review Checklist

### Exception Handling
- Verify every operation that can throw is either caught or deliberately propagated to a caller that handles it
- Check catch blocks are specific (not bare catch-all)
- Ensure exceptions are not silently swallowed
- Verify error context is preserved when re-throwing
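
A minimal sketch of a specific catch that preserves context when re-throwing, assuming Python. The `ConfigError` type and `load_config` helper are hypothetical names invented for this illustration and are not part of the skill.

```python
import json


class ConfigError(Exception):
    """Hypothetical domain error used only for this illustration."""


def load_config(text: str) -> dict:
    """Parse a JSON config, catching only the failure we expect."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:  # specific, not a bare `except:`
        # `raise ... from exc` stores the original error as __cause__,
        # so the traceback still shows where parsing failed.
        raise ConfigError(f"invalid config: {exc.msg}") from exc
```

A reviewer would flag the same function if it used a bare `except:` or raised `ConfigError` without `from exc`, since both hide the original failure.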

### Error Propagation
- Check errors propagate to appropriate handling layers
- Verify error types are meaningful (not generic strings)
- Ensure callers handle all possible error states
- Check Result/Either patterns are used consistently
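
The skill does not prescribe a language, so the following is only one possible shape of a Result pattern with a meaningful error type, sketched in Python. `Ok`, `Err`, `ParseError`, `parse_port`, and `describe` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E


@dataclass(frozen=True)
class ParseError:
    """Structured error: carries the bad input and a reason, not just a string."""
    raw: str
    reason: str


def parse_port(raw: str) -> Union[Ok[int], Err[ParseError]]:
    """Return Ok(port) or Err(ParseError) instead of raising."""
    try:
        port = int(raw)
    except ValueError:
        return Err(ParseError(raw, "not an integer"))
    if not 0 < port < 65536:
        return Err(ParseError(raw, "out of range 1-65535"))
    return Ok(port)


def describe(raw: str) -> str:
    """Caller handles both states explicitly."""
    result = parse_port(raw)
    if isinstance(result, Ok):
        return f"port {result.value}"
    return f"rejected {result.error.raw!r}: {result.error.reason}"
```

The point for review is consistency: if one layer returns `Ok`/`Err` and the next raises exceptions for the same failures, callers cannot know which states they must handle.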

### Retry & Recovery
- Verify retry logic uses a backoff strategy (for example, exponential backoff)
- Check maximum retry limits are configured
- Ensure idempotency for retried operations
- Verify circuit breaker patterns are used where appropriate
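
As a rough illustration of the backoff and retry-limit items, here is a small Python sketch. The `retry` helper, the `TransientError` marker, and the default values are assumptions made for the example. Jitter is added on top of exponential backoff as one common variant, not something the checklist requires.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or the attempt limit is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # limit reached: surface the last error unchanged
            # Exponential backoff with random jitter, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Only errors marked as transient are retried here, and `operation` may run several times, so it should be idempotent.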

### Graceful Degradation
- Check fallback behavior when dependencies fail
- Verify partial failure handling (some items succeed, some fail)
- Ensure timeouts are configured for all external calls
- Check user-facing error messages are helpful and safe
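
Partial failure handling can look like the following Python sketch, in which one bad item does not abort the batch and every failure is reported back. `BatchReport` and `process_batch` are hypothetical names, and treating `ValueError` as the expected per-item failure is an assumption of the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class BatchReport:
    succeeded: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)  # item -> reason


def process_batch(
    items: Iterable[str], handler: Callable[[str], None]
) -> BatchReport:
    """Run `handler` on each item, recording successes and failures separately."""
    report = BatchReport()
    for item in items:
        try:
            handler(item)
        except ValueError as exc:  # expected per-item failure only
            report.failed[item] = str(exc)
        else:
            report.succeeded.append(item)
    return report
```

Unexpected exception types still propagate and stop the batch, which keeps genuine bugs visible instead of folding them into the failure list.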

### Resource Cleanup
- Verify resources are released in error paths (finally/defer/use)
- Check database transactions are rolled back on failure
- Ensure temporary files are cleaned up on error
- Verify connection pools handle failed connections
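
To make the rollback and release items concrete, here is a hedged Python sketch using the standard `sqlite3` module. The `accounts` table, the `transfer` function, and the `demo` driver are invented for illustration.

```python
import sqlite3
from contextlib import closing


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> None:
    """Move `amount` between two rows of a hypothetical `accounts` table."""
    with conn:  # commits on success, rolls back if the block raises
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        # Violating CHECK (balance >= 0) raises IntegrityError here,
        # and the credit above is rolled back with it.
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )


def demo():
    # closing() releases the connection on every path, including errors.
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute(
            "CREATE TABLE accounts ("
            "id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
        )
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 50), (2, 0)])
        conn.commit()
        try:
            transfer(conn, src=1, dst=2, amount=80)
            failed = False
        except sqlite3.IntegrityError:
            failed = True
        rows = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
        return failed, rows
```

In `demo`, the overdraft fails and both balances remain as inserted, which is the behavior a reviewer would want to confirm for any multi-statement write.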

## Output Format

| Severity | Description |
|----------|-------------|
| Critical | Unhandled error causes crash or data loss |
| High | Error silently swallowed, masking real problems |
| Medium | Error handled but with poor user experience |
| Low | Error handling works but could be more robust |

Overview

This skill performs error handling and resilience-focused code reviews to surface gaps in exception management, recovery strategies, and resource cleanup. It highlights critical risks like unhandled exceptions, silent failures, and missing timeouts, and recommends concrete fixes. Use it to improve system robustness, observability, and user-facing behavior under failure conditions.

How this skill works

The review inspects exception handling, error propagation, retry and recovery patterns, graceful degradation, and resource cleanup paths. It checks for specific catch blocks, preserved error context, meaningful error types, backoff and retry limits, idempotency, fallbacks, timeouts, and cleanup in failure flows. Results are categorized by severity and include actionable findings and remediation suggestions.

When to use it

- When validating exception handling before a release or after an incident
- When adding retries, circuit breakers, or fallback logic to external calls
- When auditing error propagation across service boundaries or libraries
- When implementing transactional or cleanup logic that must run on failure
- When improving user-facing error messages and degraded-mode behavior

Best practices

- Catch specific exceptions and avoid bare catch-all blocks unless justified
- Preserve and enrich error context when rethrowing to aid debugging
- Use structured error types or Result/Either patterns rather than generic strings
- Make retried operations idempotent, and configure exponential backoff with a maximum retry count
- Set timeouts on external calls and provide safe, documented fallbacks
- Ensure finally/defer/cleanup paths always release resources and roll back transactions

Example use cases

- Review a service change that adds external API retries to ensure idempotency and backoff
- Audit database transaction handling to confirm rollbacks and resource cleanup on failures
- Validate that user-facing errors do not leak sensitive details and provide useful next steps
- Assess a new circuit breaker implementation and its integration with retry logic
- Check batch processing code for partial failure handling and retry/fallback strategies
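
For the circuit breaker use case, it may help to have a reference shape in mind when reading an implementation. The Python sketch below is one simplified variant, not a standard API: `CircuitBreaker`, `CircuitOpenError`, the thresholds, and the injectable `clock` are all assumptions of the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the dependency."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, operation: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency marked unhealthy")
            # Cool-down elapsed: half-open, let one trial call through.
        try:
            result = operation()
        except Exception:
            # Broad catch is deliberate: count any failure, then re-raise it.
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0
        self.opened_at = None
        return result
```

When a breaker like this wraps a retried call, the retry logic should treat `CircuitOpenError` as non-retryable, otherwise retries keep hitting a breaker that is already open.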

FAQ

What constitutes a Critical severity?

Unhandled errors that can crash processes or cause data loss are Critical and require immediate fixes.

How do you judge swallowed errors?

An error counts as swallowed when it is caught but not logged, rethrown, or otherwise handled in a way that preserves its context.
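
The contrast can be sketched in Python as follows. Both function names are hypothetical, and the use of the standard `logging` module is an assumption about how the reviewed code reports errors.

```python
import logging

logger = logging.getLogger(__name__)


def read_setting_swallowed(settings: dict, key: str) -> int:
    try:
        return int(settings[key])
    except Exception:
        # Swallowed: no log, no rethrow, and the caller cannot tell it failed.
        return 0


def read_setting_handled(settings: dict, key: str, default: int = 0) -> int:
    try:
        return int(settings[key])
    except (KeyError, ValueError):
        # exc_info=True attaches the active traceback to the log record.
        logger.warning(
            "invalid or missing setting %r, using default %d",
            key,
            default,
            exc_info=True,
        )
        return default
```

Both functions return a fallback value, but only the second leaves a trace that explains why, which is what separates a High finding from acceptable handling.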