home / skills / wesleysmits / agent-skills / error-handling-patterns

error-handling-patterns skill

/.agent/skills/error-handling-patterns

This skill helps you implement robust error handling across languages, enabling resilient APIs, clear messages, and graceful failure strategies.

This is most likely a fork of the error-handling-patterns skill from xfstudio
npx playbooks add skill wesleysmits/agent-skills --skill error-handling-patterns

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
8.7 KB
---
name: implementing-error-handling
description: Master error handling patterns across languages including exceptions, Result types, error propagation, and graceful degradation to build resilient applications. Use when implementing error handling, designing APIs, or improving application reliability.
---

# Error Handling Patterns

Build resilient applications with robust error handling strategies that gracefully handle failures and provide excellent debugging experiences.

## When to Use This Skill

- Implementing error handling in new features
- Designing error-resilient APIs
- Debugging production issues
- Improving application reliability
- Creating better error messages for users and developers
- Implementing retry and circuit breaker patterns
- Handling async/concurrent errors
- Building fault-tolerant distributed systems

## Core Concepts

### 1. Error Handling Philosophies

**Exceptions vs Result Types:**

- **Exceptions**: Traditional try-catch, disrupts control flow
- **Result Types**: Explicit success/failure, functional approach
- **Error Codes**: C-style, requires discipline
- **Option/Maybe Types**: For nullable values

**When to Use Each:**

- Exceptions: Unexpected errors, exceptional conditions
- Result Types: Expected errors, validation failures
- Panics/Crashes: Unrecoverable errors, programming bugs

### 2. Error Categories

**Recoverable Errors:**

- Network timeouts
- Missing files
- Invalid user input
- API rate limits

**Unrecoverable Errors:**

- Out of memory
- Stack overflow
- Programming bugs (null pointer, etc.)

## Language-Specific Patterns

For detailed code examples in Python, TypeScript, Rust, and Go, see:
šŸ‘‰ **[examples/language-patterns.md](examples/language-patterns.md)**

## Universal Patterns

### Pattern 1: Circuit Breaker

Prevent cascading failures in distributed systems.

```python
from enum import Enum
from datetime import datetime, timedelta
from typing import Callable, TypeVar

T = TypeVar('T')

class CircuitState(Enum):
    CLOSED = "closed"       # Normal operation
    OPEN = "open"          # Failing, reject requests
    HALF_OPEN = "half_open"  # Testing if recovered

class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        timeout: timedelta = timedelta(seconds=60),
        success_threshold: int = 2
    ):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.success_threshold = success_threshold
        self.failure_count = 0
        self.success_count = 0
        self.state = CircuitState.CLOSED
        self.last_failure_time = None

    def call(self, func: Callable[[], T]) -> T:
        if self.state == CircuitState.OPEN:
            if datetime.now() - self.last_failure_time > self.timeout:
                self.state = CircuitState.HALF_OPEN
                self.success_count = 0
            else:
                raise Exception("Circuit breaker is OPEN")

        try:
            result = func()
            self.on_success()
            return result
        except Exception as e:
            self.on_failure()
            raise

    def on_success(self):
        self.failure_count = 0
        if self.state == CircuitState.HALF_OPEN:
            self.success_count += 1
            if self.success_count >= self.success_threshold:
                self.state = CircuitState.CLOSED
                self.success_count = 0

    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = datetime.now()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

# Usage
circuit_breaker = CircuitBreaker()

def fetch_data():
    return circuit_breaker.call(lambda: external_api.get_data())
```

### Pattern 2: Error Aggregation

Collect multiple errors instead of failing on first error.

```typescript
class ErrorCollector {
  private errors: Error[] = [];

  add(error: Error): void {
    this.errors.push(error);
  }

  hasErrors(): boolean {
    return this.errors.length > 0;
  }

  getErrors(): Error[] {
    return [...this.errors];
  }

  throw(): never {
    if (this.errors.length === 1) {
      throw this.errors[0];
    }
    throw new AggregateError(
      this.errors,
      `${this.errors.length} errors occurred`,
    );
  }
}

// Usage: Validate multiple fields
function validateUser(data: any): User {
  const errors = new ErrorCollector();

  if (!data.email) {
    errors.add(new ValidationError("Email is required"));
  } else if (!isValidEmail(data.email)) {
    errors.add(new ValidationError("Email is invalid"));
  }

  if (!data.name || data.name.length < 2) {
    errors.add(new ValidationError("Name must be at least 2 characters"));
  }

  if (!data.age || data.age < 18) {
    errors.add(new ValidationError("Age must be 18 or older"));
  }

  if (errors.hasErrors()) {
    errors.throw();
  }

  return data as User;
}
```

### Pattern 3: Graceful Degradation

Provide fallback functionality when errors occur.

```python
from typing import Optional, Callable, TypeVar

T = TypeVar('T')

def with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    log_error: bool = True
) -> T:
    """Try primary function, fall back to fallback on error."""
    try:
        return primary()
    except Exception as e:
        if log_error:
            logger.error(f"Primary function failed: {e}")
        return fallback()

# Usage
def get_user_profile(user_id: str) -> UserProfile:
    return with_fallback(
        primary=lambda: fetch_from_cache(user_id),
        fallback=lambda: fetch_from_database(user_id)
    )

# Multiple fallbacks
def get_exchange_rate(currency: str) -> float:
    return (
        try_function(lambda: api_provider_1.get_rate(currency))
        or try_function(lambda: api_provider_2.get_rate(currency))
        or try_function(lambda: cache.get_rate(currency))
        or DEFAULT_RATE
    )

def try_function(func: Callable[[], Optional[T]]) -> Optional[T]:
    try:
        return func()
    except Exception:
        return None
```

## Best Practices

1. **Fail Fast**: Validate input early, fail quickly
2. **Preserve Context**: Include stack traces, metadata, timestamps
3. **Meaningful Messages**: Explain what happened and how to fix it
4. **Log Appropriately**: Error = log, expected failure = don't spam logs
5. **Handle at Right Level**: Catch where you can meaningfully handle
6. **Clean Up Resources**: Use try-finally, context managers, defer
7. **Don't Swallow Errors**: Log or re-throw, don't silently ignore
8. **Type-Safe Errors**: Use typed errors when possible

```python
# Good error handling example
def process_order(order_id: str) -> Order:
    """Process order with comprehensive error handling."""
    try:
        # Validate input
        if not order_id:
            raise ValidationError("Order ID is required")

        # Fetch order
        order = db.get_order(order_id)
        if not order:
            raise NotFoundError("Order", order_id)

        # Process payment
        try:
            payment_result = payment_service.charge(order.total)
        except PaymentServiceError as e:
            # Log and wrap external service error
            logger.error(f"Payment failed for order {order_id}: {e}")
            raise ExternalServiceError(
                f"Payment processing failed",
                service="payment_service",
                details={"order_id": order_id, "amount": order.total}
            ) from e

        # Update order
        order.status = "completed"
        order.payment_id = payment_result.id
        db.save(order)

        return order

    except ApplicationError:
        # Re-raise known application errors
        raise
    except Exception as e:
        # Log unexpected errors
        logger.exception(f"Unexpected error processing order {order_id}")
        raise ApplicationError(
            "Order processing failed",
            code="INTERNAL_ERROR"
        ) from e
```

## Common Pitfalls

- **Catching Too Broadly**: `except Exception` hides bugs
- **Empty Catch Blocks**: Silently swallowing errors
- **Logging and Re-throwing**: Creates duplicate log entries
- **Not Cleaning Up**: Forgetting to close files, connections
- **Poor Error Messages**: "Error occurred" is not helpful
- **Returning Error Codes**: Use exceptions or Result types
- **Ignoring Async Errors**: Unhandled promise rejections

## Resources

- **references/exception-hierarchy-design.md**: Designing error class hierarchies
- **references/error-recovery-strategies.md**: Recovery patterns for different scenarios
- **references/async-error-handling.md**: Handling errors in concurrent code
- **assets/error-handling-checklist.md**: Review checklist for error handling
- **assets/error-message-guide.md**: Writing helpful error messages
- **scripts/error-analyzer.py**: Analyze error patterns in logs

Overview

This skill teaches production-ready error handling patterns across languages and architectures to build resilient applications. It covers exceptions vs Result types, recovery strategies, and pragmatic patterns like circuit breakers, error aggregation, and graceful degradation. Use it to design clearer APIs, reduce incidents, and improve observability and debuggability.

How this skill works

The skill inspects common failure modes and prescribes language-agnostic patterns plus concrete implementations (examples in Python, TypeScript, Rust, Go). It shows where to catch, wrap, rethrow, or surface errors, and provides reusable building blocks: circuit breaker for distributed systems, error collectors for validation, and fallback wrappers for graceful degradation. It also documents best practices, pitfalls, and checklists to enforce consistent handling and logging.

When to use it

  • Implementing error handling for new features or libraries
  • Designing APIs where errors must be typed and discoverable
  • Improving reliability for networked or distributed systems
  • Handling concurrent or asynchronous error flows
  • Creating clearer developer and user-facing error messages

Best practices

  • Fail fast: validate inputs early and return explicit errors for expected failures
  • Preserve context: attach stack traces, metadata, and causal chains when wrapping errors
  • Handle at the right level: catch only where you can remediate or add value
  • Log appropriately: avoid duplicate logs when rethrowing and separate expected vs unexpected
  • Use typed or Result-style errors for predictable flows and exceptions for truly exceptional cases
  • Ensure resource cleanup with finally/context managers/defer blocks

Example use cases

  • Wrap external API calls with a circuit breaker to prevent cascading failures
  • Validate user input with an ErrorCollector to return aggregated validation errors
  • Implement multi-tier fallbacks (cache → primary DB → secondary DB → default) for degraded availability
  • Design API error shapes and codes so clients can programmatically react (retry, abort, fallback)
  • Instrument services to attach error metadata for faster root cause analysis

FAQ

When should I prefer Result types over exceptions?

Prefer Result types for expected, recoverable failures (validation, user errors). Use exceptions for unexpected or programmer errors that should bubble up quickly.

How do I avoid noisy logs when rethrowing errors?

Log at the level that can add remediation context, then rethrow without logging again. Use structured logs and include an error ID so downstream handlers can reference the same event.