
This skill standardizes error handling patterns, classifies failures, enables recovery strategies, and logs issues to improve resilience across plugins.

npx playbooks add skill athola/claude-night-market --skill error-patterns

Review the files below or copy the command above to add this skill to your agents.

Files (3)
SKILL.md
4.6 KB
---
name: error-patterns
description: >-
  Standardized error handling patterns with classification, recovery, and
  logging strategies. Covers error handling, error recovery, graceful
  degradation, and resilience.
category: infrastructure
tags:
- errors
- error-handling
- recovery
- resilience
- debugging
dependencies:
- usage-logging
provides:
  infrastructure:
  - error-handling
  - error-classification
  - recovery
  patterns:
  - graceful-degradation
  - error-logging
  - debugging
usage_patterns:
- error-handling
- resilience-patterns
- debugging-workflows
complexity: beginner
estimated_tokens: 450
progressive_loading: true
modules:
- modules/classification.md
- modules/recovery-strategies.md
---
## Table of Contents

- [Overview](#overview)
- [When To Use](#when-to-use)
- [When NOT To Use](#when-not-to-use)
- [Error Classification](#error-classification)
- [By Severity](#by-severity)
- [By Recoverability](#by-recoverability)
- [Quick Start](#quick-start)
- [Standard Error Handler](#standard-error-handler)
- [Error Result](#error-result)
- [Common Patterns](#common-patterns)
- [Authentication Errors (401/403)](#authentication-errors-401403)
- [Rate Limit Errors (429)](#rate-limit-errors-429)
- [Timeout Errors](#timeout-errors)
- [Context Too Large (400)](#context-too-large-400)
- [Integration Pattern](#integration-pattern)
- [Detailed Resources](#detailed-resources)
- [Exit Criteria](#exit-criteria)


# Error Patterns

## Overview

Standardized error handling patterns for consistent, production-grade behavior across plugins. Provides error classification, recovery strategies, and debugging workflows.

## When To Use

- Building resilient integrations
- Need consistent error handling
- Want graceful degradation
- Debugging production issues

## When NOT To Use

- Project doesn't use the leyline infrastructure patterns
- Simple scripts without service architecture needs

## Error Classification

### By Severity

| Level | Action | Example |
|-------|--------|---------|
| **Critical** | Halt, alert | Auth failure, service down |
| **Error** | Retry or secondary strategy | Rate limit, timeout |
| **Warning** | Log, continue | Partial results, deprecation |
| **Info** | Log only | Non-blocking issues |

### By Recoverability

```python
from enum import Enum


class ErrorCategory(Enum):
    TRANSIENT = "transient"        # Retry likely to succeed
    PERMANENT = "permanent"        # Retry won't help
    CONFIGURATION = "config"       # User action needed
    RESOURCE = "resource"          # Quota/limit issue
```
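As an illustration, exceptions can be mapped onto these categories by type. This is a minimal sketch, not part of the leyline API; the built-in exception types stand in for whatever your service client raises:

```python
from enum import Enum


class ErrorCategory(Enum):
    TRANSIENT = "transient"      # Retry likely to succeed
    PERMANENT = "permanent"      # Retry won't help
    CONFIGURATION = "config"     # User action needed
    RESOURCE = "resource"        # Quota/limit issue


def classify(exc: Exception) -> ErrorCategory:
    """Map an exception to a category by type, defaulting to PERMANENT."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return ErrorCategory.TRANSIENT
    if isinstance(exc, PermissionError):
        return ErrorCategory.CONFIGURATION
    return ErrorCategory.PERMANENT
```

Classifying first, acting second keeps the retry/fail-fast decision in one place instead of scattered across call sites.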

## Quick Start

### Standard Error Handler
```python
from leyline.error_patterns import handle_error, ErrorCategory

# RateLimitError and AuthError come from your service client library.

def execute_with_handling(service, prompt):
    try:
        return service.execute(prompt)
    except RateLimitError as e:
        return handle_error(e, ErrorCategory.RESOURCE, {
            "retry_after": e.retry_after,
            "service": "gemini",
        })
    except AuthError as e:
        return handle_error(e, ErrorCategory.CONFIGURATION, {
            "action": "Run 'gemini auth login'",
        })
```

### Error Result
```python
from dataclasses import dataclass

from leyline.error_patterns import ErrorCategory


@dataclass
class ErrorResult:
    category: ErrorCategory
    message: str
    recoverable: bool
    suggested_action: str
    metadata: dict
```
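Putting the pieces together, a `handle_error` consistent with the usage above might look like the following. This is a hypothetical sketch, not the actual leyline implementation; `ErrorCategory` and `ErrorResult` are repeated so the example is self-contained, and the recoverability rule (transient and resource errors are retryable) is an assumption:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ErrorCategory(Enum):
    TRANSIENT = "transient"
    PERMANENT = "permanent"
    CONFIGURATION = "config"
    RESOURCE = "resource"


@dataclass
class ErrorResult:
    category: ErrorCategory
    message: str
    recoverable: bool
    suggested_action: str
    metadata: dict = field(default_factory=dict)


def handle_error(exc: Exception, category: ErrorCategory,
                 metadata: Optional[dict] = None) -> ErrorResult:
    """Normalize an exception into an ErrorResult.

    Transient and resource errors are marked recoverable; a caller-supplied
    "action" key, when present, becomes the suggested action.
    """
    metadata = metadata or {}
    retryable = category in (ErrorCategory.TRANSIENT, ErrorCategory.RESOURCE)
    default_action = "Retry with backoff" if retryable else "See metadata for details"
    return ErrorResult(
        category=category,
        message=str(exc),
        recoverable=retryable,
        suggested_action=metadata.get("action", default_action),
        metadata=metadata,
    )
```

Returning a value instead of re-raising lets callers branch on `recoverable` and surface `suggested_action` directly to the user.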

## Common Patterns

### Authentication Errors (401/403)
- Verify credentials exist
- Check token expiration
- Validate permissions/scopes
- Suggest re-authentication

### Rate Limit Errors (429)
- Extract retry-after header
- Log for quota tracking
- Implement backoff
- Consider alternative service
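The steps above can be sketched as a retry loop that prefers the server's `retry_after` hint and otherwise falls back to exponential backoff with jitter. The names here are illustrative, not part of the leyline API, and a real handler would catch the client's rate-limit exception rather than bare `Exception`:

```python
import random
import time


def with_backoff(call, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry `call`, honoring an exception's retry_after hint when present."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as exc:
            if attempt == max_retries:
                raise
            retry_after = getattr(exc, "retry_after", None)
            # Prefer the server's hint; otherwise back off exponentially
            # with a little jitter to avoid synchronized retries.
            delay = retry_after if retry_after is not None else (
                base_delay * (2 ** attempt) + random.uniform(0, 0.5))
            sleep(delay)
```

Injecting `sleep` keeps the helper testable and lets callers plug in an async-friendly or instrumented delay.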

### Timeout Errors
- Increase timeout for retries
- Break into smaller requests
- Use async patterns
- Consider different model
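One way to realize "increase timeout for retries" is to walk an escalating timeout schedule before giving up. A minimal sketch; `execute` and its `timeout` parameter are placeholders for your client call:

```python
def retry_with_longer_timeout(execute, timeouts=(30, 60, 120)):
    """Try the call with each timeout in turn; re-raise the final TimeoutError."""
    last_exc = None
    for timeout in timeouts:
        try:
            return execute(timeout=timeout)
        except TimeoutError as exc:
            last_exc = exc
    raise last_exc
```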

### Context Too Large (400)
- Estimate tokens before request
- Split into multiple requests
- Reduce input content
- Use larger context model
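A crude preflight check can estimate size and split before sending. The four-characters-per-token ratio is a rough heuristic for English text, not an exact count; real tokenizers vary by model:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)


def split_for_context(text: str, max_tokens: int) -> list:
    """Split text into chunks that each fit the estimated token budget."""
    max_chars = max_tokens * 4
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

Splitting on character offsets can cut mid-sentence; splitting on paragraph or document boundaries usually gives better downstream results.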

## Integration Pattern

```yaml
# In your skill's frontmatter
dependencies: [leyline:error-patterns]
```

## Detailed Resources

- **Classification**: See `modules/classification.md` for error taxonomy
- **Recovery**: See `modules/recovery-strategies.md` for handling patterns

## Exit Criteria

- Error classified correctly
- Appropriate recovery attempted
- User-actionable message provided
- Error logged for debugging

Overview

This skill standardizes error handling across integrations by providing classification, recovery strategies, and logging conventions. It supplies a compact ErrorResult model and helper handlers that produce consistent, actionable error responses. The goal is predictable behavior, graceful degradation, and easier debugging in production systems.

How this skill works

The skill inspects exceptions and maps them to an ErrorCategory (e.g., TRANSIENT, PERMANENT, CONFIGURATION, RESOURCE). Handlers return an ErrorResult with a category, recoverability flag, suggested action, and metadata. Built-in patterns cover auth, rate limits, timeouts, and context-size issues, and the library encourages structured logging and retry/backoff strategies.

When to use it

  • Building resilient integrations that must degrade gracefully
  • Centralizing error classification and recovery across services
  • Implementing consistent user-facing error messages and logging
  • Handling rate limits, auth failures, timeouts, or oversized requests
  • Automating retry and backoff logic for transient failures

Best practices

  • Classify errors by severity and recoverability before acting
  • Return an ErrorResult with suggested_action and metadata for operators
  • Prefer exponential backoff for TRANSIENT errors and limit retries
  • Log structured context for every handled error to aid debugging
  • Provide clear user actions for CONFIGURATION errors (e.g., re-auth)
  • Estimate token/context size and split requests to avoid 400 errors

Example use cases

  • Wrap API calls to detect 401/403 and prompt re-authentication with a specific command
  • Handle 429 responses by extracting retry-after, scheduling a backoff, and switching to a secondary service
  • Catch timeouts, retry with longer timeouts or smaller payloads, and fall back when needed
  • Preflight large inputs to split requests when encountering context-too-large errors
  • Normalize error responses from multiple downstream services into a single ErrorResult format for monitoring

FAQ

How do I decide between retrying and failing fast?

Classify the error: TRANSIENT errors should be retried with backoff; PERMANENT and CONFIGURATION errors should fail fast and surface a user action.

What metadata should I include in ErrorResult?

Include service name, error code, retry_after if present, request id, and any parameter values that help reproduce the issue.

How should rate limits be handled across multiple services?

Extract retry-after headers, log quota usage, apply centralized backoff policy, and consider routing requests to an alternative service if available.