python-observability-patterns skill

This skill helps you implement Python observability patterns for structured logging, metrics, and tracing to improve monitoring and diagnostics.

npx playbooks add skill 0xdarkmatter/claude-mods --skill python-observability-patterns

---
name: python-observability-patterns
description: "Observability patterns for Python applications. Triggers on: logging, metrics, tracing, opentelemetry, prometheus, observability, monitoring, structlog, correlation id."
compatibility: "Python 3.10+. Requires structlog, opentelemetry-api, prometheus-client."
allowed-tools: "Read Write"
depends-on: [python-async-patterns]
related-skills: [python-fastapi-patterns, python-cli-patterns]
---

# Python Observability Patterns

Logging, metrics, and tracing for production applications.

## Structured Logging with structlog

```python
import logging

import structlog

# Configure structlog
structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ],
    wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
    context_class=dict,
    logger_factory=structlog.PrintLoggerFactory(),
)

logger = structlog.get_logger()

# Usage
logger.info("user_created", user_id=123, email="user@example.com")
# Output (key order may differ): {"event": "user_created", "user_id": 123, "email": "user@example.com", "level": "info", "timestamp": "2024-01-15T10:00:00Z"}
```

## Request Context Propagation

```python
import structlog
from contextvars import ContextVar
from uuid import uuid4

from fastapi import FastAPI

app = FastAPI()  # Or reuse your existing application instance

request_id_var: ContextVar[str] = ContextVar("request_id", default="")

def bind_request_context(request_id: str | None = None):
    """Bind request ID to logging context."""
    rid = request_id or str(uuid4())
    request_id_var.set(rid)
    structlog.contextvars.bind_contextvars(request_id=rid)
    return rid

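# FastAPI middleware
@app.middleware("http")
async def request_context_middleware(request, call_next):
    # Drop anything left over from a previous request handled in this context
    structlog.contextvars.clear_contextvars()
    # Reuse the caller's ID when present; otherwise generate a new one
    request_id = bind_request_context(request.headers.get("X-Request-ID"))
    try:
        response = await call_next(request)
        response.headers["X-Request-ID"] = request_id
        return response
    finally:
        # Runs on success and on exceptions, so context cannot leak
        structlog.contextvars.clear_contextvars()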
```
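
The middleware above depends on `contextvars` giving every request its own view of `request_id_var`. The following stdlib-only sketch illustrates that isolation under asyncio: two concurrent tasks set different IDs, yield to each other, and still read back their own value. The `handle` and `main` functions and the `req-a`/`req-b` IDs are made up for illustration.

```python
import asyncio
from contextvars import ContextVar

# Stand-in for the request_id_var defined in the snippet above
request_id_var: ContextVar[str] = ContextVar("request_id", default="")


async def handle(request_id: str, delay: float) -> tuple[str, str]:
    """Simulate a handler that sets its ID, awaits, then reads the ID back."""
    request_id_var.set(request_id)
    await asyncio.sleep(delay)  # Other tasks run here and set their own IDs
    return request_id, request_id_var.get()


async def main() -> list[tuple[str, str]]:
    # gather() wraps each coroutine in a Task; each Task runs in a copy
    # of the current context, so set() calls do not cross task boundaries
    return await asyncio.gather(
        handle("req-a", 0.02),
        handle("req-b", 0.01),
    )


results = asyncio.run(main())
leaked = request_id_var.get()  # Value seen outside the tasks

print(results)        # Each task read back the ID it set
print(repr(leaked))   # The outer context still holds the default
```

Because each task works on a copy of the context, values set inside a request do not propagate back to the caller. Clearing in the middleware still matters for code paths that reuse a context across requests.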

## Prometheus Metrics

```python
import time

from prometheus_client import Counter, Histogram, Gauge, generate_latest
from fastapi import FastAPI, Response

app = FastAPI()  # Or reuse your existing application instance

# Define metrics
REQUEST_COUNT = Counter(
    "http_requests_total",
    "Total HTTP requests",
    ["method", "endpoint", "status"]
)

REQUEST_LATENCY = Histogram(
    "http_request_duration_seconds",
    "HTTP request latency",
    ["method", "endpoint"],
    buckets=[0.01, 0.05, 0.1, 0.5, 1.0, 5.0]
)

ACTIVE_CONNECTIONS = Gauge(
    "active_connections",
    "Number of active connections"
)

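# Middleware to record metrics
@app.middleware("http")
async def metrics_middleware(request, call_next):
    ACTIVE_CONNECTIONS.inc()
    start = time.perf_counter()
    status = 500  # Reported if the handler raises before producing a response

    try:
        response = await call_next(request)
        status = response.status_code
        return response
    finally:
        # Record in finally so failed requests are counted and the gauge
        # is always decremented
        duration = time.perf_counter() - start
        # request.url.path is used for brevity; a route template or endpoint
        # group keeps label cardinality bounded
        REQUEST_COUNT.labels(
            method=request.method,
            endpoint=request.url.path,
            status=str(status)
        ).inc()
        REQUEST_LATENCY.labels(
            method=request.method,
            endpoint=request.url.path
        ).observe(duration)
        ACTIVE_CONNECTIONS.dec()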

# Metrics endpoint
@app.get("/metrics")
async def metrics():
    return Response(
        content=generate_latest(),
        media_type="text/plain"
    )
```

## OpenTelemetry Tracing

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

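# Setup (needs opentelemetry-sdk and opentelemetry-exporter-otlp-proto-grpc
# in addition to opentelemetry-api)
provider = TracerProvider()
# insecure=True sends plaintext gRPC, which suits a local collector;
# use TLS for anything that crosses a network boundary
processor = BatchSpanProcessor(
    OTLPSpanExporter(endpoint="localhost:4317", insecure=True)
)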
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# Manual instrumentation (validate and charge stand in for your own coroutines)
async def process_order(order_id: int):
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order_id", order_id)

        with tracer.start_as_current_span("validate_order"):
            await validate(order_id)

        with tracer.start_as_current_span("charge_payment"):
            await charge(order_id)
```

## Quick Reference

| Library | Purpose |
|---------|---------|
| structlog | Structured logging |
| prometheus-client | Metrics collection |
| opentelemetry | Distributed tracing |

| Metric Type | Use Case |
|-------------|----------|
| Counter | Total requests, errors |
| Histogram | Latencies, sizes |
| Gauge | Current connections, queue size |

## Additional Resources

- `./references/structured-logging.md` - structlog configuration, formatters
- `./references/metrics.md` - Prometheus patterns, custom metrics
- `./references/tracing.md` - OpenTelemetry, distributed tracing

## Assets

- `./assets/logging-config.py` - Production logging configuration

---

## See Also

**Prerequisites:**
- `python-async-patterns` - Async context propagation

**Related Skills:**
- `python-fastapi-patterns` - API middleware for metrics/tracing
- `python-cli-patterns` - CLI logging patterns

**Integration Skills:**
- `python-database-patterns` - Database query tracing

Overview

This skill provides pragmatic observability patterns for Python applications, covering structured logging, metrics, and distributed tracing. It supplies reusable middleware and configuration snippets for structlog, Prometheus client metrics, and OpenTelemetry tracing. The goal is reliable context propagation, actionable telemetry, and easy integration with common frameworks like FastAPI.

How this skill works

The skill offers code patterns for the common observability touchpoints: binding request context (request IDs) to logs, recording Prometheus metrics in middleware, and emitting OpenTelemetry spans for distributed traces. It includes examples for structlog configuration, request-context propagation with contextvars, Prometheus counters/histograms/gauges, and OTLP span exporting. Use the provided middleware and instrumentation snippets so that the same metadata flows consistently across logs, metrics, and traces.

When to use it

  • You need structured, JSON logs with request correlation across services.
  • You want to expose Prometheus-compatible metrics for request counts, latency, and active connections.
  • You need distributed tracing with OpenTelemetry and OTLP export to a collector.
  • You want to add lightweight middleware to FastAPI to capture telemetry automatically.
  • You need reproducible patterns for production observability and debugging.

Best practices

  • Always bind a request ID (X-Request-ID) early, and clear contextvars in a finally block so the context is reset even when the request fails.
  • Emit structured JSON logs with consistent fields (event, level, timestamp, request_id, user_id).
  • Record labels/tags sparingly to avoid metric cardinality explosions (use high-level endpoint names, not full URLs).
  • Use histograms for latency distributions and suitable buckets for expected latencies.
  • Export traces via an OTLP collector and set sampling appropriate to production traffic.
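
Choosing histogram buckets is easier once the counting rule is clear: Prometheus buckets are cumulative, so each bucket reports how many observations were less than or equal to its upper bound, with a final `+Inf` bucket holding the total. The stdlib-only sketch below mimics that rule. It is an illustration, not the `prometheus_client` implementation, and the sample latencies are made up.

```python
from bisect import bisect_left

# Same upper bounds as the REQUEST_LATENCY histogram in the metrics section
BUCKETS = [0.01, 0.05, 0.1, 0.5, 1.0, 5.0]


def cumulative_counts(observations, bounds):
    """Return {upper_bound: number of observations <= upper_bound}."""
    per_bucket = [0] * (len(bounds) + 1)  # Last slot is the +Inf bucket
    for value in observations:
        # bisect_left finds the first bound that is >= value
        per_bucket[bisect_left(bounds, value)] += 1

    counts = {}
    running = 0
    for bound, n in zip([*bounds, float("inf")], per_bucket):
        running += n  # Cumulative: each bucket includes all smaller ones
        counts[bound] = running
    return counts


latencies = [0.004, 0.03, 0.03, 0.2, 0.7, 7.5]  # Sample durations in seconds
counts = cumulative_counts(latencies, BUCKETS)

for bound, count in counts.items():
    print(f"le={bound}: {count}")
```

If most observations pile up in a single bucket or spill into `+Inf`, the bounds are too coarse around the latencies that matter, and quantile estimates computed from the histogram will be imprecise.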

Example use cases

  • Add structlog to an existing FastAPI app to produce correlated JSON logs for SRE and debugging.
  • Instrument HTTP middleware to increment Prometheus counters and observe request latency histograms.
  • Create a request-context middleware that injects X-Request-ID into logs and response headers for end-to-end correlation.
  • Wrap critical operations with OpenTelemetry spans (e.g., process_order → validate_order → charge_payment) to visualize distributed flows.
  • Expose a /metrics endpoint that Prometheus scrapes for service-level dashboards and alerts.

FAQ

How do I avoid high cardinality in Prometheus labels?

Use coarse-grained labels (method, endpoint group) and never label by user ID, request ID, or any other unbounded value. Capture per-request variability in histogram observations rather than in extra labels.
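
One way to get an endpoint group from a raw URL path is to collapse variable segments into placeholders before using the path as a label. The sketch below is a stdlib-only illustration; the function name `endpoint_label` and the two patterns are assumptions, and a framework that exposes the matched route template may make this unnecessary.

```python
import re

# Hypothetical patterns; adjust them to the ID formats your routes use
_UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
_NUMERIC = re.compile(r"\d+")


def endpoint_label(path: str) -> str:
    """Replace ID-like path segments with placeholders to bound cardinality."""
    segments = []
    for segment in path.split("/"):
        if _UUID.fullmatch(segment):
            segments.append("{uuid}")
        elif _NUMERIC.fullmatch(segment):
            segments.append("{id}")
        else:
            segments.append(segment)
    return "/".join(segments)


print(endpoint_label("/users/123/orders/456"))
print(endpoint_label("/items/3fa85f64-5717-4562-b3fc-2c963f66afa6"))
print(endpoint_label("/health"))
```

With this in place, `/users/1` and `/users/2` share one time series instead of creating a new one per user.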

Where should I bind and clear request context?

Bind the request ID at the entry middleware and clear contextvars right after the response is returned to avoid leaking context between requests.

How do I export traces to a backend?

Configure an OTLP exporter with an appropriate endpoint (collector), add a BatchSpanProcessor, and tune sampling and batching for production throughput.
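
As a configuration sketch of that answer, the fragment below sets a 10% parent-based sampler and spells out the batching parameters. It assumes `opentelemetry-sdk` and `opentelemetry-exporter-otlp-proto-grpc` are installed and that a collector listens on `localhost:4317`. The service name, sampling ratio, and batch values are illustrative starting points, not recommendations.

```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

provider = TracerProvider(
    # "order-service" is a hypothetical name; backends group traces by it
    resource=Resource.create({"service.name": "order-service"}),
    # Sample about 10% of new traces, but follow the caller's decision
    # when a parent span already exists
    sampler=ParentBased(TraceIdRatioBased(0.1)),
)

# Assumed local collector; insecure=True means plaintext gRPC
exporter = OTLPSpanExporter(endpoint="localhost:4317", insecure=True)

provider.add_span_processor(
    BatchSpanProcessor(
        exporter,
        max_queue_size=2048,         # Spans buffered before new ones are dropped
        schedule_delay_millis=5000,  # Maximum wait between export attempts
        max_export_batch_size=512,   # Spans sent per export call
    )
)
trace.set_tracer_provider(provider)
```

A larger queue tolerates longer collector outages at the cost of memory, and a shorter delay reduces the number of spans lost if the process exits abruptly.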