
This skill helps implement Python observability patterns with structured logging, request context, metrics, and tracing to improve monitoring and debugging.

This is most likely a fork of the python-observability-patterns skill from 0xdarkmatter.

```
npx playbooks add skill codingheader/myskills --skill 0xdarkmatter-python-observability-patterns
```

Copy the command above to add this skill to your agents.

---
name: python-observability-patterns
description: "Observability patterns for Python applications. Triggers on: logging, metrics, tracing, opentelemetry, prometheus, observability, monitoring, structlog, correlation id."
compatibility: "Python 3.10+. Requires structlog, opentelemetry-api, prometheus-client."
allowed-tools: "Read Write"
depends-on: [python-async-patterns]
related-skills: [python-fastapi-patterns, python-cli-patterns]
---

# Python Observability Patterns

Logging, metrics, and tracing for production applications.

## Structured Logging with structlog

```python
import logging

import structlog

# Configure structlog
structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ],
    wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
    context_class=dict,
    logger_factory=structlog.PrintLoggerFactory(),
)

logger = structlog.get_logger()

# Usage
logger.info("user_created", user_id=123, email="[email protected]")
# Output: {"event": "user_created", "user_id": 123, "email": "[email protected]", "level": "info", "timestamp": "2024-01-15T10:00:00Z"}
```
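
For environments where structlog is unavailable, the same JSON-line shape can be approximated with only the standard library. A minimal sketch — the `JsonFormatter` class and the `fields` convention here are illustrative, not a structlog API:

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render records as JSON lines shaped like structlog's output."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "event": record.getMessage(),
            "level": record.levelname.lower(),
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        # Extra fields passed via `extra={"fields": ...}` become top-level keys
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("user_created", extra={"fields": {"user_id": 123}})
```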

## Request Context Propagation

```python
import structlog
from contextvars import ContextVar
from uuid import uuid4

request_id_var: ContextVar[str] = ContextVar("request_id", default="")

def bind_request_context(request_id: str | None = None):
    """Bind request ID to logging context."""
    rid = request_id or str(uuid4())
    request_id_var.set(rid)
    structlog.contextvars.bind_contextvars(request_id=rid)
    return rid

# FastAPI middleware
@app.middleware("http")
async def request_context_middleware(request, call_next):
    request_id = request.headers.get("X-Request-ID") or str(uuid4())
    bind_request_context(request_id)
    try:
        response = await call_next(request)
        response.headers["X-Request-ID"] = request_id
        return response
    finally:
        # Clear even when the handler raises, so IDs don't leak across requests
        structlog.contextvars.clear_contextvars()
```
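
The middleware above is concurrency-safe because `ContextVar` values are task-local. A self-contained sketch (no FastAPI required) demonstrating that concurrent tasks keep independent request IDs:

```python
import asyncio
from contextvars import ContextVar
from uuid import uuid4

request_id_var: ContextVar[str] = ContextVar("request_id", default="")


async def handle(name: str) -> tuple[str, str]:
    # Each task runs in its own copy of the context, so setting the
    # ContextVar here does not affect concurrently running tasks.
    request_id_var.set(str(uuid4()))
    await asyncio.sleep(0)  # yield so the tasks interleave
    return name, request_id_var.get()


async def main() -> list[tuple[str, str]]:
    return await asyncio.gather(handle("a"), handle("b"))


results = asyncio.run(main())
```

Each result carries the ID set by its own task, and the outer context is untouched.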

## Prometheus Metrics

```python
import time

from fastapi import FastAPI, Response
from prometheus_client import Counter, Histogram, Gauge, generate_latest

# Define metrics
REQUEST_COUNT = Counter(
    "http_requests_total",
    "Total HTTP requests",
    ["method", "endpoint", "status"]
)

REQUEST_LATENCY = Histogram(
    "http_request_duration_seconds",
    "HTTP request latency",
    ["method", "endpoint"],
    buckets=[0.01, 0.05, 0.1, 0.5, 1.0, 5.0]
)

ACTIVE_CONNECTIONS = Gauge(
    "active_connections",
    "Number of active connections"
)

# Middleware to record metrics
@app.middleware("http")
async def metrics_middleware(request, call_next):
    ACTIVE_CONNECTIONS.inc()
    start = time.perf_counter()

    response = await call_next(request)

    duration = time.perf_counter() - start
    REQUEST_COUNT.labels(
        method=request.method,
        endpoint=request.url.path,
        status=response.status_code
    ).inc()
    REQUEST_LATENCY.labels(
        method=request.method,
        endpoint=request.url.path
    ).observe(duration)
    ACTIVE_CONNECTIONS.dec()

    return response

# Metrics endpoint
@app.get("/metrics")
async def metrics():
    return Response(
        content=generate_latest(),
        # Prometheus text exposition format (prometheus_client.CONTENT_TYPE_LATEST)
        media_type="text/plain; version=0.0.4; charset=utf-8"
    )
```

## OpenTelemetry Tracing

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Setup
provider = TracerProvider()
processor = BatchSpanProcessor(
    # insecure=True sends plaintext gRPC; configure TLS for production
    OTLPSpanExporter(endpoint="localhost:4317", insecure=True)
)
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# Manual instrumentation
async def process_order(order_id: int):
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order_id", order_id)

        with tracer.start_as_current_span("validate_order"):
            await validate(order_id)

        with tracer.start_as_current_span("charge_payment"):
            await charge(order_id)
```
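
Conceptually, `start_as_current_span` tracks the active span in a context variable so nested spans pick up the right parent. A toy tracer — not the OpenTelemetry API, just a sketch of the mechanics:

```python
import time
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    parent: "Span | None"
    attributes: dict = field(default_factory=dict)
    duration: float = 0.0


_current_span: ContextVar["Span | None"] = ContextVar("span", default=None)
finished: list[Span] = []  # stand-in for an exporter


@contextmanager
def start_as_current_span(name: str):
    # The new span's parent is whatever span is current in this context
    span = Span(name, parent=_current_span.get())
    token = _current_span.set(span)
    start = time.perf_counter()
    try:
        yield span
    finally:
        span.duration = time.perf_counter() - start
        _current_span.reset(token)  # restore the parent as current
        finished.append(span)       # children finish before their parent


with start_as_current_span("process_order") as root:
    root.attributes["order_id"] = 42
    with start_as_current_span("validate_order"):
        pass
    with start_as_current_span("charge_payment"):
        pass
```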

## Quick Reference

| Library | Purpose |
|---------|---------|
| structlog | Structured logging |
| prometheus-client | Metrics collection |
| opentelemetry | Distributed tracing |

| Metric Type | Use Case |
|-------------|----------|
| Counter | Total requests, errors |
| Histogram | Latencies, sizes |
| Gauge | Current connections, queue size |
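
Prometheus histograms expose cumulative `le` (less-or-equal) buckets plus `_sum` and `_count`. A hand-rolled sketch of that bookkeeping, using the bucket bounds from the Histogram above:

```python
import bisect

buckets = [0.01, 0.05, 0.1, 0.5, 1.0, 5.0]  # upper bounds (le)
counts = [0] * (len(buckets) + 1)  # last slot is the implicit +Inf bucket
total = 0.0
n = 0


def observe(value: float) -> None:
    """Record one observation, Prometheus-histogram style."""
    global total, n
    # bisect_left gives the first bucket whose bound is >= value (le semantics)
    counts[bisect.bisect_left(buckets, value)] += 1
    total += value  # becomes _sum
    n += 1          # becomes _count


for latency in (0.004, 0.03, 0.2, 7.0):
    observe(latency)

# Exposition is cumulative: each bucket counts all observations <= its bound
cumulative = []
running = 0
for c in counts:
    running += c
    cumulative.append(running)
```

With these four observations, the cumulative buckets come out as `[1, 2, 2, 3, 3, 3, 4]`, the final value always equaling `_count`.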

## Additional Resources

- `./references/structured-logging.md` - structlog configuration, formatters
- `./references/metrics.md` - Prometheus patterns, custom metrics
- `./references/tracing.md` - OpenTelemetry, distributed tracing

## Assets

- `./assets/logging-config.py` - Production logging configuration

---

## See Also

**Prerequisites:**
- `python-async-patterns` - Async context propagation

**Related Skills:**
- `python-fastapi-patterns` - API middleware for metrics/tracing
- `python-cli-patterns` - CLI logging patterns

**Integration Skills:**
- `python-database-patterns` - Database query tracing

## Overview

This skill provides proven observability patterns for Python applications covering structured logging, metrics, and distributed tracing. It includes examples and middleware-ready snippets for structlog, Prometheus metrics, and OpenTelemetry tracing to help instrument web services reliably. The patterns focus on request context propagation, low-overhead metrics, and clear trace spans for production use.

## How this skill works

The skill demonstrates how to configure structlog for JSON structured logs and bind request IDs via contextvars so logs carry correlation identifiers. It shows Prometheus client patterns: counters, histograms, and gauges plus an HTTP /metrics endpoint and middleware to record request counts and latencies. It also outlines OpenTelemetry tracer setup with a BatchSpanProcessor and OTLP exporter, and examples of manual span creation to capture service workflows.

## When to use it

- You need consistent, structured logs with request correlation across async code paths.
- You want lightweight, production-ready HTTP metrics (request counts, latencies, active connections).
- You need end-to-end distributed traces across services using OpenTelemetry and OTLP exporters.
- You are instrumenting FastAPI or other ASGI frameworks and need middleware examples for metrics and context propagation.

## Best practices

- Bind and clear request-scoped context (request_id) using contextvars to avoid leaking IDs across requests.
- Emit JSON logs with explicit fields (event, level, timestamp, request_id) for reliable ingestion and querying.
- Use counters for totals, histograms for latency distributions, and gauges for current state; choose histogram buckets to match expected latency ranges.
- Record metrics in middleware to capture timing and status consistently, and avoid heavy work on the critical path (use batch exporters/async).
- Create meaningful span names and attributes, and keep spans short-lived to reduce tracing overhead; export spans with a BatchSpanProcessor.

## Example use cases

- Add structlog JSON logging to a FastAPI app and bind X-Request-ID from incoming headers for log correlation.
- Expose a /metrics endpoint and middleware that records request count, latency histogram, and active connections for Prometheus scraping.
- Instrument key service operations with OpenTelemetry spans and export to an OTLP collector for centralized tracing and flamegraphs.
- Combine logs and traces by including trace and span IDs in structured log fields for easier triage.
- Monitor connection and queue sizes with gauges and alert on sudden changes or saturation.
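
For the logs-plus-traces use case, trace and span IDs are conventionally logged as zero-padded hex (32 and 16 characters, per the W3C Trace Context format). A stdlib sketch — `current_trace_fields` is a hypothetical helper; a real application would read the IDs from the active OpenTelemetry span context rather than generating random ones:

```python
import secrets


def current_trace_fields() -> dict[str, str]:
    """Hypothetical helper: a real app would read the active span's context
    (e.g. opentelemetry.trace.get_current_span().get_span_context())
    instead of generating random IDs as done here for illustration."""
    trace_id = secrets.randbits(128)
    span_id = secrets.randbits(64)
    return {
        "trace_id": format(trace_id, "032x"),  # 32 hex chars, zero-padded
        "span_id": format(span_id, "016x"),    # 16 hex chars, zero-padded
    }


fields = current_trace_fields()
# Merge into every structured log line, e.g.:
# logger.info("order_charged", **fields)
```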

## FAQ

**How do I ensure request_id is present in all logs?**

Bind request_id at the start of request handling using contextvars and structlog.contextvars.bind_contextvars, then clear contextvars after the response is sent.

**What metric types should I use for latency and error counts?**

Use histograms for latency distributions (choose buckets that reflect expected latencies) and counters for total requests and errors; pair with labels for method, endpoint, and status.