---
name: ln-627-observability-auditor
description: Observability audit worker (L3). Checks structured logging, health check endpoints, metrics collection, request tracing, log levels. Returns findings with severity, location, effort, recommendations.
allowed-tools: Read, Grep, Glob, Bash
---
> **Paths:** File paths (`shared/`, `references/`, `../ln-*`) are relative to skills repo root. If not found at CWD, locate this SKILL.md directory and go up one level for repo root.
# Observability Auditor (L3 Worker)
A specialized worker that audits logging, monitoring, and related observability signals.
## Purpose & Scope
- **Worker in ln-620 coordinator pipeline**
- Audit **observability** (Category 10: Medium Priority)
- Check logging, health checks, metrics, tracing
- Calculate compliance score (X/10)
## Inputs (from Coordinator)
Receives `contextStore` with tech stack, framework, codebase root, output_dir.
## Workflow
1) Parse context + output_dir
2) Check observability patterns
3) Collect findings
4) Calculate score
5) **Write Report:** Build full markdown report in memory per `shared/templates/audit_worker_report_template.md`, write to `{output_dir}/627-observability.md` in single Write call
6) **Return Summary:** Return minimal summary to coordinator
## Audit Rules
### 1. Structured Logging
**Detection:**
- Grep for `console.log` (unstructured)
- Check for a structured logger: winston or pino (Node), logrus or zap (Go)
**Severity:**
- **MEDIUM:** Production code using console.log
- **LOW:** Dev code using console.log
**Recommendation:** Use a structured logger (winston, pino)
**Effort:** M (add logger, replace calls)
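The detection above can be sketched as a small scanner. The skill itself uses the Grep tool; this Python sketch is purely illustrative, and the exclusion markers and dev-directory heuristic are assumptions, not part of the spec:

```python
import re
from pathlib import Path

SOURCE_EXTS = {".js", ".ts", ".jsx", ".tsx"}
TEST_MARKERS = (".test.", ".spec.", "/tests/", "/__tests__/")  # assumed exclusions
DEV_DIRS = ("/scripts/", "/tools/")  # assumed dev-only locations -> LOW severity

def find_console_log(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line, severity) for each unstructured console.log call."""
    findings = []
    for path in Path(root).rglob("*"):
        if path.suffix not in SOURCE_EXTS:
            continue
        posix = path.as_posix()
        if any(m in posix for m in TEST_MARKERS):
            continue  # skip test files per the exclusion rule
        severity = "LOW" if any(d in posix for d in DEV_DIRS) else "MEDIUM"
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if re.search(r"\bconsole\.log\(", line):
                findings.append((posix, lineno, severity))
    return findings
```

Classifying severity at detection time keeps the context-sensitive rule (production vs. dev code) in one place.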
### 2. Health Check Endpoints
**Detection:**
- Grep for `/health`, `/ready`, `/live` routes
- Check API route definitions
**Severity:**
- **HIGH:** No health check endpoint (monitoring blind spot)
**Recommendation:** Add `/health` endpoint
**Effort:** S (add simple route)
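As a sketch, this check reduces to a regex over route definitions. Python is again illustrative (the skill uses Grep), and the exact pattern is an assumption:

```python
import re

# Matches quoted route literals like "/health", '/ready', "/livez".
HEALTH_ROUTE = re.compile(r"""['"]/(?:health|ready|live)z?['"]""")

def has_health_endpoint(source: str) -> bool:
    """True if the source snippet defines a /health, /ready, or /live route."""
    return bool(HEALTH_ROUTE.search(source))
```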
### 3. Metrics Collection
**Detection:**
- Check for Prometheus client, StatsD, CloudWatch
- Grep for metric recording: `histogram`, `counter`
**Severity:**
- **MEDIUM:** No metrics instrumentation
**Recommendation:** Add Prometheus metrics
**Effort:** M (instrument code)
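One cheap signal, sketched below, is the dependency manifest. The library list is an assumed set of hints, and a source grep for `histogram(`/`counter(` calls would complement it:

```python
import json

# Assumed set of metrics clients to look for in a Node package.json.
METRIC_LIBS = {"prom-client", "hot-shots", "statsd"}

def manifest_has_metrics(package_json_text: str) -> bool:
    """True if any known metrics client appears in dependencies."""
    pkg = json.loads(package_json_text)
    deps = {**pkg.get("dependencies", {}), **pkg.get("devDependencies", {})}
    return any(lib in deps for lib in METRIC_LIBS)
```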
### 4. Request Tracing
**Detection:**
- Check for correlation IDs in logs
- Verify trace propagation (OpenTelemetry, Zipkin)
**Severity:**
- **MEDIUM:** No correlation IDs (hard to debug distributed systems)
**Recommendation:** Add request ID middleware
**Effort:** M (add middleware, propagate IDs)
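The recommended middleware can be sketched framework-agnostically. The request/response dict shapes and the `X-Request-ID` header name here are illustrative assumptions:

```python
import uuid

def with_request_id(handler):
    """Hypothetical middleware: ensure every request carries a correlation ID.

    `handler` maps a request dict to a response dict; the dict shapes are
    illustrative, not tied to any framework.
    """
    def wrapped(request: dict) -> dict:
        rid = request.get("headers", {}).get("x-request-id") or uuid.uuid4().hex
        request["request_id"] = rid  # make the ID available to downstream logging
        response = handler(request)
        response.setdefault("headers", {})["x-request-id"] = rid  # propagate out
        return response
    return wrapped
```

Echoing an incoming ID instead of always minting a new one is what makes traces stitch together across services.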
### 5. Log Levels
**Detection:**
- Check if logger supports levels (info, warn, error, debug)
- Verify proper level usage
**Severity:**
- **LOW:** Only error logging (insufficient visibility)
**Recommendation:** Add info/debug logs
**Effort:** S (add log statements)
## Scoring Algorithm
**MANDATORY READ:** Load `shared/references/audit_scoring.md` for unified scoring formula.
## Output Format
**MANDATORY READ:** Load `shared/templates/audit_worker_report_template.md` for file format.
Write report to `{output_dir}/627-observability.md` with `category: "Observability"` and checks: structured_logging, health_endpoints, metrics_collection, request_tracing, log_levels.
Return summary to coordinator:
```
Report written: docs/project/.audit/627-observability.md
Score: X.X/10 | Issues: N (C:N H:N M:N L:N)
```
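A tiny formatter producing exactly that summary shape (the function name and counts-dict keys are illustrative):

```python
def format_summary(report_path: str, score: float, counts: dict) -> str:
    """Render the two-line coordinator summary; counts keys are C/H/M/L."""
    total = sum(counts.values())
    return (
        f"Report written: {report_path}\n"
        f"Score: {score:.1f}/10 | Issues: {total} "
        f"(C:{counts['C']} H:{counts['H']} M:{counts['M']} L:{counts['L']})"
    )

print(format_summary("docs/project/.audit/627-observability.md", 7.5,
                     {"C": 0, "H": 1, "M": 2, "L": 1}))
# Report written: docs/project/.audit/627-observability.md
# Score: 7.5/10 | Issues: 4 (C:0 H:1 M:2 L:1)
```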
## Reference Files
- **Worker report template:** `shared/templates/audit_worker_report_template.md`
- **Audit scoring formula:** `shared/references/audit_scoring.md`
- **Audit output schema:** `shared/references/audit_output_schema.md`
## Critical Rules
- **Do not auto-fix:** Report only, never inject logging or endpoints
- **Framework-aware detection:** Adapt patterns to project's tech stack (winston/pino for Node, logrus/zap for Go, etc.)
- **Effort realism:** S = <1h, M = 1-4h, L = >4h
- **Exclusions:** Skip test files for console.log detection, skip dev-only scripts
- **Context-sensitive severity:** console.log in production code = MEDIUM, in dev utilities = LOW
## Definition of Done
- contextStore parsed (tech stack, framework, output_dir)
- All 5 checks completed (structured logging, health endpoints, metrics, request tracing, log levels)
- Findings collected with severity, location, effort, recommendation
- Score calculated per `shared/references/audit_scoring.md`
- Report written to `{output_dir}/627-observability.md` (atomic single Write call)
- Summary returned to coordinator
---
**Version:** 3.0.0
**Last Updated:** 2025-12-23