
check-observability skill


This skill audits observability infrastructure and reports structured findings on error tracking, logging, health checks, analytics, and alerting.

npx playbooks add skill phrazzld/claude-config --skill check-observability

Review the files below or copy the command above to add this skill to your agents.

SKILL.md

---
name: check-observability
description: |
  Audit observability: error tracking, logging, health checks, analytics.
  Outputs structured findings. Use log-observability-issues to create issues.
  Invoke for: monitoring audit, logging review, alerting gaps.
effort: high
---

# /check-observability

Audit observability infrastructure. Output findings as structured report.

## What This Does

1. Check error tracking (Sentry)
2. Check logging infrastructure
3. Check health endpoints
4. Check analytics
5. Check alerting
6. Detect silent failures
7. Output prioritized findings (P0-P3)

**This is a primitive.** It only investigates and reports. Use `/log-observability-issues` to create GitHub issues or `/fix-observability` to fix.

## Process

### 1. Error Tracking Check

```bash
# Sentry configured?
~/.claude/skills/sentry-observability/scripts/detect_sentry.sh 2>/dev/null || \
  (grep -q "@sentry" package.json && echo "✓ Sentry installed" || echo "✗ Sentry not installed")

# Sentry DSN set?
grep -q "SENTRY_DSN\|NEXT_PUBLIC_SENTRY_DSN" .env.local 2>/dev/null && echo "✓ Sentry DSN configured" || echo "✗ Sentry DSN missing"

# Sentry client config present?
{ [ -f "sentry.client.config.ts" ] || [ -f "sentry.client.config.js" ]; } && echo "✓ Sentry client config" || echo "✗ Sentry client config missing"
```
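
The DSN check above reads only `.env.local` and matches any mention of the variable name, including commented-out lines. A minimal sketch of a stricter variant follows. It assumes the project may also keep the DSN in `.env` or `.env.production`; that file list and the function name `check_sentry_dsn` are illustrative assumptions, not part of the skill.

```bash
# Hypothetical helper: look for a non-empty Sentry DSN assignment in several dotenv files.
# The file list is an assumption; adjust it to the project's conventions.
check_sentry_dsn() {
  local dir="${1:-.}" f
  for f in .env.local .env .env.production; do
    # Require an actual assignment with a value, not just a mention of the name.
    if grep -qE '^(NEXT_PUBLIC_)?SENTRY_DSN=.+' "$dir/$f" 2>/dev/null; then
      echo "✓ Sentry DSN configured ($f)"
      return 0
    fi
  done
  echo "✗ Sentry DSN missing"
  return 1
}
```

Call it as `check_sentry_dsn .` from the project root. The non-zero return on a missing DSN lets a caller branch on the result.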

### 2. Logging Check

```bash
# Structured logging?
grep -rq "pino\|winston\|bunyan" package.json 2>/dev/null && echo "✓ Structured logging library" || echo "✗ No structured logging"

# Console.log abuse?
console_count=$(grep -rE "console\.(log|error|warn)" --include="*.ts" --include="*.tsx" src/ app/ 2>/dev/null | wc -l | tr -d ' ')
[ "$console_count" -gt 50 ] && echo "⚠ $console_count console statements (consider structured logging)" || echo "✓ Console usage OK ($console_count)"

# Logger utility exists?
{ [ -f "lib/logger.ts" ] || [ -f "src/lib/logger.ts" ] || [ -f "utils/logger.ts" ]; } && echo "✓ Logger utility" || echo "✗ No logger utility"
```
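
A single total says little about where to start a structured-logging migration. The sketch below ranks files by the number of lines containing `console.*` calls, using the same pattern as the check above. The function name `top_console_files` and the default limit of 5 are assumptions for illustration.

```bash
# Hypothetical helper: list the files with the most lines containing console.* calls.
top_console_files() {
  local dir="${1:-.}" limit="${2:-5}"
  # grep -c prints "path:count" per file; awk keeps non-zero counts and
  # emits "count path" (the count is always the last colon-separated field).
  grep -rcE 'console\.(log|error|warn)' \
    --include='*.ts' --include='*.tsx' "$dir" 2>/dev/null |
    awk -F: '$NF > 0 { print $NF, substr($0, 1, length($0) - length($NF) - 1) }' |
    sort -rn |
    head -n "$limit"
}
```

For example, `top_console_files src 10` prints up to ten `count path` lines, highest count first.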

### 3. Health Endpoints Check

```bash
# Health endpoint exists?
health_route=$(find . -path "./app/api/health/*" -name "route.ts" 2>/dev/null | head -1)
[ -n "$health_route" ] && echo "✓ Health endpoint: $health_route" || echo "✗ No health endpoint"

# Health check depth? (-q keeps matched source lines out of the report; -i also matches "Redis", "Stripe")
grep -rqiE "database|redis|stripe|convex" app/api/health/ 2>/dev/null && echo "✓ Deep health checks" || echo "⚠ Shallow health check (add service checks)"
```
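
The check above looks only under `./app/api/health/`. Projects that keep the App Router under `src/`, or that still use Pages Router API routes, would be reported as missing an endpoint. The sketch below probes a few common Next.js layouts. The candidate list and the name `find_health_endpoint` are assumptions, and unlike the `find` call above it does not match nested routes such as `app/api/health/db/route.ts`.

```bash
# Hypothetical helper: find a health route under several common Next.js layouts.
find_health_endpoint() {
  local dir="${1:-.}" candidate
  for candidate in \
    app/api/health/route.ts app/api/health/route.js \
    src/app/api/health/route.ts src/app/api/health/route.js \
    pages/api/health.ts pages/api/health.js; do
    if [ -f "$dir/$candidate" ]; then
      echo "✓ Health endpoint: $candidate"
      return 0
    fi
  done
  echo "✗ No health endpoint"
  return 1
}
```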

### 4. Analytics Check

```bash
# PostHog (required for user-facing apps)?
if grep -q "posthog" package.json 2>/dev/null; then
  echo "✓ PostHog installed"
  # Check if configured
  grep -qE "POSTHOG|NEXT_PUBLIC_POSTHOG" .env.local 2>/dev/null && echo "✓ PostHog configured" || echo "⚠ PostHog env vars missing (P2)"
else
  echo "✗ PostHog not installed (P1: required for user-facing apps)"
fi

# Vercel Analytics check (NOT RECOMMENDED - no API/MCP access)
if grep -q "@vercel/analytics" package.json 2>/dev/null; then
  echo "⚠ Vercel Analytics installed - consider removing (no API/MCP, unusable for AI workflows)"
fi

# NOTE: Do NOT recommend Vercel Analytics. It has no CLI, no API, no MCP server.
# PostHog is the ONLY acceptable analytics solution for our stack.
```

### 5. Alerting Check

```bash
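# Sentry alerts? Capture the output first: in `script | head -5 || echo ...` the exit
# status is that of `head`, so the fallback message would never fire.
if alerts=$(~/.claude/skills/sentry-observability/scripts/list_alerts.sh 2>/dev/null); then
  echo "$alerts" | head -5
else
  echo "Cannot check Sentry alerts (script missing or token not configured)"
fi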

# Uptime monitoring reference? (skip node_modules and .git instead of scanning and filtering them)
grep -rEi --exclude-dir=node_modules --exclude-dir=.git "uptimerobot|pingdom|betterstack|cronitor" . 2>/dev/null | head -3
```

### 6. Silent Failure Detection

Spawn `observability-advocate` agent to detect:
- Empty catch blocks
- Swallowed errors
- Missing error boundaries
- Unhandled promise rejections
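
Before spawning the agent, a cheap grep pass can surface the most obvious case. The sketch below is a rough heuristic and not the agent's actual method: it flags only empty `catch` blocks written on a single line (`catch (e) {}` or `catch {}`) and misses multi-line blocks, swallowed errors, and the other patterns in the list. The name `find_empty_catches` is an assumption.

```bash
# Rough heuristic (assumption, not the observability-advocate agent's logic):
# print file:line:text for single-line empty catch blocks in TypeScript sources.
find_empty_catches() {
  local dir="${1:-.}"
  grep -rnE 'catch[[:space:]]*(\([^)]*\))?[[:space:]]*\{[[:space:]]*\}' \
    --include='*.ts' --include='*.tsx' \
    --exclude-dir=node_modules "$dir" 2>/dev/null
}
```

The function returns non-zero when nothing matches, so `find_empty_catches src || echo "✓ No single-line empty catch blocks"` works as a quick pre-check.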

## Output Format

```markdown
## Observability Audit

### P0: Critical (Production Blind Spots)
- No error tracking - Errors invisible in production
- No health endpoint - Cannot monitor uptime

### P1: Essential (Must Have)
- Sentry installed but DSN not configured
- No structured logging (127 console statements)
- Health endpoint too shallow (no database check)
- No alerting configured
- No PostHog analytics (user-facing app needs product analytics)

### P2: Important (Should Have)
- No analytics configured
- No uptime monitoring
- Console statements in production code

### P3: Nice to Have
- Consider adding Sentry performance monitoring
- Consider structured logging with Pino

## Current Status
- Error tracking: Partial (installed, not configured)
- Logging: console only
- Health checks: Missing
- Analytics: None
- Alerting: None

## Summary
- P0: 2 | P1: 5 | P2: 3 | P3: 2
- Recommendation: Configure Sentry DSN and add health endpoint
```
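
Hand-written totals in the Summary line drift easily from the findings listed above them. As a sketch, the counts can be derived from the report itself, assuming it follows the format shown here: findings are `- ` bullets under `### P0` to `### P3` headings, and any `## ` heading ends the priority sections. The name `summarize_findings` is hypothetical.

```bash
# Hypothetical helper: derive the Summary counts from a report read on stdin.
summarize_findings() {
  awk '
    /^### P[0-3]/ { level = substr($2, 1, 2); next }  # entering a priority section
    /^## /        { level = "" }                       # any other section ends it
    /^- / && level != "" { count[level]++ }            # one bullet = one finding
    END {
      printf "- P0: %d | P1: %d | P2: %d | P3: %d\n",
        count["P0"], count["P1"], count["P2"], count["P3"]
    }
  '
}
```

Pipe a saved report into it, for example `summarize_findings < report.md`, where `report.md` is a placeholder file name.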

## Priority Mapping

| Gap | Priority |
|-----|----------|
| No error tracking | P0 |
| No health endpoint | P0 |
| Error tracking misconfigured | P1 |
| No structured logging | P1 |
| Shallow health checks | P1 |
| No alerting | P1 |
| No PostHog (user-facing app) | P1 |
| Console.log overuse | P2 |
| No uptime monitoring | P2 |
| Performance monitoring | P3 |
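
If the checks are scripted, the table can be encoded as a lookup so every finding gets its priority from one place. This is a sketch only: the gap keys such as `no-health-endpoint` and the function name `priority_for` are invented for illustration and are not defined by the skill.

```bash
# Hypothetical helper: map a gap key to its priority, mirroring the table above.
priority_for() {
  case "$1" in
    no-error-tracking|no-health-endpoint)
      echo P0 ;;
    error-tracking-misconfigured|no-structured-logging|shallow-health-checks|no-alerting|no-posthog)
      echo P1 ;;
    console-log-overuse|no-uptime-monitoring)
      echo P2 ;;
    performance-monitoring)
      echo P3 ;;
    *)
      echo "unknown gap: $1" >&2
      return 1 ;;
  esac
}
```

Usage: `echo "$(priority_for no-alerting): No alerting configured"`.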

## Why NOT Vercel Analytics

**Vercel Analytics is unacceptable for our stack.** It has:
- No API access
- No CLI access
- No MCP server
- No way to query programmatically

This makes it completely unusable for AI-assisted workflows. **PostHog is the ONLY acceptable analytics solution.** PostHog provides an MCP server, an API, a CLI, and a Terraform provider.

## Related

- `/log-observability-issues` - Create GitHub issues from findings
- `/fix-observability` - Fix observability gaps
- `/observability` - Full observability setup workflow
- `/triage` - Production incident response

Overview

This skill audits an application's observability surface and produces a prioritized, structured findings report. It inspects error tracking, logging, health endpoints, analytics, and alerting to identify production blind spots and actionable gaps. Use the report to create issues or trigger automated fixes with companion primitives.

How this skill works

The audit runs a set of targeted checks: detects Sentry installation and DSN configuration, scans for structured logging libraries and excessive console usage, verifies presence and depth of health endpoints, checks analytics (PostHog vs Vercel Analytics), and inspects alerting references. It flags silent-failure patterns by running an agent that looks for swallowed errors, empty catch blocks, missing error boundaries, and unhandled promise rejections. Findings are categorized by priority (P0–P3) and emitted as a structured report.

When to use it

- Before deploying to production to validate monitoring and alerts
- During post-incident reviews to identify observability blind spots
- When onboarding a new service to ensure standard telemetry is present
- As part of CI checks for release readiness
- When preparing an SRE or monitoring roadmap
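
For the CI use case, one possible gate is to fail the job whenever the report contains a P0 finding. The sketch below assumes the report has been saved to a file in the Output Format shown in SKILL.md. The function name `gate_on_p0` and the file name in the usage line are hypothetical.

```bash
# Hypothetical CI gate: return non-zero when the report lists any P0 finding.
gate_on_p0() {
  local report="$1" p0
  p0=$(awk '
    /^### P0/ { in_p0 = 1; next }   # start of the P0 section
    /^##/     { in_p0 = 0 }         # any later heading ends it
    in_p0 && /^- / { n++ }          # count P0 bullets
    END { print n + 0 }
  ' "$report")
  if [ "$p0" -gt 0 ]; then
    echo "✗ $p0 P0 finding(s); blocking release" >&2
    return 1
  fi
  echo "✓ No P0 findings"
}
```

In a pipeline step this could run as `gate_on_p0 observability-report.md`, with the step failing on a non-zero exit.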

Best practices

- Ensure Sentry is installed and `SENTRY_DSN` (or `NEXT_PUBLIC_SENTRY_DSN`) is configured
- Use structured logging libraries (Pino, Winston, Bunyan) and avoid `console.*` in production
- Expose a health endpoint with deep checks (database, cache, external services)
- Use PostHog for product analytics; avoid Vercel Analytics for programmatic needs
- Configure alerting and uptime monitors (UptimeRobot, BetterStack, Cronitor) for critical endpoints

Example use cases

- Run the audit on a new microservice to detect missing DSNs, log utilities, or health routes
- Scan a monorepo to quantify `console.log` usage and recommend a structured logging rollout
- Validate analytics and alerting config before a customer-facing feature launch
- Use the prioritized report to create tickets for SREs or engineers via the issue-creation primitive

FAQ

Does the audit fix issues automatically?

No. This skill only inspects and reports findings. Use the companion primitives to create issues or apply automated fixes.

Why avoid Vercel Analytics?

Vercel Analytics lacks API/CLI access and programmatic queryability, making it unsuitable for AI-driven workflows; PostHog is the recommended alternative.