home / skills / phrazzld / claude-config / observability

observability skill

This skill helps maintain complete observability by auditing, implementing, and verifying Sentry, PostHog, and health checks from the CLI.

npx playbooks add skill phrazzld/claude-config --skill observability

Review the files below or copy the command above to add this skill to your agents.

Files (3)

SKILL.md

7.6 KB

---
name: observability
description: |
  Complete observability infrastructure. Error tracking, alerting, logging, health checks.
  Optimized for indie dev: minimal services, CLI-manageable, AI agent integration.
argument-hint: "[focus area, e.g. 'alerts' or 'health checks']"
effort: high
---

# /observability

Production observability with one service. Audit, fix, verify—every time.

## Philosophy

**Two services, not twenty.** Sentry handles errors. PostHog handles product analytics. That's it. Vercel captures stdout automatically (no setup needed).

**CLI-first.** Everything manageable from command line. No dashboard clicking.

**AI-agent ready.** Errors should trigger automated analysis and fixes.

## What This Does

Examines project observability, identifies gaps, implements fixes, and verifies alerting works. Every run does the full cycle.

## Branching

Assumes you start on `master`/`main`. Before making code changes:

```bash
git checkout -b infra/observability-$(date +%Y%m%d)
```

## Architecture

```
App → Sentry (errors, performance)
    → PostHog (product analytics, feature flags)
    → stdout (Vercel captures automatically)
    → /api/health (uptime monitoring)

AI Integration:
    Sentry MCP → Claude (query errors, analyze, fix)
    PostHog MCP → Claude (query funnels, cohorts, events)
    Sentry webhook → GitHub Action → agent (auto-triage)
    CLI scripts → manual triage and resolution
```

**Services:** 2 (Sentry + PostHog)
**Built-in free:** Vercel logs
**CLI-manageable:** 100% (both have MCP servers)

## Process

### 1. Audit

**Check what exists:**
```bash
# Sentry configured?
~/.claude/skills/sentry-observability/scripts/detect_sentry.sh

# Health endpoint?
[ -f "app/api/health/route.ts" ] || [ -f "src/app/api/health/route.ts" ] && echo "✓ Health endpoint" || echo "✗ Health endpoint"

# Structured logging?
grep -r "console.log\|console.error" --include="*.ts" --include="*.tsx" src/ app/ 2>/dev/null | head -5

# PostHog analytics?
grep -q "posthog" package.json && echo "✓ PostHog" || echo "✗ PostHog not installed (P1)"
```

**Spawn agent for deep review:**
Spawn `observability-advocate` agent to audit logging coverage, error handling, and silent failure risks.

### 2. Plan

Every project needs:

**Essential (every production app):**
- Sentry error tracking with source maps
- Health check endpoint (`/api/health`)
- Structured logging (JSON to stdout)
- At least one alert rule (new errors)

**Required (user-facing apps):**
- PostHog product analytics (funnels, cohorts, session replay)
- PostHog feature flags (replaces LaunchDarkly)

**Recommended:**
- Webhook for AI agent integration
- Triage scripts for CLI management

**Only if needed:**
- Custom uptime monitoring

### 3. Execute

**Install Sentry:**
```bash
pnpm add @sentry/nextjs
npx @sentry/wizard@latest -i nextjs
```

Or use init script:
```bash
~/.claude/skills/sentry-observability/scripts/init_sentry.sh
```

**Configure PII redaction:**
```typescript
// sentry.client.config.ts
Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  beforeSend(event) {
    // Scrub PII
    if (event.extra) delete event.extra.password;
    if (event.user) delete event.user.email;
    return event;
  },
});
```

**Create health endpoint:**
```typescript
// app/api/health/route.ts
export async function GET() {
  const checks = {
    app: 'ok',
    timestamp: new Date().toISOString(),
  };

  // Add service checks as needed
  // checks.database = await checkDb();
  // checks.stripe = await checkStripe();

  return Response.json(checks);
}
```

**Set up structured logging:**
Use JSON logs that Vercel can parse:
```typescript
// lib/logger.ts
export function log(level: 'info' | 'warn' | 'error', message: string, data?: Record<string, unknown>) {
  const entry = {
    level,
    message,
    timestamp: new Date().toISOString(),
    ...data,
  };
  console[level === 'error' ? 'error' : 'log'](JSON.stringify(entry));
}
```

**Create alert rule:**
```bash
~/.claude/skills/sentry-observability/scripts/create_alert.sh --name "New Errors" --type issue
```

**Set up webhook for AI integration (optional):**
In Sentry Dashboard → Settings → Integrations → Internal Integrations:
1. Create integration with webhook URL
2. Subscribe to issue events
3. Point to GitHub Action or custom endpoint

### 4. Verify

**Verify Sentry setup:**
```bash
~/.claude/skills/sentry-observability/scripts/verify_setup.sh
```

**Test error tracking:**
```typescript
// Trigger test error
throw new Error('Test error for Sentry verification');
```

Then check Sentry dashboard or:
```bash
~/.claude/skills/sentry-observability/scripts/list_issues.sh --limit 1
```

**Test health endpoint:**
```bash
curl -s http://localhost:3000/api/health | jq
```

**Test alerting:**
- Trigger an error
- Verify alert fires (check email/Slack/webhook)

If any verification fails, go back and fix it.

## AI Agent Integration

### Option A: Sentry MCP Server

For direct Claude integration, use the Sentry MCP server:
```json
// claude_desktop_config.json
{
  "mcpServers": {
    "sentry": {
      "command": "npx",
      "args": ["-y", "@anthropic/sentry-mcp"],
      "env": {
        "SENTRY_AUTH_TOKEN": "your-token",
        "SENTRY_ORG": "your-org"
      }
    }
  }
}
```

Claude can then:
- Query recent errors
- Get full error context
- Analyze root causes
- Propose fixes

### Option B: Webhook → GitHub Action → Agent

For automated triage:
```yaml
# .github/workflows/sentry-triage.yml
name: Sentry Auto-Triage

on:
  repository_dispatch:
    types: [sentry-issue]

jobs:
  triage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Analyze with Claude
        run: |
          # Query issue details and spawn analysis agent
          claude --print "Analyze Sentry issue ${{ github.event.client_payload.issue_id }}"
```

### Option C: CLI Scripts (Already Exist)

```bash
# List and triage issues
~/.claude/skills/sentry-observability/scripts/list_issues.sh --env production
~/.claude/skills/sentry-observability/scripts/triage_score.sh --json

# Get issue details for analysis
~/.claude/skills/sentry-observability/scripts/issue_detail.sh PROJ-123

# Resolve after fixing
~/.claude/skills/sentry-observability/scripts/resolve_issue.sh PROJ-123
```

## Tool Choices

**Sentry over alternatives.** Best error tracking, mature CLI, AI-first roadmap (Seer webhooks, auto-fix features), excellent free tier.

**Vercel logs over log services.** stdout is captured automatically. No additional service needed. Query with `vercel logs`.

**PostHog for ALL analytics.** Official MCP server, Terraform provider, all-in-one platform. 1M events/month free.

**NOT Vercel Analytics.** It has no API, no CLI, no MCP server. Completely unusable for AI-assisted workflows. Do not install it.

## Environment Variables

```bash
# .env.example

# Sentry (required for error tracking)
NEXT_PUBLIC_SENTRY_DSN=
SENTRY_AUTH_TOKEN=
SENTRY_ORG=
SENTRY_PROJECT=
```

## What You Get

When complete:
- Sentry capturing all errors with source maps
- Health check endpoint at `/api/health`
- Structured JSON logging (captured by Vercel)
- At least one alert rule configured
- PostHog for product analytics (user-facing apps)
- AI agent integration ready (MCP or webhooks)

User can:
- See errors in Sentry immediately when they occur
- Get alerted on new/critical errors
- Query errors via CLI (`list_issues.sh`, `triage_score.sh`)
- Trigger AI analysis of errors
- Monitor app health via `/api/health`
- View logs via `vercel logs`

## Related Skills

- `sentry-observability` — Detailed Sentry setup and scripts
- `observability-stack` — PostHog/analytics integration patterns
- `observability-advocate` — Agent for auditing observability coverage

Overview

This skill provides a complete, CLI-first observability infrastructure optimized for indie developers. It installs and configures Sentry for errors and performance, PostHog for product analytics and feature flags, structured JSON logging to stdout, and a health endpoint for uptime checks. Everything is scriptable and AI-agent ready for automated triage and fixes.

How this skill works

The skill audits your project for missing observability pieces, applies fixes (Sentry init, health route, structured logging, PostHog integration), and verifies alerting and error capture. It exposes CLI scripts to list, triage, and resolve Sentry issues, plus optional MCP/webhook integrations so an AI agent can analyze errors and propose or apply fixes. Each run performs audit → plan → execute → verify.

When to use it

You need reliable error tracking with minimal services and cost.
You want observability fully manageable from the CLI without dashboard workflows.
You plan to integrate automated AI triage or remediation for errors.
You want structured JSON logs captured automatically (e.g., Vercel stdout).
You need a simple health endpoint for uptime checks and monitoring.

Best practices

Keep services minimal: Sentry for errors and PostHog for analytics; rely on stdout for logs where possible.
Use structured JSON logs so platform logging can parse events reliably.
Protect PII by scrubbing sensitive fields in Sentry beforeSend handlers.
Manage changes in short infra branches (e.g., infra/observability-YYYYMMDD) and test locally before merging.
Verify each change: trigger test errors, curl /api/health, and confirm alerts/firewalls are working.

Example use cases

Add Sentry to a Next.js app with source maps and one alert rule for new errors.
Create a lightweight /api/health endpoint and hook it to an uptime monitor.
Replace ad-hoc console logs with a structured JSON logger that Vercel captures.
Enable PostHog for product funnels and roll out feature flags via its provider instead of a paid flag service.
Wire a Sentry webhook to a GitHub Action that triggers an AI analysis agent for automatic triage.

FAQ

Will this add many external services?

No. The architecture intentionally uses two primary services: Sentry and PostHog. Vercel stdout is used for logs, keeping service count low.

Can I run everything from the CLI?

Yes. Core setup, detection, verification, and triage are exposed as CLI scripts so you can avoid dashboard workflows.

How does AI integration work?

You can connect an MCP server for Sentry or route Sentry webhooks through a GitHub Action to an AI agent. Agents can query issues, analyze root causes, and suggest or apply fixes.