---
name: verify-fix
description: "Mandatory incident fix verification with observables. Invoke after: applying production fixes, before declaring incidents resolved, when someone says 'I think that fixed it'. Requires log entries, metric changes, and database state confirmation."
effort: max
---
# /verify-fix - Verify Incident Fix with Observables
**MANDATORY** verification step after any production incident fix.
## Philosophy
A fix is just a hypothesis until proven by metrics. "That should fix it" is not verification.
## When to Use
- After applying ANY fix to a production incident
- Before declaring an incident resolved
- When someone says "I think that fixed it"
## Verification Protocol
### 1. Define Observable Success Criteria
Before testing, explicitly state what we expect to see:
```
SUCCESS CRITERIA:
- [ ] Log entry: "[specific log message]"
- [ ] Metric change: [metric] goes from [X] to [Y]
- [ ] Database state: [field] = [expected value]
- [ ] API response: [endpoint] returns [expected response]
```
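For instance, filled-in criteria for a hypothetical webhook incident (the message, metric values, and field name below are illustrative, borrowed from the worked example later in this document):
```
SUCCESS CRITERIA:
- [ ] Log entry: "Webhook received"
- [ ] Metric change: pending_webhooks goes from 4 to 3
- [ ] Database state: subscriptionStatus = "active"
```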
### 2. Trigger Test Event
```bash
# For webhook issues:
stripe events resend [event_id] --webhook-endpoint [endpoint_id]
# For API issues:
curl -X POST [endpoint] -H 'Content-Type: application/json' -d '[test payload]'
# For auth issues:
# Log in as test user, perform action
```
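A concrete instance of the curl template, as a sketch only (the endpoint URL and payload here are hypothetical placeholders, not real values):
```bash
# Hypothetical endpoint and payload -- substitute values from the incident
curl -sS -X POST https://your-app.example.com/api/checkout \
  -H 'Content-Type: application/json' \
  -d '{"userId": "user_xxx", "plan": "pro"}' \
  -w '\nHTTP status: %{http_code}\n'
```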
### 3. Observe Results
```bash
# Watch logs in real-time
vercel logs [app] --json | grep [pattern]
# Or for Convex:
npx convex logs --prod | grep [pattern]
# Check metrics
stripe events retrieve [event_id] | jq '.pending_webhooks'
```
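To make the metric comparison mechanical, capture the before and after values in shell variables (assumes the Stripe CLI and jq are installed; the event and endpoint IDs are placeholders):
```bash
# Capture the metric, trigger the test, then re-check -- IDs are placeholders
before=$(stripe events retrieve evt_xxx | jq '.pending_webhooks')
stripe events resend evt_xxx --webhook-endpoint we_xxx
sleep 10   # give delivery a moment to complete
after=$(stripe events retrieve evt_xxx | jq '.pending_webhooks')
echo "pending_webhooks: $before -> $after"
```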
### 4. Verify Database State
```bash
# Check the affected record
npx convex run --prod [query] '{"id": "[affected_id]"}'
```
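To turn this into a pass/fail check, pipe the result through `jq -e`, which exits nonzero when the expression is false (the query and field names follow the worked example below and may differ in your project):
```bash
# Assert the expected field value; the exit code reflects the outcome
npx convex run --prod users/queries:getUserByClerkId '{"clerkId": "user_xxx"}' \
  | jq -e '.subscriptionStatus == "active"' > /dev/null \
  && echo "Database state confirmed" \
  || echo "Database state NOT confirmed"
```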
### 5. Document Evidence
```
VERIFICATION EVIDENCE:
- Timestamp: [when]
- Test performed: [what we did]
- Log entry observed: [paste relevant log]
- Metric before: [value]
- Metric after: [value]
- Database state confirmed: [yes/no]
VERDICT: [VERIFIED / NOT VERIFIED]
```
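A minimal sketch for capturing that evidence with real timestamps as you test (the file name and commands are illustrative):
```bash
# Append a timestamped evidence entry -- adjust commands to your incident
{
  echo "Timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
  echo "Test performed: resent evt_xxx to we_xxx"
  echo "Log entry observed:"
  timeout 15 vercel logs app --json | grep webhook
} >> incident-evidence.md
```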
## Red Flags (Fix NOT Verified)
- "The code looks right now"
- "The config is correct"
- "It should work"
- "Let's wait and see"
- No log entry observed
- Metrics unchanged
- Can't reproduce the original symptom
## Example: Webhook Fix Verification
```bash
# 1. Resend the failing event
stripe events resend evt_xxx --webhook-endpoint we_xxx
# 2. Watch logs (expect to see "Webhook received")
timeout 15 vercel logs app --json | grep webhook
# 3. Check delivery metric (expect decrease)
stripe events retrieve evt_xxx | jq '.pending_webhooks'
# Before: 4, After: 3 = DELIVERY SUCCEEDED
# 4. Check database state
npx convex run --prod users/queries:getUserByClerkId '{"clerkId": "user_xxx"}'
# Expect: subscriptionStatus = "active"
# VERDICT: VERIFIED - all 4 checks passed
```
## If Verification Fails
1. **Don't panic** - the fix hypothesis was wrong, that's okay
2. **Revert** if the fix made things worse (see the rollback sketch below)
3. **Loop back** to observation phase (OODA-V)
4. **Question assumptions** - what did we miss?
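If a revert is needed, a minimal sketch assuming a git-based deploy pipeline (the commit SHA and deployment ID are placeholders; `vercel rollback` is one option if you deploy on Vercel):
```bash
# Revert the bad fix commit and let CI redeploy -- SHA is a placeholder
git revert --no-edit abc1234
git push origin main

# Or roll back the deployment directly on Vercel
vercel rollback [deployment_id]
```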
## Summary
This skill enforces a mandatory, observable verification step after any production fix. It turns "I think that fixed it" into evidence-driven validation before an incident is declared resolved: fixes must be proven by logs, metrics, and database confirmation, not assumed.

Define clear observable success criteria first (specific log messages, metric deltas, and database state). Then trigger a controlled test event, observe logs and metrics in real time, query the database for the expected state, and record timestamped evidence. If all observables match the criteria, mark the incident VERIFIED; otherwise treat the fix as failed and iterate.

## FAQ
### What counts as sufficient evidence?
A timestamped log entry matching the expected message, a measurable metric change that moves the system back toward normal, and database state confirming the intended write or flag change. Satisfy at least one item from each category whenever applicable.

### What if the original symptom cannot be reproduced?
Do not mark the fix verified. Investigate why reproduction is failing, collect related telemetry, and consider staged rollbacks or feature flags while diagnosis continues.