error-detective skill

/skills/agents/specialized/error-detective

This skill analyzes logs and code for error patterns, correlates incidents across systems, and identifies root causes with actionable fixes.

npx playbooks add skill sidetoolco/org-charts --skill error-detective

Review the files below or copy the command above to add this skill to your agents.

SKILL.md

---
name: error-detective
description: Search logs and codebases for error patterns, stack traces, and anomalies. Correlates errors across systems and identifies root causes. Use PROACTIVELY when debugging issues, analyzing logs, or investigating production errors.
license: Apache-2.0
metadata:
  author: edescobar
  version: "1.0"
  model-preference: sonnet
---

# Error Detective

You are an error detective specializing in log analysis and pattern recognition.

## Focus Areas
- Log parsing and error extraction (regex patterns)
- Stack trace analysis across languages
- Error correlation across distributed systems
- Common error patterns and anti-patterns
- Log aggregation queries (Elasticsearch, Splunk)
- Anomaly detection in log streams

## Approach
1. Start with error symptoms, work backward to cause
2. Look for patterns across time windows
3. Correlate errors with deployments/changes
4. Check for cascading failures
5. Identify error rate changes and spikes

## Output
- Regex patterns for error extraction
- Timeline of error occurrences
- Correlation analysis between services
- Root cause hypothesis with evidence
- Monitoring queries to detect recurrence
- Code locations likely causing errors

Focus on actionable findings. Include both immediate fixes and prevention strategies.

Overview

This skill helps you hunt down errors by searching logs and code for error patterns, stack traces, and anomalies. It correlates events across systems to surface likely root causes and gives concrete remediation and prevention steps. Use it proactively when debugging, triaging incidents, or investigating production regressions.

How this skill works

It parses logs with targeted regex and standard stack-trace parsers to extract error signatures and context. It builds timelines, groups similar traces, and correlates occurrences across services, deployments, and time windows to identify cascading failures and spikes. Outputs include extraction patterns, correlation analysis, hypothesized root causes with supporting evidence, and monitoring queries to detect recurrence.
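
The extraction and grouping step described above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the log layout (`<ISO timestamp> <LEVEL> <service> <message>`), the `fingerprint` and `group_errors` helpers, and the sample lines are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumed log layout: "<ISO timestamp> <LEVEL> <service> <message>".
# Real logs will need a pattern tailored to their own format.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?)\s+"
    r"(?P<level>ERROR|FATAL|WARN)\s+"
    r"(?P<service>[\w-]+)\s+"
    r"(?P<message>.*)$"
)


def fingerprint(message: str) -> str:
    """Collapse variable parts so similar errors share one signature."""
    sig = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)  # addresses, hex ids
    sig = re.sub(r"\d+", "<n>", sig)                   # counts, durations
    return sig


def group_errors(lines):
    """Count ERROR/FATAL lines per (service, fingerprint) pair."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("level") in ("ERROR", "FATAL"):
            key = (m.group("service"), fingerprint(m.group("message")))
            counts[key] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    "2024-05-01T10:00:01Z ERROR checkout Timeout after 3000 ms calling payments (attempt 2)",
    "2024-05-01T10:00:02Z ERROR checkout Timeout after 3050 ms calling payments (attempt 3)",
    "2024-05-01T10:00:03Z INFO checkout Request completed in 120 ms",
    "2024-05-01T10:00:04Z ERROR payments Connection refused to db at 0x7f3a",
]

for (service, sig), n in group_errors(sample).most_common():
    print(n, service, sig)
```

Replacing numbers and hex values with placeholders is a deliberately crude normalization. It groups the two timeout lines together, but it would also merge messages that differ only in a meaningful number, such as an HTTP status code.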

When to use it

- Investigating a sudden increase in error rates or latency in production
- Triaging after a deploy to determine whether new code caused the failures
- Hunting intermittent or cross-service failures that lack a clear origin
- Building monitoring rules to detect recurrence of known errors
- Reviewing logs to identify anti-patterns and improve observability

Best practices

- Start with symptom windows and expand time ranges to find patterns
- Normalize timestamps and trace identifiers before correlating events
- Capture full stack traces and surrounding log context for accuracy
- Correlate errors with recent deployments, config changes, and infra events
- Create and store reusable regex and queries for repeated error types
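
As an illustration of the normalization practice above, the sketch below converts timestamps to UTC and canonicalizes trace identifiers before building a per-trace timeline. The event shape (`ts`, `trace_id`, `service`, `msg`), the helper names, and the rule that naive timestamps are already UTC are assumptions for this example, not something the skill prescribes.

```python
from collections import defaultdict
from datetime import datetime, timezone


def to_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp and convert it to UTC."""
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def timeline_by_trace(events):
    """Group events by normalized trace id, ordered by UTC time."""
    traces = defaultdict(list)
    for e in events:
        trace_id = e["trace_id"].strip().lower()  # canonical form of the id
        traces[trace_id].append((to_utc(e["ts"]), e["service"], e["msg"]))
    for entries in traces.values():
        entries.sort(key=lambda entry: entry[0])
    return dict(traces)


# Made-up events from three services that log in different conventions.
events = [
    {"ts": "2024-05-01T12:00:05+02:00", "trace_id": "ABC123",
     "service": "gateway", "msg": "502 from checkout"},
    {"ts": "2024-05-01T10:00:03+00:00", "trace_id": "abc123",
     "service": "checkout", "msg": "timeout calling payments"},
    {"ts": "2024-05-01T10:00:01", "trace_id": "abc123 ",
     "service": "payments", "msg": "db connection refused"},
]

for when, service, msg in timeline_by_trace(events)["abc123"]:
    print(when.isoformat(), service, msg)
```

Without the normalization, the gateway event would appear to happen two hours after the others and would land in a separate trace because of its uppercase identifier. After it, the timeline reads in causal order: payments, then checkout, then gateway.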

Example use cases

- Extracting stack traces from mixed-language logs and grouping by root exception
- Detecting a cascading failure where one service’s timeout triggers retries and downstream errors
- Generating Splunk/Elasticsearch queries to alert on a specific error fingerprint
- Hypothesizing a race condition by correlating error spikes with traffic surges
- Providing a short remediation plan with code file locations likely responsible
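
Several of these use cases depend on spotting error spikes. One simple way to do that, sketched below under stated assumptions, is to bucket error timestamps per minute and flag the minutes whose count exceeds a multiple of the median. The `factor` threshold and both helper names are hypothetical choices for illustration, and the sample timestamps are made up.

```python
from collections import Counter
from datetime import datetime
from statistics import median


def errors_per_minute(timestamps):
    """Count errors per minute from ISO 8601 timestamp strings."""
    buckets = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)
        buckets[dt.replace(second=0, microsecond=0)] += 1
    return buckets


def find_spikes(buckets, factor=3.0):
    """Return minutes whose error count exceeds factor * median count."""
    if not buckets:
        return []
    baseline = median(buckets.values())
    return sorted(minute for minute, n in buckets.items() if n > factor * baseline)


# Made-up error timestamps: a steady trickle with one burst at 10:02.
stamps = (
    ["2024-05-01T10:00:10"] * 2
    + ["2024-05-01T10:01:20"] * 2
    + ["2024-05-01T10:02:30"] * 9
    + ["2024-05-01T10:03:40"] * 1
)

for minute in find_spikes(errors_per_minute(stamps)):
    print("spike at", minute.isoformat())
```

The median is used instead of the mean so that the burst itself does not inflate the baseline. One limitation of this sketch is that minutes with no errors never appear in the buckets, so the baseline reflects only minutes that logged at least one error.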

FAQ

What formats and systems does it support?

It targets common log formats and supports systems like Elasticsearch and Splunk via tailored queries; stack-trace parsing works across major languages.

What output should I expect after analysis?

You get regex patterns, a timeline of events, correlation findings, a root-cause hypothesis with evidence, recommended immediate fixes and prevention strategies, and monitoring queries to detect recurrence.
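
To make the last item concrete, a recurrence-detection query for Elasticsearch might look like the sketch below, built as a Python dictionary in the Elasticsearch query DSL. The field names (`service.keyword`, `message`, `@timestamp`) and the `build_error_query` helper are assumptions about how the logs are indexed; adjust them to the actual index mapping.

```python
import json


def build_error_query(phrase: str, service: str, window: str = "15m") -> dict:
    """Build a search body that counts matching errors per minute.

    Field names are assumptions; they depend on the index mapping.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the documents
        "query": {
            "bool": {
                "filter": [
                    {"term": {"service.keyword": service}},
                    {"match_phrase": {"message": phrase}},
                    {"range": {"@timestamp": {"gte": f"now-{window}"}}},
                ]
            }
        },
        "aggs": {
            "per_minute": {
                "date_histogram": {
                    "field": "@timestamp",
                    "fixed_interval": "1m",
                }
            }
        },
    }


# Hypothetical error phrase and service name.
query = build_error_query("Timeout after", "checkout")
print(json.dumps(query, indent=2))
```

The resulting JSON can be sent as the body of a search request, and an alerting rule can then fire when any per-minute bucket exceeds a chosen threshold.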