
This skill helps you diagnose hard-to-debug Exa issues by guiding evidence collection, layering tests, and preparing escalation-ready bundles.

npx playbooks add skill jeremylongshore/claude-code-plugins-plus-skills --skill exa-advanced-troubleshooting

---
name: exa-advanced-troubleshooting
description: |
  Apply Exa advanced debugging techniques for hard-to-diagnose issues.
  Use when standard troubleshooting fails, investigating complex race conditions,
  or preparing evidence bundles for Exa support escalation.
  Trigger with phrases like "exa hard bug", "exa mystery error",
  "exa impossible to debug", "difficult exa issue", "exa deep debug".
allowed-tools: Read, Grep, Bash(kubectl:*), Bash(curl:*), Bash(tcpdump:*)
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
---

# Exa Advanced Troubleshooting

## Overview
Deep debugging techniques for complex Exa issues that resist standard troubleshooting.

## Prerequisites
- Access to production logs and metrics
- kubectl access to clusters
- Network capture tools available
- Understanding of distributed tracing

## Evidence Collection Framework

### Comprehensive Debug Bundle
```bash
#!/bin/bash
# advanced-exa-debug.sh

BUNDLE="exa-advanced-debug-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BUNDLE"/{logs,metrics,network,config,traces}

# 1. Extended logs (1 hour window)
kubectl logs -l app=exa-integration --since=1h > "$BUNDLE/logs/pods.log"
journalctl -u exa-service --since "1 hour ago" > "$BUNDLE/logs/system.log"

# 2. Metrics dump
# Quote the URLs so the shell does not treat `?` as a glob character
curl -s "localhost:9090/api/v1/query?query=exa_requests_total" > "$BUNDLE/metrics/requests.json"
curl -s "localhost:9090/api/v1/query?query=exa_errors_total" > "$BUNDLE/metrics/errors.json"

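# 3. Network capture (30 seconds, runs in the background; usually needs root)
timeout 30 tcpdump -i any port 443 -w "$BUNDLE/network/capture.pcap" &
TCPDUMP_PID=$!

# 4. Distributed traces
curl -s "localhost:16686/api/traces?service=exa" > "$BUNDLE/traces/jaeger.json"

# 5. Configuration state
kubectl get cm exa-config -o yaml > "$BUNDLE/config/configmap.yaml"
# `describe` lists secret key names and sizes only; `get -o yaml` would
# write the base64-encoded values into the bundle.
kubectl describe secret exa-secrets > "$BUNDLE/config/secrets-redacted.txt"

# Wait for the capture to finish before archiving
wait "$TCPDUMP_PID" || true
tar -czf "$BUNDLE.tar.gz" "$BUNDLE"
echo "Advanced debug bundle: $BUNDLE.tar.gz"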
```
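
Logs and traces collected this way can still contain credentials in headers or environment dumps. The sketch below shows one way to scrub text before it goes into the bundle. `redactSecrets` and its patterns are hypothetical and illustrative, not an exhaustive list and not part of any Exa tooling.

```typescript
// Hypothetical helper: scrub likely secrets from log text before bundling.
// The patterns below are illustrative assumptions, not a complete list.
const SECRET_PATTERNS: Array<[RegExp, string]> = [
  // Authorization: Bearer <token>
  [/(authorization:\s*bearer\s+)[^\s"']+/gi, '$1[REDACTED]'],
  // x-api-key: <key>
  [/(x-api-key:\s*)[^\s"']+/gi, '$1[REDACTED]'],
  // NAME=value assignments where NAME ends in KEY, TOKEN, or SECRET
  [/\b([A-Z0-9_]*(?:KEY|TOKEN|SECRET)=)[^\s"']+/g, '$1[REDACTED]'],
];

// Apply every pattern in turn and return the scrubbed text.
function redactSecrets(text: string): string {
  return SECRET_PATTERNS.reduce(
    (acc, [pattern, replacement]) => acc.replace(pattern, replacement),
    text,
  );
}

console.log(redactSecrets('EXA_API_KEY=abc123 retry=3'));
```

Running log files through a filter like this before `tar` keeps the bundle shareable, but a manual review of the archive is still advisable.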

## Systematic Isolation

### Layer-by-Layer Testing

```typescript
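// Test each layer independently. The test* helpers are not defined here;
// each is expected to resolve to a DiagnosisResult.
interface DiagnosisResult {
  layer: string;
  success: boolean;
  detail?: string;
}

interface DiagnosisReport {
  results: DiagnosisResult[];
  firstFailure?: DiagnosisResult;
}

async function diagnoseExaIssue(): Promise<DiagnosisReport> {
  const results: DiagnosisResult[] = [];

  // Layer 1: Network connectivity
  results.push(await testNetworkConnectivity());

  // Layer 2: DNS resolution
  results.push(await testDNSResolution('api.exa.ai'));

  // Layer 3: TLS handshake
  results.push(await testTLSHandshake('api.exa.ai'));

  // Layer 4: Authentication
  results.push(await testAuthentication());

  // Layer 5: API response
  results.push(await testAPIResponse());

  // Layer 6: Response parsing
  results.push(await testResponseParsing());

  // The first failed layer is the most likely location of the fault
  return { results, firstFailure: results.find(r => !r.success) };
}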
```
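
The layer helpers are left undefined above. As a rough sketch of what two of them might look like, the following uses only Node's built-in `dns` and `tls` modules. The names `runLayer`, `checkDNS`, `checkTLS`, and the `LayerResult` shape are assumptions made for illustration.

```typescript
import { lookup } from 'node:dns/promises';
import { connect } from 'node:tls';

interface LayerResult {
  layer: string;
  success: boolean;
  durationMs: number;
  detail: string;
}

// Run one check, time it, and convert a thrown error into a failed result.
async function runLayer(layer: string, check: () => Promise<string>): Promise<LayerResult> {
  const start = Date.now();
  try {
    const detail = await check();
    return { layer, success: true, durationMs: Date.now() - start, detail };
  } catch (error) {
    const detail = error instanceof Error ? error.message : String(error);
    return { layer, success: false, durationMs: Date.now() - start, detail };
  }
}

// DNS layer: resolve the hostname with the system resolver.
function checkDNS(host: string): Promise<LayerResult> {
  return runLayer(`dns:${host}`, async () => (await lookup(host)).address);
}

// TLS layer: complete a handshake on port 443 and report the negotiated protocol.
function checkTLS(host: string, timeoutMs = 5000): Promise<LayerResult> {
  return runLayer(`tls:${host}`, () =>
    new Promise<string>((resolve, reject) => {
      const socket = connect({ host, port: 443, servername: host }, () => {
        const protocol = socket.getProtocol() ?? 'unknown';
        socket.end();
        resolve(protocol);
      });
      socket.setTimeout(timeoutMs, () => socket.destroy(new Error('TLS handshake timed out')));
      socket.once('error', reject);
    }),
  );
}
```

Because every check goes through `runLayer`, a failing layer produces a result object instead of an exception, so the caller can keep collecting results and pick out the first failure afterwards.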

### Minimal Reproduction

```typescript
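// Strip down to the absolute minimum.
// Assumes the official exa-js SDK (npm install exa-js).
import Exa from 'exa-js';

async function minimalRepro(): Promise<void> {
  // 1. Fresh client, no customization
  const client = new Exa(process.env.EXA_API_KEY!);

  // 2. Simplest possible call: one query, one result
  try {
    const result = await client.search('test', { numResults: 1 });
    console.log('Search successful:', result.results.length, 'result(s)');
  } catch (error) {
    // `error` is `unknown` under strict mode; narrow before reading fields
    const err = error instanceof Error ? error : new Error(String(error));
    console.error('Search failed:', {
      message: err.message,
      code: (err as Error & { code?: unknown }).code,
      stack: err.stack,
    });
  }
}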
```

## Timing Analysis

```typescript
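interface TimingStats {
  count: number;
  min: number;
  max: number;
  avg: number;
  p95: number;
}

type TimingReport = Record<string, TimingStats>;

class TimingAnalyzer {
  private timings: Map<string, number[]> = new Map();

  // Time one async call; the duration is recorded even if fn throws
  async measure<T>(label: string, fn: () => Promise<T>): Promise<T> {
    const start = performance.now();
    try {
      return await fn();
    } finally {
      const duration = performance.now() - start;
      const existing = this.timings.get(label) || [];
      existing.push(duration);
      this.timings.set(label, existing);
    }
  }

  report(): TimingReport {
    const report: TimingReport = {};
    for (const [label, times] of this.timings) {
      report[label] = {
        count: times.length,
        min: Math.min(...times),
        max: Math.max(...times),
        avg: times.reduce((a, b) => a + b, 0) / times.length,
        p95: this.percentile(times, 95),
      };
    }
    return report;
  }

  // Nearest-rank percentile over a sorted copy of the samples
  private percentile(times: number[], p: number): number {
    const sorted = [...times].sort((a, b) => a - b);
    const rank = Math.ceil((p / 100) * sorted.length);
    return sorted[Math.max(0, rank - 1)];
  }
}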
```

## Memory and Resource Analysis

```typescript
// Detect memory leaks in Exa client usage
const WINDOW = 60; // 1 hour at 1 sample/min
const heapUsed: number[] = [];

setInterval(() => {
  heapUsed.push(process.memoryUsage().heapUsed);
  // Keep a sliding window so the sample buffer itself does not grow
  if (heapUsed.length > WINDOW) heapUsed.shift();

  // Alert on sustained growth across a full window
  if (heapUsed.length === WINDOW) {
    const trend = heapUsed[WINDOW - 1] - heapUsed[0];
    if (trend > 100 * 1024 * 1024) { // 100MB growth
      console.warn('Potential memory leak in exa integration');
    }
  }
}, 60000);
```
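
Comparing only the first and last sample is sensitive to where garbage collection happens to fall. One alternative, sketched here as an assumption rather than a recommended threshold, is to fit a least-squares line through the whole window and alert on its slope. `heapGrowthSlope` and `looksLikeLeak` are hypothetical names.

```typescript
// Least-squares slope of the heap samples, in bytes per sample interval.
function heapGrowthSlope(samples: number[]): number {
  const n = samples.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2; // sample indices are 0..n-1
  const meanY = samples.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (i - meanX) * (samples[i] - meanY);
    den += (i - meanX) ** 2;
  }
  return num / den;
}

// Flag a window whose fitted growth rate exceeds a caller-chosen threshold.
function looksLikeLeak(samples: number[], bytesPerSampleThreshold: number): boolean {
  return heapGrowthSlope(samples) > bytesPerSampleThreshold;
}

console.log(heapGrowthSlope([0, 10, 20, 30])); // steady growth of 10 per sample
```

With one sample per minute, a slope of about 1.7 MB per sample corresponds to roughly 100 MB per hour, which matches the threshold used above.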

## Race Condition Detection

```typescript
// Detect concurrent access issues
class ExaConcurrencyChecker {
  private inProgress: Set<string> = new Set();

  async execute<T>(key: string, fn: () => Promise<T>): Promise<T> {
    if (this.inProgress.has(key)) {
      console.warn(`Concurrent access detected for ${key}`);
    }

    this.inProgress.add(key);
    try {
      return await fn();
    } finally {
      this.inProgress.delete(key);
    }
  }
}
```
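
Once the checker reports overlapping calls for a key, a useful follow-up experiment is to force those calls to run one at a time: if the failure disappears under serialization, that supports the race hypothesis. The class below is a minimal sketch of that idea; `KeyedSerializer` is a hypothetical helper, not part of any Exa SDK.

```typescript
// Hypothetical helper: run calls that share a key strictly one after another.
class KeyedSerializer {
  private tails: Map<string, Promise<unknown>> = new Map();

  run<T>(key: string, fn: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    // Start fn only after the previous call for this key has settled.
    const current = previous.then(() => fn());
    // Store a non-rejecting tail so one failure does not block later calls.
    const tail = current.catch(() => undefined);
    this.tails.set(key, tail);
    // Drop the entry once the queue for this key has drained.
    tail.then(() => {
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return current;
  }
}
```

Calls with different keys still run concurrently, so the experiment only changes the interleaving for the key under suspicion.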

## Support Escalation Template

```markdown
## Exa Support Escalation

**Severity:** P[1-4]
**Request ID:** [from error response]
**Timestamp:** [ISO 8601]

### Issue Summary
[One paragraph description]

### Steps to Reproduce
1. [Step 1]
2. [Step 2]

### Expected vs Actual
- Expected: [behavior]
- Actual: [behavior]

### Evidence Attached
- [ ] Debug bundle (exa-advanced-debug-*.tar.gz)
- [ ] Minimal reproduction code
- [ ] Timing analysis
- [ ] Network capture (if relevant)

### Workarounds Attempted
1. [Workaround 1] - Result: [outcome]
2. [Workaround 2] - Result: [outcome]
```
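
If escalations are filed often, the template can be filled from structured data so that no field is forgotten. This is a small sketch under that assumption; the `Escalation` shape and `renderEscalation` are hypothetical and cover only the text fields of the template.

```typescript
interface Escalation {
  severity: 1 | 2 | 3 | 4;
  requestId: string;
  timestamp: Date;
  summary: string;
  steps: string[];
  expected: string;
  actual: string;
  workarounds: Array<{ attempt: string; outcome: string }>;
}

// Render the escalation template as Markdown text.
function renderEscalation(e: Escalation): string {
  const numbered = (items: string[]): string =>
    items.map((item, i) => `${i + 1}. ${item}`).join('\n');
  return [
    '## Exa Support Escalation',
    '',
    `**Severity:** P${e.severity}`,
    `**Request ID:** ${e.requestId}`,
    `**Timestamp:** ${e.timestamp.toISOString()}`,
    '',
    '### Issue Summary',
    e.summary,
    '',
    '### Steps to Reproduce',
    numbered(e.steps),
    '',
    '### Expected vs Actual',
    `- Expected: ${e.expected}`,
    `- Actual: ${e.actual}`,
    '',
    '### Workarounds Attempted',
    numbered(e.workarounds.map((w) => `${w.attempt} - Result: ${w.outcome}`)),
  ].join('\n');
}
```

`Date.prototype.toISOString()` always emits UTC in ISO 8601 form, which avoids ambiguity when support staff correlate the report with server-side logs.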

## Instructions

### Step 1: Collect Evidence Bundle
Run the comprehensive debug script to gather all relevant data.

### Step 2: Systematic Isolation
Test each layer independently to identify the failure point.

### Step 3: Create Minimal Reproduction
Strip down to the simplest failing case.

### Step 4: Escalate with Evidence
Use the support template with all collected evidence.

## Output
- Comprehensive debug bundle collected
- Failure layer identified
- Minimal reproduction created
- Support escalation submitted

## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Can't reproduce | Race condition | Add timing analysis |
| Intermittent failure | Timing-dependent | Increase sample size |
| No useful logs | Missing instrumentation | Add debug logging |
| Memory growth | Resource leak | Use heap profiling |
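
"Increase sample size" can be made concrete. Assuming each run fails independently with probability p, the chance of seeing at least one failure in n runs is 1 - (1 - p)^n, so n must be at least ln(1 - c) / ln(1 - p) to reach confidence c. The helper below is a sketch of that calculation; `runsNeeded` is a hypothetical name, and the independence assumption may not hold for failures that cluster in time.

```typescript
// Number of independent runs needed to observe at least one failure
// with the given confidence, for an estimated per-run failure rate.
function runsNeeded(failureRate: number, confidence = 0.95): number {
  if (!(failureRate > 0 && failureRate < 1)) {
    throw new RangeError('failureRate must be between 0 and 1 (exclusive)');
  }
  if (!(confidence > 0 && confidence < 1)) {
    throw new RangeError('confidence must be between 0 and 1 (exclusive)');
  }
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - failureRate));
}

console.log(runsNeeded(0.01)); // 299 runs for a 1% failure rate at 95% confidence
```

The same number works in reverse: if a fix survives that many runs without a failure, the original failure rate is unlikely to still apply.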

## Examples

### Quick Layer Test
```bash
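# One request exercises TCP, TLS, and HTTP in sequence.
# Any HTTP status line, even a non-2xx one, shows that DNS, TCP, and TLS succeeded.
curl -sv -o /dev/null https://api.exa.ai/ 2>&1 | grep -E "(Connected|TLS|HTTP)"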
```

## Resources
- [Exa Support Portal](https://support.exa.com)
- [Exa Status Page](https://status.exa.com)

## Next Steps
For load testing, see `exa-load-scale`.

Overview

This skill applies advanced Exa debugging techniques for hard-to-diagnose issues that resist standard troubleshooting. It guides you through evidence collection, layer-by-layer isolation, timing and memory analysis, and preparing a support-ready escalation bundle. Use it to reduce time-to-root-cause for intermittent, racy, or environment-specific failures.

How this skill works

The skill provides a repeatable evidence collection framework that produces a comprehensive debug bundle with logs, metrics, network captures, traces, and configuration state. It also offers systematic isolation tests for each layer (network, DNS, TLS, auth, API, parsing), minimal reproduction patterns, timing and memory analyzers, and concurrency detectors to surface race conditions. Finally, it formats a support escalation template with the artifacts needed by Exa support.

When to use it

  • When standard troubleshooting steps do not reveal the root cause
  • Investigating intermittent or timing-dependent failures and race conditions
  • Preparing a complete evidence package for Exa support escalation
  • Validating suspected memory leaks or resource growth in the Exa integration
  • During post-incident forensic analysis to collect reproducible artifacts

Best practices

  • Run the comprehensive debug script from a host with cluster and network access
  • Collect logs and metrics over a meaningful window (minutes to hours) for intermittent issues
  • Isolate each layer independently to identify the first failing component
  • Create a minimal reproduction with a fresh client and the simplest API call
  • Include timing reports, heap trends, and packet captures before escalating

Example use cases

  • Intermittent API timeouts that only appear under load—use timing analyzer and network capture
  • Race conditions when multiple workers access shared keys—use the concurrency checker
  • Memory growth in long-running processes—collect heap usage trends and heap profiles
  • Authentication failures that vary by environment—run layer-by-layer tests including TLS and DNS
  • Preparing an escalation when an error cannot be reproduced locally—attach the debug bundle and minimal repro

FAQ

What access do I need to run the advanced debug bundle?

You need kubectl access to the cluster, access to production logs and metrics, and the ability to run network capture tools on a host that sees the traffic.

How do I handle sensitive data in the bundle?

Redact secrets before packaging. Check that the bundle contains secret key names only and no values (output from `kubectl get secret -o yaml` is base64-encoded, not redacted), and follow your organization’s data-handling policy when sharing with support.