
debug-concurrency skill

/dotclaude/skills/debug-concurrency

This skill helps investigate concurrency issues such as race conditions, deadlocks, and thread safety violations to improve reliability.

npx playbooks add skill shotaiuchi/dotclaude --skill debug-concurrency


SKILL.md
---
name: debug-concurrency
description: >-
  Concurrency and threading investigation. Apply when debugging race conditions,
  deadlocks, thread safety issues, shared mutable state, and timing-dependent
  failures.
user-invocable: false
---

# Concurrency Investigation

Investigate concurrency issues including race conditions, deadlocks, and thread safety violations.

## Investigation Checklist

### Race Condition Detection
- Identify shared mutable state accessed without synchronization
- Check for check-then-act patterns that allow interleaving
- Look for time-of-check to time-of-use (TOCTOU) vulnerabilities
- Verify atomic operations are used where required
- Detect read-modify-write sequences lacking proper guards
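The read-modify-write item is the easiest to see in code. The minimal Python sketch below uses hypothetical class names. `UnsafeCounter.increment` splits the update into a separate load and store, so another thread can run between them and overwrite an increment. `SafeCounter` makes the whole sequence atomic under a `threading.Lock`.

```python
import threading


class UnsafeCounter:
    def __init__(self) -> None:
        self.value = 0

    def increment(self) -> None:
        # Read-modify-write with no guard: another thread can run between
        # the load and the store, and one of the two increments is lost.
        current = self.value
        self.value = current + 1


class SafeCounter:
    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # The lock makes load + add + store atomic with respect to
        # every other caller of increment().
        with self._lock:
            self.value += 1


def hammer(counter, threads: int = 8, per_thread: int = 10_000) -> int:
    """Call counter.increment() from several threads and return the total."""
    def work() -> None:
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value


if __name__ == "__main__":
    print("safe:", hammer(SafeCounter()))      # always 80000
    print("unsafe:", hammer(UnsafeCounter()))  # may come out below 80000
```

A lost update may not appear on every run, particularly on interpreters with a global lock. A correct total from a stress run therefore does not prove the unguarded version is safe.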

### Deadlock Analysis
- Map lock acquisition order across all code paths
- Identify circular wait conditions between threads or resources
- Check for nested lock acquisitions that invert ordering
- Verify timeout mechanisms exist for lock acquisition
- Look for resource starvation caused by unfair scheduling
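A common fix for the circular waits described above is to acquire locks in one global order. The sketch below assumes a hypothetical `Account` type that carries a unique, stable `id`, and uses that id as the ordering rank. Two opposite transfers then request the locks in the same sequence, so neither thread can hold one lock while waiting for the other.

```python
import threading


class Account:
    """Hypothetical resource guarded by its own lock."""
    _next_id = 0
    _id_lock = threading.Lock()

    def __init__(self, balance: int) -> None:
        with Account._id_lock:
            self.id = Account._next_id  # unique, stable rank used for lock ordering
            Account._next_id += 1
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Account, dst: Account, amount: int) -> None:
    if src is dst:
        # threading.Lock is not reentrant; locking one account twice would self-deadlock.
        return
    # Acquire both locks in ascending id order, whatever the transfer direction.
    # a->b and b->a therefore request the same lock first, ruling out a circular wait.
    first, second = sorted((src, dst), key=lambda acct: acct.id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def run_opposite_transfers(rounds: int = 10_000) -> tuple[int, int]:
    """Run a->b and b->a transfers concurrently; returns the final balances."""
    a, b = Account(1_000), Account(1_000)

    def move(s: Account, d: Account) -> None:
        for _ in range(rounds):
            transfer(s, d, 1)

    t1 = threading.Thread(target=move, args=(a, b))
    t2 = threading.Thread(target=move, args=(b, a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a.balance, b.balance
```

If `transfer` locked `src` and then `dst` directly, the same two-thread harness could hang. That hang is the lock-order inversion this checklist is looking for.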

### Thread Safety
- Verify collections and data structures are thread-safe or guarded
- Check that shared state is protected by a consistent locking strategy
- Identify thread-local storage misuse or missing isolation
- Verify volatile/memory fence usage for visibility guarantees
- Check for safe publication of objects across thread boundaries
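One way to cover both the thread-safe collection and the safe publication items is to hand fully built, immutable objects between threads through a synchronized queue. The Python sketch below is illustrative, and `Job`, `producer` and `consumer` are hypothetical names. In Java, a `BlockingQueue` plays the same role: its documented memory-consistency guarantee makes writes made before `put` visible to the thread that takes the element.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Job:
    """Hypothetical work item; frozen, so it cannot change after publication."""
    job_id: int
    payload: str


def producer(q: "queue.Queue[Optional[Job]]", count: int) -> None:
    for i in range(count):
        # Build the object completely, then hand it over through the queue.
        # The queue's internal locking makes the handoff safe.
        q.put(Job(i, f"item-{i}"))
    q.put(None)  # sentinel: no more jobs


def consumer(q: "queue.Queue[Optional[Job]]", results: list) -> None:
    while True:
        job = q.get()
        if job is None:
            break
        results.append(job.job_id)  # only this thread touches results until join()


def run_pipeline(count: int = 100) -> list:
    q: "queue.Queue[Optional[Job]]" = queue.Queue(maxsize=10)
    results: list = []
    threads = [
        threading.Thread(target=producer, args=(q, count)),
        threading.Thread(target=consumer, args=(q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```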

### Async/Await Correctness
- Verify async operations complete before dependent code executes
- Check for missing awaits that create fire-and-forget tasks
- Identify callback ordering assumptions that may not hold
- Verify cancellation tokens are checked and propagated
- Look for async void methods whose exceptions cannot be awaited or caught by callers
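"Async void" and "cancellation tokens" are C#/.NET terms. Python's asyncio has no async void, but an un-awaited `create_task` behaves in a similar fire-and-forget way, so the sketch below uses it as an analogue. `save` and the two `save_all_*` functions are hypothetical. The buggy version returns before any save has run and never surfaces the failure. The fixed version awaits every task and collects the errors.

```python
import asyncio


async def save(record_id: int, log: list) -> None:
    """Hypothetical async write; record 3 fails so error handling is visible."""
    await asyncio.sleep(0.01)  # stand-in for real I/O
    if record_id == 3:
        raise ValueError(f"failed to save record {record_id}")
    log.append(record_id)


async def save_all_buggy(ids, log: list) -> None:
    for i in ids:
        # BUG: the task is scheduled but never awaited or stored. The caller
        # resumes before any save has run, the tasks may be cancelled at loop
        # shutdown without ever running, and the caller never sees the
        # ValueError from record 3.
        asyncio.create_task(save(i, log))


async def save_all_fixed(ids, log: list) -> list:
    # Await every save. return_exceptions=True collects failures as values
    # instead of raising only the first one, so no error is dropped.
    results = await asyncio.gather(*(save(i, log) for i in ids), return_exceptions=True)
    return [r for r in results if isinstance(r, BaseException)]
```

If the caller cancels `save_all_fixed`, `asyncio.gather` also cancels the child saves that are still pending. This is roughly the asyncio counterpart of propagating a cancellation token.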

## Output Format

Report findings with confidence ratings:

| Confidence | Description |
|------------|-------------|
| High | Root cause clearly identified with supporting evidence |
| Medium | Probable cause identified but needs verification |
| Low | Hypothesis formed but insufficient evidence |
| Inconclusive | Unable to determine from available information |
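The skill defines only the four confidence levels, not a record layout. If findings need to be captured programmatically, one possible shape is sketched below. `Finding`, its fields and `to_markdown` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Confidence(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"
    INCONCLUSIVE = "Inconclusive"


@dataclass
class Finding:
    """Hypothetical report entry; field names are illustrative only."""
    title: str
    confidence: Confidence
    location: str
    evidence: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        lines = [
            f"### {self.title}",
            f"- Confidence: {self.confidence.value}",
            f"- Location: {self.location}",
        ]
        # One bullet per piece of supporting evidence (stack traces, repro notes).
        lines += [f"- Evidence: {item}" for item in self.evidence]
        return "\n".join(lines)
```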

Overview

This skill helps investigate concurrency and threading problems like race conditions, deadlocks, and thread-safety violations. It provides a focused checklist and reporting style to surface root causes and hypotheses with confidence ratings. Use it to convert timing-dependent failures into actionable findings and remediation steps.

How this skill works

The skill inspects code paths for shared mutable state, lock ordering, and async/await misuse, and highlights suspicious patterns (check-then-act, missing awaits, nested locks). It maps potential circular waits, verifies guard usage for read-modify-write sequences, and assesses visibility guarantees (volatile/fences, safe publication). Findings are presented with a confidence level (High, Medium, Low, Inconclusive) and concrete evidence or reproduction notes where available.
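The description does not say how potential circular waits are mapped. A standard technique is a lock-order graph: add an edge from each held lock to every lock acquired while it is still held, then look for a cycle. The sketch below assumes the nested acquisition sequences have already been extracted, either by reading the code or from logs. The input format and function names are hypothetical.

```python
from collections import defaultdict


def build_lock_graph(paths):
    """paths: iterable of lock-name sequences, each listing the locks one code
    path acquires in order while still holding the earlier ones."""
    graph = defaultdict(set)
    for seq in paths:
        for i, held in enumerate(seq):
            for acquired in seq[i + 1:]:
                graph[held].add(acquired)  # edge: held -> acquired while held
    return graph


def find_cycle(graph):
    """Depth-first search; returns one lock-order cycle as a list, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = defaultdict(int)
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for nxt in sorted(graph.get(node, ())):
            if color[nxt] == GRAY:  # back edge: nxt is already on the path
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

With the paths `["accounts", "ledger"]` and `["ledger", "accounts"]`, `find_cycle` reports `["accounts", "ledger", "accounts"]`, which is an inverted nesting order between the two locks.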

When to use it

  • Intermittent crashes or data corruption suspected to be timing-dependent
  • Application hangs or severe throughput drops consistent with deadlocks
  • Reports of inconsistent state when accessed by multiple threads
  • After introducing new concurrency constructs or migrating to async/await
  • During code reviews of performance- or safety-sensitive multithreaded paths

Best practices

  • Identify and document all shared mutable state and enforce a single synchronization strategy
  • Establish and follow a global lock acquisition order to prevent circular waits
  • Prefer atomic operations or lock-free algorithms for small critical updates
  • Ensure every async operation is awaited, or tracked explicitly with error handling and cancellation
  • Use thread-safe collections or wrap access with consistent locking or immutability
  • Add timeouts and observability around lock acquisition to aid diagnosis
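The last practice can be as simple as wrapping lock acquisition. The sketch below uses `threading.Lock.acquire(timeout=...)`, which returns `False` instead of blocking forever. The `timed_lock` helper, the logger name and the 0.1 s "slow wait" threshold are hypothetical choices. A timeout turns a silent hang into an error that names the lock, but the underlying ordering bug still has to be fixed.

```python
import logging
import threading
import time
from contextlib import contextmanager

logger = logging.getLogger("lock-diagnostics")  # hypothetical logger name
SLOW_WAIT_SECONDS = 0.1  # hypothetical threshold for logging slow acquisitions


@contextmanager
def timed_lock(lock: threading.Lock, name: str, timeout: float = 5.0):
    start = time.monotonic()
    # acquire(timeout=...) returns False if the lock is still held when time runs out.
    if not lock.acquire(timeout=timeout):
        logger.error("timed out after %.2fs waiting for %s in thread %s",
                     timeout, name, threading.current_thread().name)
        raise TimeoutError(f"could not acquire {name} within {timeout}s")
    waited = time.monotonic() - start
    if waited > SLOW_WAIT_SECONDS:
        logger.warning("waited %.3fs for %s", waited, name)
    try:
        yield
    finally:
        lock.release()
```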

Example use cases

  • Analyze a server that intermittently returns stale or duplicated data under load to find race conditions
  • Trace a production hang to discover a deadlock caused by inconsistent lock ordering between modules
  • Audit a codebase after adding background tasks to catch missing awaits and swallowed exceptions
  • Validate thread-local usage and safe publication when sharing objects across worker threads
  • Assess a library for safe concurrent usage before shipping to downstream consumers

FAQ

How do you decide a High vs Medium confidence finding?

High confidence means a reproducible sequence or clear stack evidence links the bug to the concurrency pattern. Medium means the pattern is likely, but a targeted repro or timing control is needed to confirm it.
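"Timing control" can mean forcing the suspected interleaving instead of waiting for it to happen. The sketch below confirms a check-then-act race deterministically. `Inventory`, its `after_check` hook and `force_interleaving` are hypothetical test seams. Thread A is paused between the check and the act while thread B runs the whole operation.

```python
import threading


class Inventory:
    """Hypothetical stock counter with a test hook between check and act."""

    def __init__(self, stock: int) -> None:
        self.stock = stock
        self.lock = threading.Lock()

    def reserve_unsafe(self, after_check=None) -> bool:
        if self.stock > 0:      # check
            if after_check:
                after_check()   # test hook: pause between check and act
            self.stock -= 1     # act
            return True
        return False

    def reserve_safe(self, after_check=None) -> bool:
        with self.lock:         # check and act happen under one lock
            if self.stock > 0:
                if after_check:
                    after_check()
                self.stock -= 1
                return True
            return False


def force_interleaving(inv: Inventory, method: str, hold: float = 0.5) -> tuple:
    """Run A up to its check, let B run, then let A finish; returns (a, b, stock)."""
    reserve = getattr(inv, method)
    a_checked = threading.Event()
    b_done = threading.Event()
    results = {}

    def pause_a() -> None:
        a_checked.set()  # A has passed the check; release B
        # Wait for B to finish. With reserve_safe, A holds the lock here and B
        # cannot finish, so A resumes after `hold` seconds instead.
        b_done.wait(timeout=hold)

    def run_a() -> None:
        results["a"] = reserve(after_check=pause_a)

    def run_b() -> None:
        a_checked.wait()
        results["b"] = reserve()
        b_done.set()

    threads = [threading.Thread(target=run_a), threading.Thread(target=run_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results["a"], results["b"], inv.stock
```

Starting with one item in stock, `force_interleaving(Inventory(1), "reserve_unsafe")` returns `(True, True, -1)`: both threads reserve the last item. The locked version returns `(True, False, 0)`. A repeatable result like this moves a finding from Medium toward High confidence.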

Can this skill fix deadlocks automatically?

No. It identifies lock-order inversions and other root causes and recommends fixes such as consistent lock ordering, acquisition timeouts, or refactoring. The code changes must be applied and validated manually.