condition-based-waiting skill

This skill helps you replace arbitrary delays with condition-based waiting to stabilize async tests and reduce flakiness.

npx playbooks add skill yousufjoyian/claude-skills --skill condition-based-waiting

---
name: condition-based-waiting
description: Replace arbitrary timeouts with condition polling for reliable async tests
---

# Condition-Based Waiting

## Overview

Flaky tests often guess at timing with arbitrary delays. This creates race conditions where tests pass on fast machines but fail under load or in CI.

**Core principle:** Wait for the actual condition you care about, not a guess about how long it takes.

## When to Use

```dot
digraph when_to_use {
    "Test uses setTimeout/sleep?" [shape=diamond];
    "Testing timing behavior?" [shape=diamond];
    "Document WHY timeout needed" [shape=box];
    "Use condition-based waiting" [shape=box];

    "Test uses setTimeout/sleep?" -> "Testing timing behavior?" [label="yes"];
    "Testing timing behavior?" -> "Document WHY timeout needed" [label="yes"];
    "Testing timing behavior?" -> "Use condition-based waiting" [label="no"];
}
```

**Use when:**
- Tests have arbitrary delays (`setTimeout`, `sleep`, `time.sleep()`)
- Tests are flaky (pass sometimes, fail under load)
- Tests time out when run in parallel
- Waiting for async operations to complete

**Don't use when:**
- Testing actual timing behavior (debounce, throttle intervals)

If you do keep an arbitrary timeout, always document WHY.

## Core Pattern

```typescript
// ❌ BEFORE: Guessing at timing
await new Promise(r => setTimeout(r, 50));
const result = getResult();
expect(result).toBeDefined();

// ✅ AFTER: Waiting for condition
await waitFor(() => getResult() !== undefined, 'result to be defined');
const result = getResult();
expect(result).toBeDefined();
```

## Quick Patterns

| Scenario | Pattern |
|----------|---------|
| Wait for event | `waitFor(() => events.find(e => e.type === 'DONE'), 'DONE event')` |
| Wait for state | `waitFor(() => machine.state === 'ready', 'machine to be ready')` |
| Wait for count | `waitFor(() => items.length >= 5, 'at least 5 items')` |
| Wait for file | `waitFor(() => fs.existsSync(path), 'file to exist')` |
| Complex condition | `waitFor(() => obj.ready && obj.value > 10, 'obj ready with value > 10')` |

## Implementation

Generic polling function:
```typescript
async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description: string,
  timeoutMs = 5000
): Promise<T> {
  const startTime = Date.now();

  while (true) {
    const result = condition();
    if (result) return result;

    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }

    await new Promise(r => setTimeout(r, 10)); // Poll every 10ms
  }
}
```

See @example.ts for a complete implementation with domain-specific helpers (`waitForEvent`, `waitForEventCount`, `waitForEventMatch`) taken from an actual debugging session.
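As a rough illustration of how such helpers can sit on top of the generic `waitFor`, here is a minimal sketch. The `RecordedEvent` and `EventLog` types are hypothetical stand-ins, and the helper bodies are assumptions; the actual code in `example.ts` may look different.

```typescript
// Hypothetical event shape and event source, used only for this sketch.
interface RecordedEvent {
  type: string;
  payload?: unknown;
}

interface EventLog {
  getEvents(): RecordedEvent[];
}

// Same polling loop as the generic waitFor in the Implementation section.
async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description: string,
  timeoutMs = 5000
): Promise<T> {
  const startTime = Date.now();
  while (true) {
    const result = condition();
    if (result) return result;
    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }
    await new Promise(r => setTimeout(r, 10)); // Poll every 10ms
  }
}

// Resolves with the first recorded event of the given type.
function waitForEvent(
  log: EventLog,
  type: string,
  timeoutMs = 5000
): Promise<RecordedEvent> {
  return waitFor(
    () => log.getEvents().find(e => e.type === type),
    `${type} event`,
    timeoutMs
  );
}

// Resolves with all matching events once at least `count` have been recorded.
function waitForEventCount(
  log: EventLog,
  type: string,
  count: number,
  timeoutMs = 5000
): Promise<RecordedEvent[]> {
  return waitFor(
    () => {
      const matches = log.getEvents().filter(e => e.type === type);
      return matches.length >= count ? matches : undefined;
    },
    `${count} ${type} events`,
    timeoutMs
  );
}
```

A test would then read `await waitForEvent(log, 'DONE')`, and a failure reports which event never arrived rather than a bare timeout.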

## Common Mistakes

**❌ Polling too fast:** `setTimeout(check, 1)` - wastes CPU
**✅ Fix:** Poll every 10ms

**❌ No timeout:** Loops forever if the condition is never met
**✅ Fix:** Always include a timeout with a clear error message

**❌ Stale data:** Caching state before the loop
**✅ Fix:** Call the getter inside the loop for fresh data
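The stale-data mistake is easy to miss in review, so here is a minimal sketch contrasting the two forms. The `Job` type and both function names are hypothetical and exist only for this illustration; `waitFor` repeats the generic polling function from the Implementation section.

```typescript
// Hypothetical object whose status changes asynchronously.
interface Job {
  status: string;
}

// Same polling loop as the generic waitFor in the Implementation section.
async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description: string,
  timeoutMs = 5000
): Promise<T> {
  const startTime = Date.now();
  while (true) {
    const result = condition();
    if (result) return result;
    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }
    await new Promise(r => setTimeout(r, 10)); // Poll every 10ms
  }
}

// ❌ Stale: the status is read once before polling starts,
// so later changes to the job are never seen.
function waitUntilDoneStale(job: Job, timeoutMs: number): Promise<boolean> {
  const status = job.status; // snapshot taken outside the condition
  return waitFor(() => status === 'done', 'job to finish (stale read)', timeoutMs);
}

// ✅ Fresh: the status is read again on every poll.
function waitUntilDone(job: Job, timeoutMs: number): Promise<boolean> {
  return waitFor(() => job.status === 'done', 'job to finish', timeoutMs);
}
```

If the job is still pending when `waitUntilDoneStale` is called, it can only time out, even though the job finishes in the meantime.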

## When Arbitrary Timeout IS Correct

```typescript
// Tool ticks every 100ms - need 2 ticks to verify partial output
await waitForEvent(manager, 'TOOL_STARTED'); // First: wait for condition
await new Promise(r => setTimeout(r, 200));   // Then: wait for timed behavior
// 200ms = 2 ticks at 100ms intervals - documented and justified
```

**Requirements:**
1. First wait for triggering condition
2. Based on known timing (not guessing)
3. Comment explaining WHY
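One way to keep a justified delay tied to its reason is to express it in ticks rather than raw milliseconds. The sketch below is an assumption, not part of the skill: `TICK_INTERVAL_MS` and `waitForTicks` are hypothetical names, and the 100ms value mirrors the example above.

```typescript
// Assumed tick interval of the tool under test (100ms in the example above).
const TICK_INTERVAL_MS = 100;

// Fixed delay expressed in ticks, so the reason for the wait
// is visible at the call site instead of a bare number.
function waitForTicks(
  ticks: number,
  tickIntervalMs = TICK_INTERVAL_MS
): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ticks * tickIntervalMs));
}
```

In the example above, the fixed delay would become `await waitForTicks(2)`, placed after the `waitForEvent` call, so the delay follows the tick interval if it ever changes.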

## Real-World Impact

From a debugging session (2025-10-03):
- Fixed 15 flaky tests across 3 files
- Pass rate: 60% → 100%
- Execution time: 40% faster
- No more race conditions

Overview

This skill replaces arbitrary timeouts in tests with condition-based polling to make asynchronous tests reliable and faster. It provides a small polling utility and domain helpers so tests wait for actual conditions instead of guessing delays. The result is fewer flaky tests and clearer failure messages when timeouts occur.
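To show what a clearer failure message looks like, the sketch below runs the generic `waitFor` against a condition that never becomes truthy. The `demoTimeoutMessage` wrapper and the description string are made up for this illustration.

```typescript
// Same polling loop as the generic waitFor in the Implementation section.
async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description: string,
  timeoutMs = 5000
): Promise<T> {
  const startTime = Date.now();
  while (true) {
    const result = condition();
    if (result) return result;
    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }
    await new Promise(r => setTimeout(r, 10)); // Poll every 10ms
  }
}

// Hypothetical demo: returns the error message produced on timeout.
async function demoTimeoutMessage(): Promise<string> {
  try {
    // The condition never becomes truthy; a short timeout makes it fail fast.
    await waitFor(() => false, 'worker to report ready', 50);
    return 'no error';
  } catch (err) {
    return (err as Error).message;
  }
}
// Resolves to "Timeout waiting for worker to report ready after 50ms"
```

The message names the missing condition, which is more useful in CI logs than an assertion failing on an undefined value after a fixed sleep.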

How this skill works

The skill exposes a waitFor-style polling function that repeatedly calls a provided getter until it returns a truthy value or a timeout elapses. The version shown above takes a descriptive message used in timeout errors and an optional timeout (5000 ms by default), and polls at a fixed 10 ms interval. Domain helpers wrap common checks (events, state, counts, files) so tests express intent instead of timing. The timeout prevents infinite loops, the pause between polls prevents busy-waiting, and reading state inside the loop prevents stale data.

When to use it

- Tests that use arbitrary sleeps or `setTimeout` calls
- Tests that are flaky under load or in CI
- Waiting for async operations, events, or external state to appear
- When parallel test runs cause race conditions
- When you need clearer failure diagnostics for missing conditions

Best practices

- Call the getter inside the polling loop so you always read fresh state
- Use a moderate poll interval (e.g., ~10ms) to balance CPU and responsiveness
- Always include a timeout and a clear descriptive message for failures
- Document and justify any remaining fixed delays (only when measuring time-based behavior)
- Prefer specialized helpers (`waitForEvent`, `waitForCount`) for readable tests

Example use cases

- Wait for an event: `waitFor(() => events.find(e => e.type === 'DONE'), 'DONE event')`
- Wait for service state: `waitFor(() => machine.state === 'ready', 'machine to be ready')`
- Wait for a number of items: `waitFor(() => items.length >= 5, 'at least 5 items')`
- Wait for a file to exist: `waitFor(() => fs.existsSync(path), 'file to exist')`
- Combine conditions: `waitFor(() => obj.ready && obj.value > 10, 'obj ready with value > 10')`

FAQ

What polling interval should I use?

Use a short, non-aggressive interval like 10ms. It reduces latency without wasting CPU. Adjust only for resource-constrained environments.
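If the interval does need adjusting, one option is a variant of the polling function that takes it as an option. This is a sketch under assumptions: `waitForWithInterval` and `WaitOptions` are hypothetical names, and the generic `waitFor` in the Implementation section hard-codes the 10ms interval instead.

```typescript
// Hypothetical options object; not part of the waitFor signature shown earlier.
interface WaitOptions {
  timeoutMs?: number;  // give up after this long (default 5000)
  intervalMs?: number; // pause between polls (default 10)
}

async function waitForWithInterval<T>(
  condition: () => T | undefined | null | false,
  description: string,
  { timeoutMs = 5000, intervalMs = 10 }: WaitOptions = {}
): Promise<T> {
  const startTime = Date.now();
  while (true) {
    const result = condition(); // fresh read on every poll
    if (result) return result;
    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }
    await new Promise(r => setTimeout(r, intervalMs));
  }
}
```

On a constrained CI runner, a call such as `waitForWithInterval(() => cache.warm, 'cache warm-up', { intervalMs: 50 })` polls less often while keeping the same timeout and error message; `cache` here is a placeholder.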

What if I really need a sleep?

Only use fixed sleeps when testing precise timing behavior. First wait for the triggering condition, then use a documented, justified delay based on known ticks or intervals.