condition-based-waiting_obra skill

This skill replaces arbitrary delays with condition polling to eliminate flaky tests and ensure reliable state-based waits.

This is most likely a fork of the condition-based-waiting skill from microck
npx playbooks add skill jackspace/claudeskillz --skill condition-based-waiting_obra

SKILL.md
---
name: condition-based-waiting
description: Use when tests have race conditions, timing dependencies, or inconsistent pass/fail behavior - replaces arbitrary timeouts with condition polling to wait for actual state changes, eliminating flaky tests from timing guesses
---

# Condition-Based Waiting

## Overview

Flaky tests often guess at timing with arbitrary delays. This creates race conditions where tests pass on fast machines but fail under load or in CI.

**Core principle:** Wait for the actual condition you care about, not a guess about how long it takes.

## When to Use

```dot
digraph when_to_use {
    "Test uses setTimeout/sleep?" [shape=diamond];
    "Testing timing behavior?" [shape=diamond];
    "Document WHY timeout needed" [shape=box];
    "Use condition-based waiting" [shape=box];

    "Test uses setTimeout/sleep?" -> "Testing timing behavior?" [label="yes"];
    "Testing timing behavior?" -> "Document WHY timeout needed" [label="yes"];
    "Testing timing behavior?" -> "Use condition-based waiting" [label="no"];
}
```

**Use when:**
- Tests have arbitrary delays (`setTimeout`, `sleep`, `time.sleep()`)
- Tests are flaky (pass sometimes, fail under load)
- Tests time out when run in parallel
- Waiting for async operations to complete

**Don't use when:**
- Testing actual timing behavior (debounce, throttle intervals)

If an arbitrary timeout is genuinely needed, always document WHY.

## Core Pattern

```typescript
// ❌ BEFORE: Guessing at timing
await new Promise(r => setTimeout(r, 50));
const result = getResult();
expect(result).toBeDefined();

// ✅ AFTER: Waiting for condition
await waitFor(() => getResult() !== undefined);
const result = getResult();
expect(result).toBeDefined();
```

## Quick Patterns

| Scenario | Pattern |
|----------|---------|
| Wait for event | `waitFor(() => events.find(e => e.type === 'DONE'))` |
| Wait for state | `waitFor(() => machine.state === 'ready')` |
| Wait for count | `waitFor(() => items.length >= 5)` |
| Wait for file | `waitFor(() => fs.existsSync(path))` |
| Complex condition | `waitFor(() => obj.ready && obj.value > 10)` |

## Implementation

Generic polling function:
```typescript
async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description = 'condition', // Optional label used in the timeout error message
  timeoutMs = 5000
): Promise<T> {
  const startTime = Date.now();

  while (true) {
    const result = condition();
    if (result) return result;

    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }

    await new Promise(r => setTimeout(r, 10)); // Poll every 10ms
  }
}
```

See @example.ts for the complete implementation with domain-specific helpers (`waitForEvent`, `waitForEventCount`, `waitForEventMatch`) from an actual debugging session.

## Common Mistakes

**❌ Polling too fast:** `setTimeout(check, 1)` - wastes CPU
**✅ Fix:** Poll every 10ms

**❌ No timeout:** Loop forever if condition never met
**✅ Fix:** Always include timeout with clear error

**❌ Stale data:** Caching state before the loop, so the condition never sees updates
**✅ Fix:** Call getter inside loop for fresh data
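
The stale-data mistake is easy to miss, so here is a minimal sketch of it. The `store` object and the `waitStale`/`waitFresh` functions are hypothetical names invented for illustration, and `waitFor` is a copy of the polling function above so the block runs on its own. It assumes the background task replaces the array rather than mutating it in place.

```typescript
// Hypothetical store that a background task fills in later (illustration only).
const store: { items: string[] } = { items: [] };
setTimeout(() => { store.items = ["a", "b"]; }, 30); // replaces the array

// Copy of the generic polling function so this sketch is self-contained.
async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description: string,
  timeoutMs = 5000
): Promise<T> {
  const startTime = Date.now();
  while (true) {
    const result = condition();
    if (result) return result;
    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }
    await new Promise(r => setTimeout(r, 10));
  }
}

// ❌ Stale: `snapshot` is captured once, before polling starts.
// It still points at the old empty array, so the condition never becomes true.
async function waitStale(): Promise<boolean> {
  const snapshot = store.items;
  return waitFor(() => snapshot.length >= 2, "two items (stale read)", 100);
}

// ✅ Fresh: the condition re-reads `store.items` on every poll.
async function waitFresh(): Promise<boolean> {
  return waitFor(() => store.items.length >= 2, "two items (fresh read)", 1000);
}
```

Under these assumptions `waitStale()` rejects with a timeout error while `waitFresh()` resolves, even though both check the same logical condition.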

## When Arbitrary Timeout IS Correct

```typescript
// Tool ticks every 100ms - need 2 ticks to verify partial output
await waitForEvent(manager, 'TOOL_STARTED'); // First: wait for condition
await new Promise(r => setTimeout(r, 200));   // Then: wait for timed behavior
// 200ms = 2 ticks at 100ms intervals - documented and justified
```

**Requirements:**
1. First wait for triggering condition
2. Based on known timing (not guessing)
3. Comment explaining WHY

## Real-World Impact

From debugging session (2025-10-03):
- Fixed 15 flaky tests across 3 files
- Pass rate: 60% → 100%
- Execution time: 40% faster
- No more race conditions

Overview

This skill replaces arbitrary timeouts in tests with condition-based polling to eliminate race conditions and flaky behavior. It helps tests wait for the actual state or event they depend on rather than guessing how long operations take. The result is more reliable, faster tests and clearer failure messages when conditions are not met.

How this skill works

The core is a generic `waitFor` polling function that repeatedly evaluates a condition until it becomes truthy or a timeout elapses. It calls the condition getter inside the loop to avoid stale data, uses a modest polling interval (e.g., 10ms) to balance responsiveness and CPU use, and throws a descriptive error when the timeout is reached. Domain helpers (`waitForEvent`, `waitForEventCount`, `waitForEventMatch`) wrap `waitFor` with common predicates.
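
As a rough illustration of how a domain helper can wrap the generic function, the sketch below builds two helpers on top of a copy of `waitFor`. The `AppEvent` shape and the plain-array event log are assumptions made for this example; the skill's own helpers live in example.ts and may take different arguments.

```typescript
// Hypothetical event shape (assumption; the real project types may differ).
interface AppEvent {
  type: string;
  payload?: unknown;
}

// Copy of the generic polling function so this sketch is self-contained.
async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description: string,
  timeoutMs = 5000
): Promise<T> {
  const startTime = Date.now();
  while (true) {
    const result = condition();
    if (result) return result;
    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }
    await new Promise(r => setTimeout(r, 10));
  }
}

// Resolves with the first event of the given type once it appears in the log.
function waitForEvent(
  events: AppEvent[],
  type: string,
  timeoutMs = 5000
): Promise<AppEvent> {
  return waitFor(() => events.find(e => e.type === type), `${type} event`, timeoutMs);
}

// Resolves with the matching events once at least `count` of them exist.
function waitForEventCount(
  events: AppEvent[],
  type: string,
  count: number,
  timeoutMs = 5000
): Promise<AppEvent[]> {
  return waitFor(() => {
    const matches = events.filter(e => e.type === type);
    return matches.length >= count ? matches : undefined;
  }, `${count} ${type} events`, timeoutMs);
}
```

A test would then read `await waitForEvent(events, 'DONE')`, and a failure reports which event type never arrived.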

When to use it

  • Tests that use arbitrary delays or sleeps (setTimeout, time.sleep, sleep)
  • Tests that are flaky—pass locally but fail under load or in CI
  • Waiting for async operations, background tasks, or external resources to reach a desired state
  • Tests that fail intermittently when run in parallel or on machines of different speeds
  • Most timing-related waits, before resorting to long hard-coded timeouts

Best practices

  • Always wait for the triggering condition first, then use documented time-based waits only when the behavior under test is itself time-based (for example, fixed tick intervals)
  • Include a reasonable timeout and a clear error message describing what was expected
  • Poll at a moderate interval (around 10ms) to avoid wasting CPU
  • Call the state getter inside the loop to avoid stale cached values
  • Document why any remaining hard-coded timeout is necessary (ticks, external rate limits)
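
To show what a clear timeout error looks like in practice, here is a small sketch using a copy of the `waitFor` function from SKILL.md. The `describeTimeout` function and the "job 42 to finish" description are hypothetical, chosen only to make the message concrete.

```typescript
// Copy of the generic polling function so this sketch is self-contained.
async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description: string,
  timeoutMs = 5000
): Promise<T> {
  const startTime = Date.now();
  while (true) {
    const result = condition();
    if (result) return result;
    if (Date.now() - startTime > timeoutMs) {
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }
    await new Promise(r => setTimeout(r, 10));
  }
}

// Hypothetical demo: the condition is never met, so waitFor must throw.
async function describeTimeout(): Promise<string> {
  try {
    await waitFor(() => false, "job 42 to finish", 50);
    return "unexpectedly resolved";
  } catch (err) {
    // The message names what was expected and how long the wait lasted.
    return err instanceof Error ? err.message : String(err);
  }
}

describeTimeout().then(message => console.log(message));
// prints "Timeout waiting for job 42 to finish after 50ms"
```

A description phrased as "X to happen" reads naturally in the error and tells the reader which wait failed without opening the test.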

Example use cases

  • Wait for an async background job to create a record: waitFor(() => findRecord(id))
  • Wait for a state machine to become ready: waitFor(() => machine.state === 'ready')
  • Ensure a file appears on disk after processing: waitFor(() => fs.existsSync(path))
  • Wait for a minimum count of events: waitFor(() => events.length >= 5)
  • Combine conditions: waitFor(() => obj.ready && obj.value > 10)

FAQ

How long should the timeout be?

Choose a timeout based on expected worst-case latencies for the environment plus a margin; common defaults are 3–10 seconds for unit tests and higher for integration tests.

What polling interval is recommended?

Use a modest interval like 10ms to balance responsiveness with CPU load; avoid aggressively tight loops (1ms) and avoid very large intervals that slow tests.
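
If different tests need different intervals, one option is to make the interval a parameter. The sketch below is a variant of the polling function, not the skill's own implementation: `waitForWithInterval` and the `WaitOptions` bag are hypothetical names, and the 10ms default simply mirrors the value recommended above.

```typescript
// Hypothetical options bag; the original waitFor hard-codes a 10ms interval.
interface WaitOptions {
  timeoutMs?: number;  // give up after this long (default 5000)
  intervalMs?: number; // pause between polls (default 10)
}

async function waitForWithInterval<T>(
  condition: () => T | undefined | null | false,
  description: string,
  { timeoutMs = 5000, intervalMs = 10 }: WaitOptions = {}
): Promise<T> {
  const startTime = Date.now();
  while (true) {
    const result = condition(); // re-evaluated on every poll
    if (result) return result;
    if (Date.now() - startTime > timeoutMs) {
      throw new Error(
        `Timeout waiting for ${description} after ${timeoutMs}ms (polled every ${intervalMs}ms)`
      );
    }
    await new Promise(r => setTimeout(r, intervalMs));
  }
}
```

A slow integration check could then pass `{ timeoutMs: 30000, intervalMs: 250 }`, while fast unit tests keep the defaults.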