
session-timeout-handler skill


This skill helps you design timeout-resilient Claude Code workflows using chunking, checkpoints, and resume mechanics for reliable long-running tasks.

npx playbooks add skill jackspace/claudeskillz --skill session-timeout-handler

---
name: session-timeout-handler
description: Build timeout-resistant Claude Code workflows with chunking strategies, checkpoint patterns, progress tracking, and resume mechanisms to handle 2-minute tool timeouts and ensure reliable completion of long-running operations.
license: MIT
tags: [timeout, resilience, workflow, chunking, checkpoints, claude-code]
---

# Session Timeout Handler

Build resilient workflows that gracefully handle Claude Code's 2-minute timeout constraints.

## Overview

Claude Code has a 120-second (2-minute) default tool timeout. This skill provides proven patterns for building workflows that work within this constraint while accomplishing complex, long-running tasks through intelligent chunking, checkpoints, and resumability.

## When to Use

Use this skill when:
- Processing 50+ items in a loop
- Running commands that take >60 seconds
- Downloading multiple large files
- Performing bulk API calls
- Processing large datasets
- Running database migrations
- Building large projects
- Cloning multiple repositories
- Any operation risking timeout

## Core Strategies

### Strategy 1: Batch Processing

```bash
#!/bin/bash
# Process items in small batches

ITEMS_FILE="items.txt"
BATCH_SIZE=10
OFFSET=${1:-0}

# Process one batch
tail -n +$((OFFSET + 1)) "$ITEMS_FILE" | head -n "$BATCH_SIZE" | while read -r item; do
    process_item "$item"
    echo "✓ Processed: $item"
done

# Calculate next offset
NEXT_OFFSET=$((OFFSET + BATCH_SIZE))
TOTAL=$(wc -l < "$ITEMS_FILE")

if [ $NEXT_OFFSET -lt $TOTAL ]; then
    echo ""
    echo "Progress: $NEXT_OFFSET/$TOTAL"
    echo "Run: ./script.sh $NEXT_OFFSET"
else
    echo "✓ All items processed!"
fi
```

### Strategy 2: Checkpoint Resume

```bash
#!/bin/bash
# Checkpoint-based workflow

CHECKPOINT_FILE=".progress"
ITEMS=("item1" "item2" "item3" ... "item100")

# Load last checkpoint
LAST_COMPLETED=$(cat "$CHECKPOINT_FILE" 2>/dev/null || echo "-1")
START_INDEX=$((LAST_COMPLETED + 1))

# Process next batch (10 items)
END_INDEX=$((START_INDEX + 10))
[ $END_INDEX -gt ${#ITEMS[@]} ] && END_INDEX=${#ITEMS[@]}

for ((i = START_INDEX; i < END_INDEX; i++)); do
    process "${ITEMS[$i]}"

    # Save checkpoint after each item (0-indexed)
    echo "$i" > "$CHECKPOINT_FILE"
    echo "✓ Completed $((i + 1))/${#ITEMS[@]}"
done

# Check if finished
if [ $END_INDEX -eq ${#ITEMS[@]} ]; then
    echo "🎉 All items complete!"
    rm "$CHECKPOINT_FILE"
else
    echo "📋 Run again to continue from index $END_INDEX"
fi
```

### Strategy 3: State Machine

```bash
#!/bin/bash
# Multi-phase state machine workflow

STATE_FILE=".workflow_state"
STATE=$(cat "$STATE_FILE" 2>/dev/null || echo "INIT")

case $STATE in
    INIT)
        echo "Phase 1: Initialization"
        setup_environment
        echo "DOWNLOAD" > "$STATE_FILE"
        echo "✓ Phase 1 complete. Run again for Phase 2."
        ;;

    DOWNLOAD)
        echo "Phase 2: Download (batch 1/5)"
        download_batch_1
        echo "PROCESS_1" > "$STATE_FILE"
        echo "✓ Phase 2 complete. Run again for Phase 3."
        ;;

    PROCESS_1)
        echo "Phase 3: Process (batch 1/3)"
        process_batch_1
        echo "PROCESS_2" > "$STATE_FILE"
        echo "✓ Phase 3 complete. Run again for Phase 4."
        ;;

    PROCESS_2)
        echo "Phase 4: Process (batch 2/3)"
        process_batch_2
        echo "PROCESS_3" > "$STATE_FILE"
        echo "✓ Phase 4 complete. Run again for Phase 5."
        ;;

    PROCESS_3)
        echo "Phase 5: Process (batch 3/3)"
        process_batch_3
        echo "FINALIZE" > "$STATE_FILE"
        echo "✓ Phase 5 complete. Run again to finalize."
        ;;

    FINALIZE)
        echo "Phase 6: Finalization"
        cleanup_and_report
        echo "COMPLETE" > "$STATE_FILE"
        echo "🎉 Workflow complete!"
        ;;

    COMPLETE)
        echo "Workflow already completed."
        cat results.txt
        ;;
esac
```

### Strategy 4: Progress Tracking

```bash
#!/bin/bash
# Detailed progress tracking with JSON

PROGRESS_FILE="progress.json"

# Initialize progress
init_progress() {
    cat > "$PROGRESS_FILE" << EOF
{
  "total": $1,
  "completed": 0,
  "failed": 0,
  "last_updated": "$(date -Iseconds)",
  "completed_items": []
}
EOF
}

# Update progress
update_progress() {
    local item="$1"
    local status="$2"  # "success" or "failed"

    # Read current progress
    local completed=$(jq -r '.completed' "$PROGRESS_FILE")
    local failed=$(jq -r '.failed' "$PROGRESS_FILE")

    if [ "$status" = "success" ]; then
        completed=$((completed + 1))
    else
        failed=$((failed + 1))
    fi

    # Update JSON
    jq --arg item "$item" \
       --arg status "$status" \
       --arg timestamp "$(date -Iseconds)" \
       --argjson completed "$completed" \
       --argjson failed "$failed" \
       '.completed = $completed |
        .failed = $failed |
        .last_updated = $timestamp |
        .completed_items += [{item: $item, status: $status, timestamp: $timestamp}]' \
       "$PROGRESS_FILE" > "$PROGRESS_FILE.tmp"

    mv "$PROGRESS_FILE.tmp" "$PROGRESS_FILE"
}

# Show progress
show_progress() {
    jq -r '"Progress: \(.completed)/\(.total) completed, \(.failed) failed"' "$PROGRESS_FILE"
}

# Example usage
init_progress 100

for item in $(seq 1 10); do  # Process 10 at a time
    if process_item "$item"; then
        update_progress "item-$item" "success"
    else
        update_progress "item-$item" "failed"
    fi
done

show_progress
```

## Timeout-Safe Patterns

### Pattern: Parallel with Rate Limiting

```bash
#!/bin/bash
# Process items in parallel with limits

MAX_PARALLEL=3

process_with_limit() {
    local item="$1"

    # Wait if at max parallel. Count live background jobs directly, so
    # jobs that finish on their own are accounted for (a manual counter
    # would drift upward and over-wait).
    while [ "$(jobs -rp | wc -l)" -ge "$MAX_PARALLEL" ]; do
        wait -n  # Wait for any job to finish (bash 4.3+)
    done

    # Start background job
    (
        process_item "$item"
        echo "✓ $item"
    ) &

    # Rate limiting
    sleep 0.5
}

# Process items
for item in $(seq 1 15); do
    process_with_limit "item-$item"
done

# Wait for remaining
wait
echo "✓ All processing complete"
```

### Pattern: Timeout with Fallback

```bash
#!/bin/bash
# Try with timeout, fallback to cached/partial result

OPERATION_TIMEOUT=90  # 90 seconds, leave buffer

if timeout ${OPERATION_TIMEOUT}s expensive_operation > result.txt 2>&1; then
    echo "✓ Operation completed"
    cat result.txt
else
    EXIT_CODE=$?

    if [ $EXIT_CODE -eq 124 ]; then
        echo "⚠️  Operation timed out after ${OPERATION_TIMEOUT}s"

        # Use cached result if available
        if [ -f "cached_result.txt" ]; then
            echo "Using cached result from $(stat -c %y cached_result.txt)"
            cat cached_result.txt
        else
            echo "Partial result:"
            cat result.txt
            echo ""
            echo "Run again to continue processing"
        fi
    else
        echo "✗ Operation failed with code $EXIT_CODE"
    fi
fi
```

### Pattern: Incremental Build

```bash
#!/bin/bash
# Incremental build with caching

BUILD_CACHE=".build_cache"
SOURCE_DIR="src"

# Track what needs rebuilding
mkdir -p "$BUILD_CACHE"

# Find changed files (first run: everything is "changed")
if [ -f "$BUILD_CACHE/last_build" ]; then
    find "$SOURCE_DIR" -type f -newer "$BUILD_CACHE/last_build" > changed_files.txt
else
    find "$SOURCE_DIR" -type f > changed_files.txt
fi

if [ ! -s changed_files.txt ]; then
    echo "✓ No changes detected, using cached build"
    exit 0
fi

# Build only changed files (in batches)
head -10 changed_files.txt | while read -r file; do
    echo "Building: $file"
    build_file "$file"
done

REMAINING=$(tail -n +11 changed_files.txt | wc -l)
if [ "$REMAINING" -gt 0 ]; then
    echo ""
    echo "⚠️  $REMAINING files remaining. Run again to continue."
else
    # Only advance the timestamp once everything is rebuilt; touching it
    # early would hide the remaining files from the next run
    touch "$BUILD_CACHE/last_build"
    echo "✓ Build complete!"
fi
```

## Real-World Examples

### Example 1: Bulk Repository Cloning

```bash
#!/bin/bash
# Clone 50 repos without timing out

REPOS_FILE="repos.txt"
CHECKPOINT="cloned.txt"
BATCH_SIZE=5

# Select the next batch of repos that are not yet cloned. Filtering the
# checkpoint out first lets each run advance past previously cloned repos
# (taking head of the whole file would retry the same first batch forever).
touch "$CHECKPOINT"

while read -r repo_url; do
    REPO_NAME=$(basename "$repo_url" .git)

    # Clone with timeout
    if timeout 60s git clone --depth 1 "$repo_url" "$REPO_NAME" 2>/dev/null; then
        echo "$repo_url" >> "$CHECKPOINT"
        echo "✓ Cloned: $REPO_NAME"
    else
        echo "✗ Failed: $REPO_NAME"
    fi
done < <(grep -vxF -f "$CHECKPOINT" "$REPOS_FILE" | head -n "$BATCH_SIZE")

# Progress
CLONED=$(wc -l < "$CHECKPOINT" 2>/dev/null || echo 0)
TOTAL=$(wc -l < "$REPOS_FILE")
echo ""
echo "Progress: $CLONED/$TOTAL repositories"
[ $CLONED -lt $TOTAL ] && echo "Run again to continue"
```

### Example 2: Large Dataset Processing

```bash
#!/bin/bash
# Process 1000 data files in batches

DATA_DIR="data"
OUTPUT_DIR="processed"
BATCH_NUM=${1:-1}
BATCH_SIZE=20

mkdir -p "$OUTPUT_DIR"

# Calculate offset
OFFSET=$(( (BATCH_NUM - 1) * BATCH_SIZE ))

# Process batch
find "$DATA_DIR" -name "*.csv" | sort | tail -n +$((OFFSET + 1)) | head -$BATCH_SIZE | while read file; do
    FILENAME=$(basename "$file")
    OUTPUT="$OUTPUT_DIR/${FILENAME%.csv}.json"

    echo "Processing: $FILENAME"
    process_csv_to_json "$file" > "$OUTPUT"
    echo "✓ Created: $OUTPUT"
done

# Report progress
TOTAL=$(find "$DATA_DIR" -name "*.csv" | wc -l)
PROCESSED=$(find "$OUTPUT_DIR" -name "*.json" | wc -l)

echo ""
echo "Batch $BATCH_NUM complete"
echo "Progress: $PROCESSED/$TOTAL files"

if [ $PROCESSED -lt $TOTAL ]; then
    NEXT_BATCH=$((BATCH_NUM + 1))
    echo "Next: ./script.sh $NEXT_BATCH"
else
    echo "🎉 All files processed!"
fi
```

### Example 3: API Rate-Limited Scraping

```bash
#!/bin/bash
# Scrape API with rate limits and timeouts

API_IDS="ids.txt"
OUTPUT_DIR="api_data"
CHECKPOINT=".api_checkpoint"

mkdir -p "$OUTPUT_DIR"

# Load last checkpoint (last successfully fetched ID)
LAST_ID=$(cat "$CHECKPOINT" 2>/dev/null)

# Select the next 10 IDs (start from the top on the first run; a default
# of "0" would never match an ID in the file and nothing would run)
if [ -n "$LAST_ID" ]; then
    NEXT_IDS=$(grep -x -F -A 10 "$LAST_ID" "$API_IDS" | tail -n +2)
else
    NEXT_IDS=$(head -10 "$API_IDS")
fi

while read -r id; do
    [ -n "$id" ] || continue
    echo "Fetching ID: $id"

    # API call with timeout
    if timeout 30s curl -s "https://api.example.com/data/$id" > "$OUTPUT_DIR/$id.json"; then
        echo "$id" > "$CHECKPOINT"
        echo "✓ Saved: $id.json"
    else
        echo "✗ Failed: $id"
    fi

    # Rate limiting (100ms between requests)
    sleep 0.1
done <<< "$NEXT_IDS"

# Progress
TOTAL=$(wc -l < "$API_IDS")
COMPLETED=$(find "$OUTPUT_DIR" -name "*.json" | wc -l)

echo ""
echo "Progress: $COMPLETED/$TOTAL IDs"
[ $COMPLETED -lt $TOTAL ] && echo "Run again to continue"
```

## Integration with Claude Code

### Multi-Turn Workflow

Instead of one long operation, break into multiple Claude Code turns:

**Turn 1:**
```
Process first batch of 10 items and save checkpoint
```

**Turn 2:**
```
Resume from checkpoint and process next 10 items
```

**Turn 3:**
```
Complete processing and generate final report
```
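Each turn can simply re-run the same checkpointed script. A minimal sketch of such a per-turn driver (the `.checkpoint` layout, item counts, and the echoed work are illustrative stand-ins):

```shell
#!/bin/bash
# Per-turn driver: every Claude Code turn runs this once and reports state.
# .checkpoint stores the index of the last completed item.
TOTAL=30
BATCH=10
LAST=$(cat .checkpoint 2>/dev/null || echo 0)

END=$((LAST + BATCH))
[ "$END" -gt "$TOTAL" ] && END=$TOTAL

for ((i = LAST + 1; i <= END; i++)); do
    echo "processing item $i"   # stand-in for real per-item work
done
echo "$END" > .checkpoint

if [ "$END" -ge "$TOTAL" ]; then
    echo "✓ All $TOTAL items done; generate the final report"
else
    echo "Checkpoint at $END/$TOTAL; run again next turn"
fi
```

Turn 1 processes items 1-10, turn 2 resumes at 11, and so on, with each invocation finishing well inside the timeout.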

### Status Check Pattern

```bash
# Turn 1: Start background job
./long_operation.sh > output.log 2>&1 &
echo $! > pid.txt
echo "Started background job (PID: $(cat pid.txt))"

# Turn 2: Check status
if ps -p $(cat pid.txt) > /dev/null 2>&1; then
    echo "Still running..."
    echo "Recent output:"
    tail -20 output.log
else
    echo "✓ Complete!"
    cat output.log
    rm pid.txt
fi
```

## Best Practices

### ✅ DO

1. **Chunk operations** into 5-10 item batches
2. **Save checkpoints** after each item or batch
3. **Use timeouts** on individual operations (60-90s max)
4. **Report progress** clearly and frequently
5. **Enable resume** from any checkpoint
6. **Parallelize smartly** (3-5 concurrent max)
7. **Track failures** separately from successes
8. **Clean up** checkpoint files when complete

### ❌ DON'T

1. **Don't process everything** in one go (>50 items)
2. **Don't skip checkpoints** on long workflows
3. **Don't ignore timeout signals** (handle exit code 124 from `timeout`)
4. **Don't block indefinitely** without progress
5. **Don't over-parallelize** (causes resource issues)
6. **Don't lose progress** due to missing state
7. **Don't retry infinitely** (limit to 3-4 attempts)
8. **Don't forget cleanup** of temp files
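The retry limit in DON'T #7 is easy to enforce with a small wrapper. A sketch, where `MAX_ATTEMPTS` and the backoff schedule are just reasonable defaults and the command passed in should be idempotent:

```shell
#!/bin/bash
# Bounded retry: run a command up to MAX_ATTEMPTS times with linear backoff.
MAX_ATTEMPTS=3

retry() {
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$MAX_ATTEMPTS" ]; then
            echo "✗ Giving up after $MAX_ATTEMPTS attempts: $*" >&2
            return 1
        fi
        echo "Attempt $attempt/$MAX_ATTEMPTS failed, backing off..." >&2
        sleep "$attempt"          # backoff: 1s, 2s, ...
        attempt=$((attempt + 1))
    done
}

# Usage (hypothetical): retry timeout 60s curl -fsS "$URL" -o out.json
```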

## Quick Reference

```bash
# Batch processing
tail -n +$((OFFSET + 1)) file.txt | head -n "$BATCH_SIZE" | while read -r item; do process "$item"; done

# Checkpoint resume
LAST=$(cat .checkpoint 2>/dev/null || echo 0); process_from "$LAST"

# State machine
case $(cat .state) in PHASE1) step1 ;; PHASE2) step2 ;; esac

# Progress tracking
echo "Progress: $COMPLETED/$TOTAL ($((COMPLETED * 100 / TOTAL))%)"

# Timeout with fallback
timeout 90s operation || use_cached_result

# Parallel with limit
for i in "${items[@]}"; do process "$i" & [ "$(jobs -r | wc -l)" -ge "$MAX" ] && wait -n; done
```

---

**Version**: 1.0.0
**Author**: Harvested from timeout-prevention skill patterns
**Last Updated**: 2025-11-18
**License**: MIT
**Key Principle**: Never try to do everything at once – chunk, checkpoint, and resume.


## FAQ

**How do I decide a safe batch size?**

Choose a batch size that keeps per-run wall time well under 120 seconds; 5-10 items is a good starting point, and you can adjust it by measuring the average item duration.
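One way to pick the number empirically is to time a small sample and divide the time budget by the per-item cost. A sketch, assuming GNU `date` (for `%N` nanosecond precision) and using `sample_item` as a stand-in for your real work:

```shell
#!/bin/bash
# Estimate a safe batch size from a timed sample run (GNU date required).
sample_item() { sleep 0.05; }   # stand-in for one unit of real work

BUDGET_S=90     # stay well under the 120s tool timeout
SAMPLE_SIZE=5

START_NS=$(date +%s%N)
for _ in $(seq 1 "$SAMPLE_SIZE"); do
    sample_item
done
ELAPSED_MS=$(( ($(date +%s%N) - START_NS) / 1000000 ))

PER_ITEM_MS=$(( ELAPSED_MS / SAMPLE_SIZE ))
BATCH_SIZE=$(( BUDGET_S * 1000 / (PER_ITEM_MS + 1) ))   # +1 avoids divide-by-zero

echo "Per-item: ~${PER_ITEM_MS}ms, suggested batch size: $BATCH_SIZE"
```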

**What if an operation times out mid-item?**

Save partial outputs when possible and record the last successfully completed index in a checkpoint. On timeout, surface the partial or cached result and requeue the incomplete item on the next run.
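That can be sketched as a promote-on-success pattern: work lands in a partial file, and the checkpoint advances only on full completion (the file names and the echoed payload below are illustrative):

```shell
#!/bin/bash
# Keep partial output on timeout; advance the checkpoint only on full success.
CHECKPOINT=".last_done"
ITEM="item-42"    # hypothetical current item

# Stand-in for real work: emits output well within the time limit
if timeout 5s sh -c "echo \"result for $ITEM\"" > "partial_$ITEM.txt"; then
    mv "partial_$ITEM.txt" "done_$ITEM.txt"   # promote output only on success
    echo "$ITEM" > "$CHECKPOINT"              # checkpoint records full completions
else
    STATUS=$?
    if [ "$STATUS" -eq 124 ]; then
        echo "⚠️  $ITEM timed out; partial output kept in partial_$ITEM.txt"
    fi
fi
```

On the next run, any lingering `partial_*.txt` file marks an item to requeue.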