---
name: autonomous-orchestration
description: Use when user requests autonomous operation across multiple issues. Orchestrates parallel workers using Task tool, monitors with TaskOutput, handles SLEEP/WAKE cycles, and works until scope is complete without user intervention.
allowed-tools:
  - Bash
  - Read
  - Grep
  - Glob
  - Task
  - TaskOutput
  - mcp__github__*
  - mcp__memory__*
model: opus
---

# Autonomous Orchestration

## Overview

Orchestrates long-running autonomous work across multiple issues using the **Task tool** to spawn parallel worker agents and **TaskOutput** to monitor their progress.

**Core principle:** GitHub is the source of truth. Workers are disposable. State survives restarts.

**Announce at start:** "I'm using autonomous-orchestration to work through [SCOPE]. Starting autonomous operation now."

## Prerequisites

- `worker-dispatch` skill for spawning workers with Task tool
- `worker-protocol` skill for worker behavior
- `ci-monitoring` skill for CI/WAKE handling
- GitHub CLI (`gh`) authenticated
- GitHub Project Board configured

## Parallel Execution Model

Workers are spawned as **background agents** using the Task tool:

```
Orchestrator
    │
    ├── Task(run_in_background: true) → Worker Agent #1 (task_id: aa93f22)
    ├── Task(run_in_background: true) → Worker Agent #2 (task_id: b51e54b)
    └── Task(run_in_background: true) → Worker Agent #3 (task_id: c72f3d1)
```

Monitor progress with TaskOutput:

```
TaskOutput(task_id: "aa93f22", block: false) → "Task is still running..."
TaskOutput(task_id: "b51e54b", block: false) → Completed with result
TaskOutput(task_id: "c72f3d1", block: false) → "Task is still running..."
```

**CRITICAL:** Spawn all workers in the SAME message for true concurrent execution.

## State Management

**CRITICAL:** All durable state is stored in GitHub. Task IDs are ephemeral session state; MCP Memory holds only the active-orchestration marker and the current task_ids.

| State Store | Purpose | Used For |
|-------------|---------|----------|
| Project Board Status | THE source of truth | Ready, In Progress, In Review, Blocked, Done |
| Issue Comments | Activity log | Worker assignment, progress, deviations |
| Labels | Lineage only | `spawned-from:#N`, `depth:N`, `epic-*` |
| MCP Memory | Active marker + task tracking | **Active orchestration detection**, active task_ids |
| Orchestrator memory | Ephemeral session state | Map of issue# → task_id for monitoring |

**See:** `reference/state-management.md` for detailed state queries and updates.

## Context Compaction Survival

**CRITICAL:** Orchestration must survive mid-loop context compaction.

### On Start: Write Active Marker

```
# MCP tool call (not a shell command): write the marker when orchestration starts.
# Assumes the reference MCP memory server, whose create_entities takes an "entities" array.
mcp__memory__create_entities({
  "entities": [{
    "name": "ActiveOrchestration",
    "entityType": "Orchestration",
    "observations": [
      "Status: ACTIVE",
      "Scope: [MILESTONE/EPIC/unbounded]",
      "Tracking Issue: #[NUMBER]",
      "Started: [ISO_TIMESTAMP]",
      "Repository: [owner/repo]",
      "Phase: BOOTSTRAP|MAIN_LOOP",
      "Last Loop: [ISO_TIMESTAMP]"
    ]
  }]
})
```

### On Each Loop Iteration: Update Marker

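```
# MCP tool call (not a shell command).
# add_observations appends rather than overwrites, so the most recent
# "Last Loop" and "Phase" entries are the current ones.
mcp__memory__add_observations({
  "observations": [{
    "entityName": "ActiveOrchestration",
    "contents": ["Last Loop: [ISO_TIMESTAMP]", "Phase: MAIN_LOOP"]
  }]
})
```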

### On Complete: Remove Marker

```
# MCP tool call (not a shell command)
mcp__memory__delete_entities({
  "entityNames": ["ActiveOrchestration"]
})
```

### On Session Resume (After Compaction)

The session-start skill checks for an active orchestration marker:

```
# MCP tool call (not a shell command): look up the marker
mcp__memory__open_nodes({"names": ["ActiveOrchestration"]})

# If the result contains the ActiveOrchestration entity:
#   1. Report "⚠️ ACTIVE ORCHESTRATION DETECTED"
#   2. Report Scope and Tracking Issue from the entity's observations
#   3. Report "Resuming orchestration loop..."
#   4. Invoke the autonomous-orchestration skill to resume
```

**This ensures:** Even if context compacts mid-loop, the next session will detect the active orchestration and resume it.
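
Because observations accumulate across loop iterations, a resumed session needs the newest value for a given key, not the first. The sketch below assumes the observations have already been flattened to one `Key: value` line each; `latest_observation` is a hypothetical helper, not part of the skill or of the memory server.

```bash
# Hypothetical helper: print the newest value recorded for a "Key: value"
# observation, reading one observation per line on stdin.
latest_observation() {
  local key="$1"
  # Later lines are assumed to be newer, so keep only the last match.
  grep -E "^${key}: " | tail -n 1 | sed -E "s/^${key}: //"
}

# Example with made-up observations:
printf '%s\n' \
  "Status: ACTIVE" \
  "Last Loop: 2025-01-01T00:00:00Z" \
  "Phase: MAIN_LOOP" \
  "Last Loop: 2025-01-01T00:05:00Z" |
  latest_observation "Last Loop"
# prints 2025-01-01T00:05:00Z
```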

## Immediate Start (User Consent Implied)

**The user's request for autonomous operation IS their consent.** No additional confirmation required.

When the user requests autonomous work:

1. **Identify scope** - Parse user request for milestone, epic, specific issues, or "all"
2. **Announce intent** - Briefly state what you're about to do
3. **Start immediately** - Begin orchestration without waiting for additional input

```markdown
## Starting Autonomous Operation

**Scope:** [MILESTONE/EPIC/ISSUES or "all open issues"]
**Workers:** Up to 5 parallel
**Mode:** Continuous until complete

Beginning work now...
```

**Do NOT ask for "PROCEED" or any confirmation.** The user asked for autonomous operation - that is the confirmation.

## Automatic Scope Detection

When the user requests autonomous operation without specifying a scope:

### Priority Order

1. **User-specified scope** - If user mentions specific issues, epics, or milestones
2. **Urgent/High Priority standalone issues** - Issues labeled `priority:urgent` or `priority:high` that are not part of an epic
3. **Epic-based sequential work** - Work through epics in order, completing all issues within each epic
4. **Remaining standalone issues** - Any issues not part of an epic

```bash
detect_work_scope() {
  local label

  # 1. Check for urgent/high priority standalone issues first.
  #    Query each label separately: a comma-separated --label list matches
  #    issues carrying ALL of the labels, not any one of them.
  PRIORITY_ISSUES=$(
    for label in "priority:urgent" "priority:high"; do
      gh issue list --state open --label "$label" \
        --json number,labels \
        --jq '.[] | select(.labels | map(.name) | any(startswith("epic-")) | not) | .number'
    done | sort -un
  )

  if [ -n "$PRIORITY_ISSUES" ]; then
    echo "priority_standalone"
    echo "$PRIORITY_ISSUES"
    return
  fi

  # 2. Get epics in order (by creation date)
  EPICS=$(gh issue list --state open --label "type:epic" \
    --json number,title,createdAt \
    --jq 'sort_by(.createdAt) | .[].number')

  if [ -n "$EPICS" ]; then
    echo "epics"
    echo "$EPICS"
    return
  fi

  # 3. Fall back to all open issues
  ALL_ISSUES=$(gh issue list --state open --json number --jq '.[].number')
  echo "all_issues"
  echo "$ALL_ISSUES"
}
```
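
`detect_work_scope` prints the scope type on its first line and one issue number per line after that. A caller might split that output as in the sketch below; `parse_scope_output` is a hypothetical name used only for illustration.

```bash
# Hypothetical helper: read detect_work_scope output on stdin and print a
# one-line summary of the scope type and the issue numbers that follow it.
parse_scope_output() {
  local scope_type n
  local numbers=()

  # First line is the scope type (priority_standalone, epics, all_issues)
  IFS= read -r scope_type || return 1

  # Remaining non-empty lines are issue numbers
  while IFS= read -r n; do
    [ -n "$n" ] && numbers+=("$n")
  done

  echo "type=${scope_type} count=${#numbers[@]} issues=${numbers[*]}"
}

# Example with made-up output:
printf 'epics\n12\n7\n' | parse_scope_output
# prints type=epics count=2 issues=12 7
```

In real use the input would come from `detect_work_scope | parse_scope_output`.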

## Continuous Operation Until Complete

Autonomous operation continues until ALL of the following are true:
- No open issues remain in scope
- No open PRs awaiting merge
- No issues in "In Progress" or "In Review" status

The operation does NOT pause for:
- Progress updates
- Confirmation between issues
- Switching between epics
- Any user input (unless blocked by a fatal error)
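
The three completion conditions can be reduced to a single check once the counts are known. This is a minimal sketch: `is_scope_complete` is a hypothetical helper, and how the counts are obtained (for example from `gh issue list`, `gh pr list`, and the project board) is left to the caller.

```bash
# Hypothetical helper: succeed only when all three completion conditions hold.
#   $1 = open issues remaining in scope
#   $2 = open PRs awaiting merge
#   $3 = issues in "In Progress" or "In Review"
is_scope_complete() {
  local open_issues="$1" open_prs="$2" active_items="$3"
  [ "$open_issues" -eq 0 ] && [ "$open_prs" -eq 0 ] && [ "$active_items" -eq 0 ]
}

# Example with made-up counts:
if is_scope_complete 0 0 0; then
  echo "Scope complete"
else
  echo "Work remains"
fi
# prints Scope complete
```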

## PR Resolution Bootstrap Phase

**CRITICAL:** Before spawning ANY new workers, resolve all existing open PRs first.

**Bootstrap Flow:** Get open PRs (exclude release/*, release-placeholder, do-not-merge) → For each: Check CI → Verify review → Merge if ready → Then start main loop

### Bootstrap Implementation

```bash
resolve_existing_prs() {
  echo "=== PR RESOLUTION BOOTSTRAP ==="

  # Get all open PRs, excluding release branches and PRs labeled
  # release-placeholder or do-not-merge
  OPEN_PRS=$(gh pr list --json number,headRefName,labels \
    --jq '[.[] | select(
      (.headRefName | startswith("release/") | not) and
      (.labels | map(.name) | any(. == "release-placeholder" or . == "do-not-merge") | not)
    )] | .[].number')

  if [ -z "$OPEN_PRS" ]; then
    echo "No actionable PRs to resolve. Proceeding to main loop."
    return 0
  fi

  echo "Found PRs to resolve: $OPEN_PRS"

  for pr in $OPEN_PRS; do
    echo "Processing PR #$pr..."

    # Get CI status as gh's normalized buckets: pass, fail, pending, skipping, cancel
    ci_status=$(gh pr checks "$pr" --json bucket --jq '.[].bucket' 2>/dev/null | sort -u)

    # Get linked issue
    ISSUE=$(gh pr view "$pr" --json body --jq '.body' | grep -oE 'Closes #[0-9]+' | grep -oE '[0-9]+' | head -1)

    if [ -z "$ISSUE" ]; then
      echo "  ⚠ No linked issue found, skipping"
      continue
    fi

    # Check if CI failed (assumption: cancelled checks are handled like failures)
    if echo "$ci_status" | grep -qxE "fail|cancel"; then
      echo "  ❌ CI failing - triggering ci-monitoring for PR #$pr"
      # Invoke ci-monitoring skill to fix
      handle_ci_failure "$pr"
      continue
    fi

    if echo "$ci_status" | grep -qx "pending"; then
      echo "  ⏳ CI pending for PR #$pr, will check in main loop"
      continue
    fi

    if echo "$ci_status" | grep -qx "pass"; then
      # Verify review artifact; gh fills {owner}/{repo} from the current repository
      REVIEW_EXISTS=$(gh api "repos/{owner}/{repo}/issues/$ISSUE/comments" \
        --jq '[.[] | select(.body | contains("<!-- REVIEW:START -->"))] | length' 2>/dev/null || echo "0")

      if [ "$REVIEW_EXISTS" = "0" ]; then
        echo "  ⚠ No review artifact - requesting review for #$ISSUE"
        gh issue comment "$ISSUE" --body "## Review Required

PR #$pr has passing CI but no review artifact.

**Action needed:** Complete comprehensive-review and post artifact to this issue.

---
*Bootstrap phase - Orchestrator*"
        continue
      fi

      # All checks pass - merge
      echo "  ✅ Merging PR #$pr"
      gh pr merge "$pr" --squash --delete-branch
      mark_issue_done "$ISSUE"
    fi
  done

  echo "=== BOOTSTRAP COMPLETE ==="
}
```
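
The linked-issue lookup above matches only the exact text `Closes #N`. GitHub also recognizes the other closing keywords (`close`, `fix`, `resolve` and their variants), so a more tolerant extraction might look like the sketch below. `linked_issue_from_body` is a hypothetical helper; it reads a PR body on stdin and does not handle cross-repository references.

```bash
# Hypothetical helper: print the first issue number that follows a GitHub
# closing keyword (close/closes/closed, fix/fixes/fixed,
# resolve/resolves/resolved), matched case-insensitively as whole words.
linked_issue_from_body() {
  grep -oiwE '(close[sd]?|fix(e[sd])?|resolve[sd]?) #[0-9]+' |
    grep -oE '[0-9]+' |
    head -n 1
}

# Example with a made-up PR body:
echo "Fixes #42 and closes #7" | linked_issue_from_body
# prints 42
```

It could replace the `grep` pipeline after `gh pr view "$pr" --json body --jq '.body'`.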

### Release Placeholder Detection

PRs are excluded from bootstrap resolution if:

| Condition | Example |
|-----------|---------|
| Branch starts with `release/` | `release/v2.0.0`, `release/2025-01` |
| Has `release-placeholder` label | Manual exclusion |
| Has `do-not-merge` label | Explicit hold |
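
The three exclusion conditions in the table can also be expressed as a standalone predicate, which is handy when a PR's branch and labels are already known. This is a sketch under that assumption; `is_excluded_pr` is a hypothetical name.

```bash
# Hypothetical helper: succeed if a PR must be skipped during bootstrap.
#   $1      = head branch name
#   $2...   = label names on the PR
is_excluded_pr() {
  local branch="$1"
  shift

  # Condition 1: branch starts with release/
  case "$branch" in
    release/*) return 0 ;;
  esac

  # Conditions 2 and 3: release-placeholder or do-not-merge label
  local label
  for label in "$@"; do
    case "$label" in
      release-placeholder|do-not-merge) return 0 ;;
    esac
  done

  return 1
}

# Example with made-up values:
is_excluded_pr "release/v2.0.0" && echo "skip release branch"
is_excluded_pr "feature/dark-mode" "bug" || echo "process feature branch"
```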

## Orchestration Loop

**Main Loop Flow:** Monitor workers (TaskOutput) + Check CI/PRs + Spawn workers (Task) → Evaluate state (complete/sleep/continue)

### Loop Steps

1. **Monitor Active Workers** - Use `TaskOutput(block: false)` for each active task_id
2. **Handle Completed Workers** - Check GitHub for PR/completion status
3. **Check CI/PRs** - Monitor for merge readiness, verify review artifacts
4. **MERGE GREEN PRs** - Any PR with passing CI and a verified review artifact (step 3) is merged IMMEDIATELY
5. **Spawn Workers** - Up to 5 parallel workers using `Task(run_in_background: true)`
6. **Evaluate State** - Determine next action (continue, sleep, complete)
7. **Brief Pause** - 30-second interval between iterations

### Worker Monitoring with TaskOutput

Each loop iteration checks all active workers:

```markdown
## Monitor Active Workers

For each task_id in active_workers:
  TaskOutput(task_id: "[ID]", block: false, timeout: 1000)

Interpret results:
- "Task is still running..." → Continue monitoring
- Completed → Check GitHub for PR, update project board, remove from active list
- Error → Handle failure, potentially spawn replacement
```

### Spawning Parallel Workers

Spawn multiple workers in ONE message for concurrent execution:

```markdown
## Dispatch Workers

1. Count available slots: 5 - len(active_workers)
2. Get Ready issues from project board
3. For each issue to dispatch:

Task(
  description: "Issue #123 worker",
  prompt: [WORKER_PROMPT],
  subagent_type: "general-purpose",
  run_in_background: true
)

4. Store returned task_id → issue mapping
```
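
Steps 1 to 3 of the dispatch amount to capping the Ready list at the number of free slots. A minimal sketch, assuming Ready issue numbers arrive one per line on stdin; `pick_issues_to_dispatch` and `MAX_WORKERS` are hypothetical names, with the cap of 5 taken from this document.

```bash
MAX_WORKERS=5  # worker cap stated in this document

# Hypothetical helper: given the number of active workers, print at most
# (MAX_WORKERS - active) issue numbers from stdin, in their original order.
pick_issues_to_dispatch() {
  local active_count="$1"
  local slots=$(( MAX_WORKERS - active_count ))

  if [ "$slots" -le 0 ]; then
    # No capacity: consume stdin and print nothing
    cat >/dev/null
    return 0
  fi

  head -n "$slots"
}

# Example: 3 workers active, 4 Ready issues (made-up numbers)
printf '10\n11\n12\n13\n' | pick_issues_to_dispatch 3
# prints 10 and 11, one per line
```

Each printed issue would then get its own `Task(run_in_background: true)` call in the same message.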

### CRITICAL: Merge Green PRs Immediately

**Every loop iteration must check for and merge passing PRs:**

```bash
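# In each loop iteration.
# Require at least one check: all() is true for an empty list, which would
# otherwise merge PRs that have no CI at all. Legacy commit statuses report
# .state rather than .conclusion, so fall back to it.
# Review artifacts are expected to be verified already (see Review Enforcement).
for pr in $(gh pr list --json number,statusCheckRollup \
  --jq '.[] | select((.statusCheckRollup | length) > 0 and (.statusCheckRollup | all((.conclusion // .state) == "SUCCESS"))) | .number'); do
  # Check for do-not-merge label
  if ! gh pr view "$pr" --json labels --jq '.labels[].name' | grep -q "do-not-merge"; then
    echo "Merging PR #$pr (CI passed)"
    gh pr merge "$pr" --squash --delete-branch

    # Get linked issue and mark done
    ISSUE=$(gh pr view "$pr" --json body --jq '.body' | grep -oE 'Closes #[0-9]+' | grep -oE '[0-9]+' | head -1)
    if [ -n "$ISSUE" ]; then
      # Update project board status to Done
      mark_issue_done "$ISSUE"
    fi
  fi
done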
```

**Do NOT:**
- Report "PRs are ready for merge" and stop
- Wait for user to request merge
- Summarize completed work and ask for next steps

**The loop continues until scope is complete. Green PR = immediate merge.**

## Scope Types

### Milestone

```bash
gh issue list --milestone "v1.0.0" --state open --json number --jq '.[].number'
```

### Epic

```bash
gh issue list --label "epic-dark-mode" --state open --json number --jq '.[].number'
```

### Unbounded (All Open Issues)

```bash
gh issue list --state open --json number --jq '.[].number'
```

**Do NOT ask for "UNBOUNDED" confirmation.** The user's request is their consent.

## Failure Handling

Workers that fail do NOT immediately become blocked:

```
Attempt 1 → Research → Attempt 2 → Research → Attempt 3 → Research → Attempt 4 → BLOCKED
```

Only after 3+ research cycles is an issue marked as blocked.

**See:** `reference/failure-recovery.md` for research cycle implementation.

### Blocked Determination

An issue is only marked blocked when:
- Multiple research cycles completed (3+)
- Research concludes "impossible without external input"
- Examples: missing credentials, requires human decision, external service down
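
The decision after a failed attempt can be sketched as below. It assumes both conditions must hold before blocking: at least three completed research cycles, and a research conclusion that external input is required. `next_action_after_failure` and `MAX_RESEARCH_CYCLES` are hypothetical names; the real implementation is in `reference/failure-recovery.md`.

```bash
MAX_RESEARCH_CYCLES=3  # "3+" research cycles, per this document

# Hypothetical helper: decide what to do after a worker attempt fails.
#   $1 = research cycles already completed for the issue
#   $2 = "yes" if research concluded the issue needs external input
next_action_after_failure() {
  local completed_cycles="$1"
  local needs_external_input="${2:-no}"

  if [ "$completed_cycles" -ge "$MAX_RESEARCH_CYCLES" ] &&
     [ "$needs_external_input" = "yes" ]; then
    echo "blocked"
  else
    echo "research"
  fi
}

# Examples:
next_action_after_failure 1 no    # prints research
next_action_after_failure 3 yes   # prints blocked
```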

## SLEEP/WAKE

### Entering SLEEP

Orchestration sleeps when:
- All issues are either blocked or in review
- No work can proceed without external event

State is posted to the GitHub tracking issue, so it survives crashes.
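
One way to read the SLEEP conditions is as a check over project board counts: nothing is Ready or In Progress, but something is still Blocked or In Review. The sketch below encodes that reading; `should_sleep` is a hypothetical helper and the counts are assumed to come from the project board.

```bash
# Hypothetical helper: succeed when orchestration should enter SLEEP.
#   $1 = Ready, $2 = In Progress, $3 = Blocked, $4 = In Review
should_sleep() {
  local ready="$1" in_progress="$2" blocked="$3" in_review="$4"

  # Nothing can be worked on right now...
  [ "$ready" -eq 0 ] && [ "$in_progress" -eq 0 ] &&
    # ...but the scope is not finished either
    [ $(( blocked + in_review )) -gt 0 ]
}

# Example with made-up counts: 2 Blocked, 1 In Review
if should_sleep 0 0 2 1; then
  echo "Entering SLEEP"
  # The state summary would be posted here, for example:
  # gh issue comment "$TRACKING_ISSUE" --body "SLEEP: 2 blocked, 1 in review"
fi
# prints Entering SLEEP
```

When every count is zero the scope is complete, so the check fails and the loop should finish instead of sleeping.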

### WAKE Mechanisms

- **SessionStart hook** - Checks CI status on new Claude session
- **Manual** - `claude --resume [SESSION_ID]`

## Checklist

Before starting orchestration:

- [ ] Scope identified (explicit or auto-detected)
- [ ] GitHub CLI authenticated (`gh auth status`)
- [ ] Tracking issue exists with `orchestration-tracking` label
- [ ] Project board configured with Status field
- [ ] Active marker written to MCP Memory

Bootstrap phase:

- [ ] Existing open PRs detected
- [ ] Release placeholders excluded (`release/*` branches; `release-placeholder` and `do-not-merge` labels)
- [ ] CI status checked for each PR
- [ ] Review artifacts verified before merge
- [ ] PRs merged or flagged for attention
- [ ] Bootstrap complete before spawning workers

During orchestration:

- [ ] Workers spawned using `Task(run_in_background: true)`
- [ ] Task IDs stored for each worker (issue# → task_id mapping)
- [ ] Worker status monitored with `TaskOutput(block: false)`
- [ ] Completed workers detected and handled
- [ ] Project board status updated (In Progress, In Review, Done)
- [ ] CI status monitored for open PRs
- [ ] Green PRs merged IMMEDIATELY
- [ ] Review artifacts verified before PR merge
- [ ] Failed workers trigger research cycles
- [ ] Handovers spawn replacement workers
- [ ] SLEEP entered when only waiting on CI
- [ ] Deviation resolution checked each loop
- [ ] Status posted to tracking issue

## Review Enforcement

**CRITICAL:** The orchestrator verifies review compliance:

1. **Before PR merge:**
   - Review artifact exists in issue comments
   - Review status is COMPLETE
   - Unaddressed findings = 0

2. **Child issues (from deferred findings):**
   - Follow full `issue-driven-development` process
   - Have their own code reviews
   - Track via `spawned-from:#N` label

3. **Deviation handling:**
   - Parent status set to Blocked on project board
   - Parent resumes only when all child issues are closed

## Integration

This skill coordinates:

| Skill | Purpose |
|-------|---------|
| `worker-dispatch` | Spawning workers |
| `worker-protocol` | Worker behavior |
| `worker-handover` | Context passing |
| `ci-monitoring` | CI and WAKE handling |
| `research-after-failure` | Research cycles |
| `issue-driven-development` | Worker follows this |
| `comprehensive-review` | Workers must complete before PR |
| `project-board-enforcement` | ALL state queries and updates |

## Reference Files

- `reference/state-management.md` - State queries, updates, deviation handling
- `reference/loop-implementation.md` - Full loop code and helpers
- `reference/failure-recovery.md` - Research cycles, blocked handling, SLEEP/WAKE