---
name: ln-1000-pipeline-orchestrator
description: "Meta-orchestrator (L0): reads kanban board, drives Stories through pipeline 300->310->400->500 in parallel via TeamCreate. Max 3 concurrent Stories. Auto squash-merge to develop on quality gate PASS."
---
> **Paths:** File paths (`shared/`, `references/`, `../ln-*`) are relative to skills repo root. If not found at CWD, locate this SKILL.md directory and go up one level for repo root.
# Pipeline Orchestrator
Meta-orchestrator that reads the kanban board, builds a priority queue of Stories, and drives them through the full pipeline (task planning -> validation -> execution -> quality gate) using Claude Code Agent Teams for parallel Story processing.
## Purpose & Scope
- Parse kanban board and build Story priority queue
- Ask business questions in ONE batch before execution; make technical decisions autonomously
- Spawn per-story workers via TeamCreate (max 3 concurrent)
- Drive each Story through 4 stages: ln-300 -> ln-310 -> ln-400 -> ln-500
- Auto squash-merge to develop after quality gate PASS
- Handle failures, retries, and escalation to user
## Hierarchy
```
L0: ln-1000-pipeline-orchestrator (TeamCreate lead, delegate mode)
+-- Story Workers (fresh per stage, shutdown after completion)
| All stages: Opus 4.6 | Effort: Stage 0 = low | Stage 1,2 = medium | Stage 3 = medium
+-- L1: ln-300 / ln-310 / ln-400 / ln-500 (invoked via Skill tool, as-is)
+-- L2/L3: existing hierarchy unchanged
```
**Key principle:** ln-1000 does NOT modify existing skills. Workers invoke ln-300/ln-310/ln-400/ln-500 through Skill tool exactly as a human operator would.
## MCP Tool Preferences
When `mcp__hashline-edit__*` tools are available, workers MUST prefer them over standard file tools:
| Standard Tool | Hashline-Edit Replacement | Why |
|---------------|--------------------------|-----|
| `Read` | `mcp__hashline-edit__read_file` | Hash-prefixed lines enable precise edits |
| `Edit` | `mcp__hashline-edit__edit_file` | Atomic validation prevents corruption |
| `Write` | `mcp__hashline-edit__write_file` | Same behavior, consistent interface |
| `Grep` | `mcp__hashline-edit__grep` | Results include hashline refs for follow-up edits |
**Fallback:** If the hashline-edit MCP is unavailable (its tools are not found via ToolSearch), fall back silently to the standard tools; this is not treated as an error.
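Illustratively, the preference rule amounts to a name lookup with a silent fallback. A minimal sketch, assuming `available_tools` is the set of tool names a worker sees via ToolSearch:

```python
# Hypothetical helper: pick the hashline-edit variant when available, else the standard tool.
HASHLINE_MAP = {
    "Read": "mcp__hashline-edit__read_file",
    "Edit": "mcp__hashline-edit__edit_file",
    "Write": "mcp__hashline-edit__write_file",
    "Grep": "mcp__hashline-edit__grep",
}

def preferred_tool(standard_name: str, available_tools: set[str]) -> str:
    candidate = HASHLINE_MAP.get(standard_name)
    if candidate and candidate in available_tools:
        return candidate
    return standard_name  # silent fallback, no error raised
```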
## Task Storage Mode
**MANDATORY READ:** Load `shared/references/storage_mode_detection.md` for Linear vs File mode detection and operations.
## When to Use
- Multiple Stories ready for processing across kanban board statuses
- Need end-to-end automation: task planning -> validation -> execution -> quality gate -> merge
- Want parallel Story processing with minimal manual intervention
## Pipeline: 4-Stage State Machine
**MANDATORY READ:** Load `references/pipeline_states.md` for transition rules and guards.
```
Backlog (no tasks)
   |
   v
Stage 0 (ln-300): create tasks
   |
   v
Backlog (tasks exist)
   |
   v
Stage 1 (ln-310): validate --NO-GO--> [retry / ask user]
   | GO
   v
Todo
   |
   v
Stage 2 (ln-400): execute
   |
   v
To Review
   |
   v
Stage 3 (ln-500): quality gate
   |          |
  PASS       FAIL
   |          v
   v       To Rework --> Stage 2 (max 2 cycles)
Done (merged)
```
| Stage | Skill | Input Status | Output Status |
|-------|-------|-------------|--------------|
| 0 | ln-300-task-coordinator | Backlog (no tasks) | Backlog (tasks created) |
| 1 | ln-310-story-validator | Backlog (tasks exist) | Todo |
| 2 | ln-400-story-executor | Todo / To Rework | To Review |
| 3 | ln-500-story-quality-gate | To Review | Done / To Rework |
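The stage table implies a simple status-to-stage routing. A hedged sketch follows; the authoritative guards (including how `In Progress` is handled) live in `references/pipeline_states.md`, and `has_tasks` is assumed to come from Phase 1 task-presence detection:

```python
def determine_stage(status: str, has_tasks: bool) -> int:
    """Simplified routing from kanban status to pipeline stage (see pipeline_states.md for full guards)."""
    if status == "Backlog":
        return 1 if has_tasks else 0   # tasks exist -> validate (ln-310); otherwise plan tasks (ln-300)
    if status in ("Todo", "To Rework"):
        return 2                       # execution (ln-400)
    if status == "To Review":
        return 3                       # quality gate (ln-500)
    raise ValueError(f"Status {status!r} needs the full guards in pipeline_states.md")
```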
## Team Lead Responsibilities
This skill runs as a **team lead** in delegate mode. The agent executing ln-1000 MUST NOT write code or invoke skills directly.
| Responsibility | Description |
|---------------|-------------|
| **Coordinate** | Assign stages to workers, process completion reports, advance pipeline |
| **Verify board** | Re-read kanban/Linear after each stage. Workers update via skills; lead ASSERTs expected state transitions |
| **Escalate** | Route failures to user when retry limits exceeded |
| **Merge to develop** | Squash-merge to develop after quality gate PASS (lead-only action) |
| **Shutdown** | Graceful worker shutdown, team cleanup |
**NEVER do as lead:** Invoke ln-300/ln-310/ln-400/ln-500 directly. Edit source code. Skip quality gate. Force-kill workers.
## Workflow
### Phase 0: Recovery Check
```
IF .pipeline/state.json exists AND complete == false:
# Previous run interrupted — resume from saved state
1. Read .pipeline/state.json → restore: story_state, worker_map,
quality_cycles, validation_retries, crash_count, priority_queue_ids,
story_results, infra_issues, worktree_map, depends_on,
stage_timestamps, git_stats, pipeline_start_time, readiness_scores
2. Read .pipeline/checkpoint-*.json → validate story_state consistency
(checkpoint.stage should match story_state[id])
3. Re-read kanban board → rebuild priority_queue from priority_queue_ids
(skip stories already DONE/PAUSED)
4. Re-parse Story dependencies → rebuild depends_on (defense in depth)
5. Read team config → verify worker_map members still exist
6. Set suspicious_idle[*] = false (ephemeral, reset on recovery)
7. For each story with story_state IN ("STAGE_0".."STAGE_3"):
IF checkpoint.agentId exists → Task(resume: checkpoint.agentId)
ELSE → respawn worker with checkpoint context (see checkpoint_format.md)
8. Jump to Phase 4 event loop
IF .pipeline/state.json NOT exists OR complete == true:
# Fresh start — proceed to Phase 1
```
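A minimal sketch of the resume-or-fresh-start decision (field names follow the state.json schema in Phase 3.2; checkpoint validation and worker respawn are omitted):

```python
import json
from pathlib import Path

STATE_PATH = Path(".pipeline/state.json")

def should_resume() -> dict | None:
    """Return saved state when a previous run was interrupted, else None for a fresh start."""
    if not STATE_PATH.exists():
        return None
    state = json.loads(STATE_PATH.read_text(encoding="utf-8"))
    if state.get("complete", True):
        return None   # previous run finished cleanly -> fresh start
    return state      # interrupted run -> restore story_state, worker_map, counters, ...
```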
### Phase 1: Discovery & Kanban Parsing
**MANDATORY READ:** Load `references/kanban_parser.md` for parsing patterns.
1. Auto-discover `docs/tasks/kanban_board.md` (or Linear API via storage mode detection)
2. Extract project brief from target project's CLAUDE.md (NOT skills repo):
```
project_brief = {
name: <from H1 or first line>,
tech: <from Development Commands / tech references>,
type: <inferred: "CLI", "API", "web app", "library">,
key_rules: <2-3 critical rules>
}
IF not found: project_brief = { name: basename(project_root), tech: "unknown" }
```
3. Parse all status sections: Backlog, Todo, In Progress, To Review, To Rework
4. Extract Story list with: ID, title, status, Epic name, task presence
5. Build priority queue:
```
Priority: To Review > To Rework > In Progress > Todo > Backlog
```
6. Filter: skip Stories in Done, Postponed, Canceled
7. Detect task presence per Story:
- Has `_(tasks not created yet)_` → **no tasks** → Stage 0
- Has task lines (4-space indent) → **tasks exist** → Stage 1+
8. Extract dependencies per Story (see `references/kanban_parser.md` Dependency Extraction):
- Read each Story file → parse `## Dependencies / ### Depends On` section
- Build `depends_on[storyId] = [prerequisite IDs]`
- Prerequisites already Done → satisfied, ignore. Not found → WARN, treat as none
- Circular dependencies → ESCALATE to user
9. Extract story briefs from Linear (for lead awareness):
```
FOR EACH story in priority_queue:
description = get_issue(story.id).description
story_briefs[id] = parse <!-- ORCHESTRATOR_BRIEF_START/END --> markers
IF no markers: story_briefs[id] = { tech: project_brief.tech, keyFiles: "unknown" }
```
10. Show pipeline plan to user:
```
Project: {project_brief.name} ({project_brief.tech})
Pipeline Plan:
| # | Story | Tech | Approach | Stage | Deps | Action |
|---|-------|------|----------|-------|------|--------|
| 1 | PROJ-42 | Python, FastAPI | Token middleware | 3 | — | Quality gate |
| 2 | PROJ-55 | Python, FastAPI | CRUD + Alembic | 0 | PROJ-42 | Create tasks |
```
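Steps 5-8 above reduce to status-priority sorting plus dependency bookkeeping. A hedged sketch, assuming each parsed story is a dict with `id`, `status`, and an already-extracted `depends_on` list (circular-dependency escalation and the WARN for unknown prerequisites are omitted):

```python
STATUS_PRIORITY = {"To Review": 0, "To Rework": 1, "In Progress": 2, "Todo": 3, "Backlog": 4}
SKIPPED = {"Done", "Postponed", "Canceled"}

def build_queue(stories: list[dict]) -> tuple[list[dict], dict[str, list[str]]]:
    done_ids = {s["id"] for s in stories if s["status"] == "Done"}
    queue = sorted(
        (s for s in stories if s["status"] not in SKIPPED),
        key=lambda s: STATUS_PRIORITY.get(s["status"], 99),
    )
    depends_on = {
        s["id"]: [d for d in s.get("depends_on", []) if d not in done_ids]  # Done prerequisites are satisfied
        for s in queue
    }
    return queue, depends_on
```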
### Phase 2: Pre-flight Questions (ONE batch)
1. Load Story descriptions (metadata only) for top stories in pipeline scope
2. Scan for business ambiguities — questions where:
- Answer cannot be found in codebase, docs, or standards
- Answer requires business/product decision (payment provider, auth flow, UI preference)
3. Collect ALL business questions into single AskUserQuestion:
```
"Before starting pipeline:
Story PROJ-42: Which payment provider? (Stripe/PayPal/both)
Story PROJ-45: Auth flow — JWT or session-based?"
```
4. Technical questions — resolve using project_brief:
- Library versions: MCP Ref / Context7 (for `project_brief.tech` ecosystem)
- Architecture patterns: `project_brief.key_rules`
- Standards compliance: ln-310 Phase 2 handles this
5. Store answers in shared context (pass to workers via spawn prompt)
**Skip Phase 2** if no business questions found. Proceed directly to Phase 3.
### Phase 3: Team Setup
**MANDATORY READ:** Load `references/settings_template.json` for required permissions and hooks.
#### 3.0 Linear Status Cache (Linear mode only)
```
IF storage_mode == "linear":
statuses = list_issue_statuses(teamId=team_id)
status_cache = {status.name: status.id FOR status IN statuses}
REQUIRED = ["Backlog", "Todo", "In Progress", "To Review", "To Rework", "Done"]
missing = [s for s in REQUIRED if s not in status_cache]
IF missing: ABORT "Missing Linear statuses: {missing}. Configure workflow."
# Persist in state.json (added in 3.2) and pass to workers via prompt CONTEXT
```
#### 3.1 Pre-flight: Settings Verification
Verify `.claude/settings.local.json` in target project:
- `defaultMode` = `"bypassPermissions"` (required for workers)
- `hooks.Stop` registered → `pipeline-keepalive.sh`
- `hooks.TeammateIdle` registered → `worker-keepalive.sh`
If missing or incomplete → copy from `references/settings_template.json` and install hook scripts via Bash `cp` (NOT Write tool — Write produces CRLF on Windows, breaking `#!/bin/bash` shebang):
```
mkdir -p .claude/hooks
Bash: cp {skill_repo}/ln-1000-pipeline-orchestrator/references/hooks/pipeline-keepalive.sh .claude/hooks/pipeline-keepalive.sh
Bash: cp {skill_repo}/ln-1000-pipeline-orchestrator/references/hooks/worker-keepalive.sh .claude/hooks/worker-keepalive.sh
```
**Hook troubleshooting:** If hooks fail with "No such file or directory":
1. Verify hook commands use `bash .claude/hooks/script.sh` (relative path, no env vars — `$CLAUDE_PROJECT_DIR` is NOT available in hook shell context)
2. Verify `.claude/hooks/*.sh` files exist and have `#!/bin/bash` shebang
3. On Windows: ensure LF line endings in .sh files (see hook installation above — use Bash `cp`, not Write tool)
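A sketch of the 3.1 pre-flight check; the exact hook registration shape comes from `references/settings_template.json`, so the key lookups below are illustrative assumptions:

```python
import json
from pathlib import Path

def settings_ok(settings_path: Path = Path(".claude/settings.local.json")) -> bool:
    """Crude check: bypassPermissions mode plus both keepalive hooks registered somewhere in hooks config."""
    if not settings_path.exists():
        return False
    cfg = json.loads(settings_path.read_text(encoding="utf-8"))
    hooks_blob = json.dumps(cfg.get("hooks", {}))
    return (
        cfg.get("defaultMode") == "bypassPermissions"
        and "pipeline-keepalive.sh" in hooks_blob   # Stop hook
        and "worker-keepalive.sh" in hooks_blob     # TeammateIdle hook
    )
# If this returns False, copy settings_template.json and install the hook scripts
# with Bash `cp` (not the Write tool), as described above.
```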
#### 3.2 Initialize Pipeline State
```
Write .pipeline/state.json (full schema — see checkpoint_format.md):
{ "complete": false, "active_workers": 0, "stories_remaining": N, "last_check": <now>,
"story_state": {}, "worker_map": {}, "quality_cycles": {}, "validation_retries": {},
"crash_count": {}, "priority_queue_ids": [<all story IDs>],
"worktree_map": {}, "depends_on": {}, "story_results": {}, "infra_issues": [],
"status_cache": {<status_name: status_uuid>}, # Empty object if file mode
"stage_timestamps": {}, "git_stats": {}, "pipeline_start_time": <now>, "readiness_scores": {},
"skill_repo_path": <absolute path to skills repository root>,
"team_name": "pipeline-{YYYY-MM-DD}",
"business_answers": {<question: answer pairs from Phase 2, or {} if skipped>},
"total_merged_stories": 0,
"storage_mode": "file"|"linear",
"project_brief": {<name, tech, type, key_rules from Phase 1 step 2>},
"story_briefs": {<storyId: {tech, keyFiles, approach, complexity} from Phase 1 step 9>} } # Recovery-critical
Write .pipeline/lead-session.id with current session_id # Stop hook uses this to only keep lead alive
```
#### 3.2a Sleep Prevention (Windows only)
```
IF platform == "win32":
Bash: cp {skill_repo}/ln-1000-pipeline-orchestrator/references/hooks/prevent-sleep.ps1 .claude/hooks/prevent-sleep.ps1
Bash: powershell -ExecutionPolicy Bypass -WindowStyle Hidden -File .claude/hooks/prevent-sleep.ps1 &
sleep_prevention_pid = $!
# Script polls .pipeline/state.json — self-terminates when complete=true
# Fallback: Windows auto-releases execution state on process exit
```
#### 3.3 Create Team & Spawn Workers
**Worktrees:** Every worker gets its own worktree with a named feature branch (`feature/{id}-{slug}`). Created in Phase 4 spawn loop.
**Model routing:** All stages use `model: "opus"`. Effort routing via prompt: `effort_for_stage(0) = "low"`, `effort_for_stage(1) = "medium"`, `effort_for_stage(2) = "medium"`, `effort_for_stage(3) = "medium"`. Crash recovery = same as target stage. Thinking mode: always enabled (adaptive).
1. Ensure `develop` branch exists:
```
IF `develop` branch not found locally or on origin:
git branch develop master # Create from master
git push -u origin develop
git checkout develop # Start pipeline from develop
```
2. Create team:
```
TeamCreate(team_name: "pipeline-{YYYY-MM-DD}")
```
Workers are spawned by Phase 4 spawn loop on first heartbeat — NOT here. This avoids duplicate spawn logic.
### Phase 4: Execution Loop
**MANDATORY READ:** Load `references/message_protocol.md` for exact message formats and parsing regex.
**MANDATORY READ:** Load `references/worker_health_contract.md` for crash detection and respawn rules.
**Lead operates in delegate mode — coordination only, no code writing.**
**MANDATORY READ:** Load `references/checkpoint_format.md` for checkpoint schema and resume protocol.
```
# --- INITIALIZATION ---
active_workers = 0 # Current worker count (invariant: <= 3)
quality_cycles = {} # {storyId: count} — FAIL→retry counter, limit 2
validation_retries = {} # {storyId: count} — NO-GO retry counter, limit 1
crash_count = {} # {storyId: count} — crash respawn counter, limit 1
suspicious_idle = {} # {storyId: bool} — crash detection flag
story_state = {} # {storyId: "STAGE_0"|"STAGE_1"|"STAGE_2"|"STAGE_3"|"DONE"|"PAUSED"}
worker_map = {} # {storyId: worker_name}
depends_on = {} # {storyId: [prerequisite IDs]} — from Phase 1 step 8
worktree_map = {} # {storyId: worktree_dir | null} — tracks which stories use worktrees
story_results = {} # {storyId: {stage0: "...", stage1: "...", ...}} — for pipeline report
infra_issues = [] # [{phase, type, message}] — infrastructure problems for report
heartbeat_count = 0 # Heartbeat cycle counter (ephemeral, resets on recovery)
stage_timestamps = {} # {storyId: {stage_N_start: ISO, stage_N_end: ISO}}
git_stats = {} # {storyId: {lines_added, lines_deleted, files_changed}}
pipeline_start_time = now() # ISO 8601 — wall-clock start for duration metrics
readiness_scores = {} # {storyId: readiness_score} — from Stage 1 GO, used for Stage 3 fast-track
# Helper functions for heartbeat formatting
skill_name_from_stage(stage):
"""Returns skill name for stage number."""
RETURN {0: "ln-300-task-coordinator", 1: "ln-310-story-validator",
2: "ln-400-story-executor", 3: "ln-500-story-quality-gate"}[stage]
predict_next_step(current_stage):
"""Predicts next pipeline step for story."""
IF current_stage == 0: RETURN "Validation (ln-310) → Todo"
IF current_stage == 1: RETURN "Execution (ln-400) → To Review"
IF current_stage == 2: RETURN "Quality gate (ln-500) → PASS/FAIL"
IF current_stage == 3: RETURN "Squash merge to develop → Done"
stage_duration(story_id, stage_num):
"""Returns formatted duration (Xm Ys) for a stage, or None if timestamps missing."""
start = stage_timestamps[story_id].get("stage_{stage_num}_start")
end = stage_timestamps[story_id].get("stage_{stage_num}_end")
IF start AND end: RETURN format_duration(end - start)
RETURN None
# Initialize counters for all queued stories
FOR EACH story IN priority_queue:
quality_cycles[story.id] = 0
validation_retries[story.id] = 0
crash_count[story.id] = 0
suspicious_idle[story.id] = false
story_state[story.id] = "QUEUED"
# --- EVENT LOOP (driven by Stop hook heartbeat) ---
# HOW THIS WORKS:
# 1. Lead's turn ends → Stop event fires
# 2. pipeline-keepalive.sh reads .pipeline/state.json → complete=false → exit 2
# 3. stderr "HEARTBEAT: N workers, M stories..." → new agentic loop iteration
# 4. Any queued worker messages (SendMessage) delivered in this cycle
# 5. Lead processes messages via ON handlers (reactive) + verifies done-flags (proactive)
# 6. Lead's turn ends → Go to step 1
#
# The Stop hook IS the event loop driver. Each heartbeat = one iteration.
# Lead MUST NOT say "waiting for messages" and stop — the heartbeat keeps it alive.
# If no worker messages arrived: output brief status, let turn end → next heartbeat.
#
# --- CONTEXT RECOVERY PROTOCOL ---
# Claude Code may compress conversation history during long pipelines.
# When this happens, you lose SKILL.md instructions and state variables.
# The Stop hook includes "---PIPELINE RECOVERY CONTEXT---" in EVERY heartbeat stderr.
#
# IF you see this block and don't recall the pipeline protocol:
# Follow CONTEXT RECOVERY PROTOCOL in references/phases/phase4_heartbeat.md (7 steps).
# Quick summary: state.json → SKILL.md(FULL) → handlers → heartbeat → known_issues → ToolSearch → resume
#
# Cost: ~5 file reads (~1300 lines, ~2500 tokens), one-time per compression event.
# Normal operation: 0 extra reads. Recovery block in stderr is passive anchor.
#
# FRESH WORKER PER STAGE: Each stage transition = shutdown old worker + spawn new one.
# active_workers stays the same (net zero). Only DONE/PAUSED/ERROR decrement active_workers.
#
# BIDIRECTIONAL HEALTH MONITORING (Phase 4, Step 3):
# - Reactive: ON handlers process worker completion messages
# - Proactive: Verify done-flags without messages (lost message recovery)
# - Defense-in-depth: Handles network issues, context overflow, worker crashes
WHILE ANY story_state[id] NOT IN ("DONE", "PAUSED"):
# 1. Spawn workers for queued stories (respecting concurrency + dependency limits)
WHILE active_workers < 3 AND priority_queue NOT EMPTY:
story = priority_queue.peek() # Don't pop yet — may be blocked
# Dependency guard: all prerequisites must be DONE
blocked_deps = [d for d in depends_on[story.id] if story_state[d] != "DONE"]
IF blocked_deps NOT EMPTY:
priority_queue.skip(story.id) # Move to next candidate
CONTINUE # Try next story in queue
priority_queue.pop() # Safe to start
target_stage = determine_stage(story) # See pipeline_states.md guards
worker_name = "story-{story.id}-s{target_stage}"
worktree_dir = ".worktrees/story-{story.id}"
git worktree add -b feature/{story.id}-{slug} {worktree_dir} develop
worktree_map[story.id] = worktree_dir
project_root = Bash("pwd") # Absolute path for PIPELINE_DIR in worktree mode
Task(name: worker_name, team_name: "pipeline-{date}",
model: "opus", mode: "bypassPermissions",
subagent_type: "general-purpose",
prompt: worker_prompt(story, target_stage, business_answers, worktree_dir, project_root))
worker_map[story.id] = worker_name
story_state[story.id] = "STAGE_{target_stage}"
stage_timestamps[story.id] = stage_timestamps.get(story.id, {})
stage_timestamps[story.id]["stage_{target_stage}_start"] = now()
active_workers++
Write .pipeline/worker-{worker_name}-active.flag # For TeammateIdle hook
Update .pipeline/state.json: active_workers, last_check
SendMessage(recipient: worker_name,
content: "Execute Stage {target_stage} for {story.id}",
summary: "Stage {target_stage} assignment")
# 1b. Deadlock detection: all remaining stories blocked on non-DONE dependencies
IF active_workers == 0 AND priority_queue NOT EMPTY:
        runnable = [s for s in priority_queue if ALL d in depends_on[s.id]: story_state[d] == "DONE"]
        IF runnable EMPTY:
FOR EACH s IN priority_queue: story_state[s.id] = "PAUSED"
ESCALATE: "Deadlocked: remaining stories depend on PAUSED/incomplete stories: {ids}"
# 2. Process worker messages (reactive message handling)
#
**MANDATORY READ:** Load `references/phases/phase4_handlers.md` for all ON message handlers:
- Stage 0 COMPLETE / ERROR (task planning outcomes)
- Stage 1 COMPLETE (GO / NO-GO validation outcomes with retry logic)
- Stage 2 COMPLETE / ERROR (execution outcomes)
- Stage 3 COMPLETE (PASS/CONCERNS/WAIVED/FAIL quality gate outcomes with rework cycles)
- Worker crash detection (3-step protocol: flag → probe → respawn)
Handlers include sender validation and state guards to prevent duplicate processing.
# 2.5. Active done-flag verification (proactive health monitoring)
#
**MANDATORY READ:** Load `references/phases/phase4_heartbeat.md` for bidirectional health monitoring:
- Lost message detection (done-flag exists but state not advanced)
- Synthetic recovery from checkpoint + kanban verification (all 4 stages)
- Fallback to probe protocol when checkpoint missing
- Structured heartbeat output (table format with worker status)
- Helper functions (skill_name_from_stage, predict_next_step)
This complements reactive crash detection (ON TeammateIdle) with proactive polling every ~60s.
# 3. Heartbeat state persistence
#
ON HEARTBEAT (Stop hook stderr: "HEARTBEAT: N workers, M stories..."):
Write .pipeline/state.json with ALL state variables.
# See phase4_heartbeat.md for persistence details
```
**`determine_stage(story)` routing:** See `references/pipeline_states.md` Stage-to-Status Mapping table.
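The heartbeat handler (step 3 above) rewrites state.json every cycle. A write-then-rename sketch keeps the file readable by the Stop hook even if the lead is interrupted mid-write; this is an implementation suggestion, not something the skill mandates:

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path

def persist_state(state: dict, path: Path = Path(".pipeline/state.json")) -> None:
    """Write state to a temp file, then swap it in, so readers never see a half-written file."""
    state["last_check"] = datetime.now(timezone.utc).isoformat()
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    os.replace(tmp, path)  # single-call replace of the old file
```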
#### Phase 4a: Git Flow & Squash Merge
**MANDATORY READ:** Load `references/phases/phase4a_git_merge.md` for squash-merge procedure:
- Sync with develop (rebase → fallback to merge on conflict)
- Squash merge into develop (single commit per Story)
- Worktree cleanup
- Context refresh (reload SKILL.md after large merges)
- Story report appending (stage results + counters + problems)
- Kanban + Linear verification (sync check)
Executed after Stage 3 PASS verdict from ln-500-story-quality-gate.
#### Phase 4b: Cross-Story Health Check (every 5th merged Story)
After squash-merge, increment `total_merged_stories` counter. When `total_merged_stories % 5 == 0`:
1. On develop branch, Grep for top-5 hotspot patterns across `src/` (count mode):
- Error handlers: `catch.*Error|handleError|handleCommandError`
- Validators: `validate|isValid|checkInput`
- Config access: `getSettings|getConfig|loadConfig`
- HTTP wrappers: `httpClient|apiClient|fetchWrapper`
- Parsers: `parseResponse|parseError|parseApi`
2. If ANY pattern appears in **5+ files** → WARN user:
```
Cross-Story Health Check (after {N} Stories):
WARNING: {pattern} duplicated in {count} files.
Recommend: create refactoring Epic or run ln-620-codebase-auditor.
```
3. Log result in pipeline report (`story_results` → health_check entry)
**NOTE:** Warning-only — does NOT block pipeline. User decides whether to act.
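A sketch of the hotspot scan, with the patterns and the 5-file threshold copied from the list above (plain regex over `src/` stands in for the Grep tool's count mode):

```python
import re
from pathlib import Path

HOTSPOTS = {
    "error handlers": r"catch.*Error|handleError|handleCommandError",
    "validators": r"validate|isValid|checkInput",
    "config access": r"getSettings|getConfig|loadConfig",
    "http wrappers": r"httpClient|apiClient|fetchWrapper",
    "parsers": r"parseResponse|parseError|parseApi",
}

def health_check(src: Path = Path("src"), threshold: int = 5) -> list[str]:
    warnings = []
    files = [p for p in src.rglob("*") if p.is_file()]
    for name, pattern in HOTSPOTS.items():
        rx = re.compile(pattern)
        hits = sum(1 for f in files if rx.search(f.read_text(encoding="utf-8", errors="ignore")))
        if hits >= threshold:
            warnings.append(f"{name}: duplicated in {hits} files")
    return warnings  # warning-only; the pipeline never blocks on this
```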
### Phase 5: Cleanup & Self-Verification
```
# 0. Signal pipeline complete (allows Stop hook to pass)
Write .pipeline/state.json: { "complete": true, ... }
# 1. Wait for all active workers to complete
ASSERT active_workers == 0
# 2. Self-verify against Definition of Done
verification = {
kanban_parsed: priority_queue was built # Phase 1 ✓
questions_asked: business_answers stored OR none # Phase 2 ✓
team_created: team exists # Phase 3 ✓
all_processed: ALL story_state[id] IN ("DONE", "PAUSED") # Phase 4 ✓
merged_develop: EVERY "DONE" story squash-merged to develop # Phase 4a ✓
linear_synced: IF storage_mode == "linear": ALL "DONE" stories match Linear status # Phase 4a.6 ✓
on_develop: Current branch is develop # Phase 5 ✓
}
IF ANY verification == false: WARN user with details
# 3. Finalize pipeline report
Prepend summary header to docs/tasks/reports/pipeline-{date}.md:
# Pipeline Report — {date}
| Metric | Value |
|--------|-------|
| Stories processed | {total} |
| Completed (DONE) | {count where story_state == "DONE"} |
| Paused (needs intervention) | {count where story_state == "PAUSED"} |
| Total quality rework cycles | {sum of quality_cycles} |
| Total validation retries | {sum of validation_retries} |
| Total crash recoveries | {sum of crash_count} |
| Infrastructure issues | {len(infra_issues)} |
| **Models used** | **{list of unique models from session}** |
| **Total tokens consumed** | **{sum of tokens from all tool uses}** |
**Note:** Model and token data collected from session analysis (all Task spawns + tool uses).
Breakdown: Lead (Opus 4.6) + Workers (per-stage model allocation).
# 3b. Stage Duration Breakdown
Append Stage Duration section:
## Stage Duration Breakdown
| Story | Stage 0 | Stage 1 | Stage 2 | Stage 3 | Total | Bottleneck |
|-------|---------|---------|---------|---------|-------|------------|
FOR EACH story WHERE story_state[id] IN ("DONE", "PAUSED"):
durations = {N: stage_timestamps[id]["stage_{N}_end"] - stage_timestamps[id]["stage_{N}_start"]
FOR N IN 0..3 IF both timestamps exist}
total = sum(durations.values())
bottleneck = key with max(durations)
| {id} | {durations[0] or "—"} | {durations[1] or "—"} | {durations[2] or "—"} | {durations[3] or "—"} | {total} | Stage {bottleneck} |
# 3c. Code Output Metrics
Append Code Output section:
## Code Output Metrics
| Story | Files Changed | Lines Added | Lines Deleted | Net Lines |
|-------|--------------|-------------|---------------|-----------|
FOR EACH story WHERE git_stats[id] exists:
| {id} | {git_stats[id].files_changed} | +{git_stats[id].lines_added} | -{git_stats[id].lines_deleted} | {net} |
**Total:** {sum files_changed} files, +{sum lines_added} / -{sum lines_deleted} lines
# 3d. Cost Estimate
Append Cost Estimate section:
## Cost Estimate
| Metric | Value |
|--------|-------|
| Wall-clock time | {now() - pipeline_start_time} |
| Total worker spawns | {count of Task() calls in session} |
| Hashline-edit usage | {count mcp__hashline-edit__* calls in Stage 2 workers} / {total file edits} |
# 3a. Collect infrastructure issues (logically precedes step 3, whose header reports {len(infra_issues)})
# Analyze entire pipeline session for non-fatal problems:
# hook/settings failures, git conflicts, worktree errors, merge issues,
# Linear sync mismatches, worker crashes, permission errors, any unexpected fallbacks.
# Populate infra_issues = [{phase, type, message}] from session context.
Append Infrastructure Issues section:
## Infrastructure Issues
IF infra_issues NOT EMPTY:
| # | Phase | Type | Details |
|---|-------|------|---------|
FOR EACH issue IN infra_issues:
| {N} | {issue.phase} | {issue.type} | {issue.message} |
ELSE:
_No infrastructure issues._
Append Operational Recommendations section (auto-generated from counters):
## Operational Recommendations
- IF any quality_cycles > 0: "Story {id} needed {N} quality cycles. Improve task specs or acceptance criteria."
- IF any validation_retries > 0: "Story {id} failed validation. Review Story/Task structure."
- IF any crash_count > 0: "Worker crashed {N} times for {id}. Check for context-heavy operations."
- IF any PAUSED: "Stories {ids} require manual intervention."
- IF any Linear sync mismatches: "Linear/kanban sync issues detected for {ids}. Verify statuses manually."
- IF any infra_issues with type "hook": "Hook configuration errors. Verify settings.local.json and .claude/hooks/."
- IF any infra_issues with type "git": "Git conflicts encountered. Rebase feature branches more frequently."
- IF any infra_issues with type "worktree": "Worktree failures. Check disk space and existing worktree state."
- IF all DONE with 0 retries AND no infra_issues: "Clean run — no issues detected."
Append Process Improvement section (auto-generated from pipeline analysis):
## Process Improvement Suggestions
Analyze pipeline session and generate suggestions in 4 categories:
### Efficiency (reduce time/steps)
- IF any story went through all 4 stages (0→1→2→3): "Consider skipping Stage 0/1 for stories with pre-validated tasks (resume from Stage 2)."
- IF multiple stories produced similar Stage 0 output: "Stories {ids} had similar task plans. Consider task templates to skip planning."
- IF Stage 2 was bottleneck (longest stage across stories): "Execution dominated pipeline time. Split large stories for better parallelism."
### Cost (reduce token usage)
- IF any crash_count > 0: "Crashes waste full stage token budget. Reduce context-heavy operations or add intermediate checkpoints."
- IF quality_cycles > 0: "Rework cycles multiply cost — Stage 2+3 repeated {N} times. Invest in better task specs upfront (ln-300)."
- IF validation_retries > 0: "Validation retry = wasted Stage 0+1. Improve story templates or run ln-310 earlier."
- General: "Review worker prompt sizes. Shorter focused prompts reduce per-spawn token cost."
### Quality (improve output)
- IF any Stage 3 verdict was CONCERNS: "Story {id} passed with concerns. Tighter AC or stricter test coverage may prevent debt."
- IF any Stage 3 score < 80: "Low quality ({score}/100) for {id}. Consider: more specific AC, ln-002 research before coding, stricter ln-402 review."
- IF agent reviews (ln-514) found issues not caught by ln-402: "External agents caught missed issues. Consider running agent review earlier."
- IF all scores > 90: "High quality scores. Current process works well — maintain."
### Process Architecture (structural improvements)
- IF pipeline ran > 5 stories: "Large batch. Consider increasing max_workers or grouping into sub-batches."
- IF any PAUSED: "PAUSED stories indicate systematic issues. Analyze: task spec quality? Missing context? Unclear AC?"
- IF depends_on blocked stories for extended periods: "Dependency chains caused idle workers. Reorder stories to minimize blocking."
- General: "Compare metrics across runs to track trends: quality_score, avg cycles per story, crash rate."
# 4. Show pipeline summary to user
```
Pipeline Complete:
| Story | Stage 0 | Stage 1 | Stage 2 | Stage 3 | Merged | Final State |
|-------|---------|---------|---------|---------|--------|------------|
| PROJ-42 | skip | skip | skip | PASS 92 | yes | DONE |
| PROJ-55 | 5 tasks | GO | Done | PASS 85 | yes | DONE |
| PROJ-60 | skip | NO-GO | — | — | — | PAUSED |
Report saved: docs/tasks/reports/pipeline-{date}.md
```
# 5. Shutdown remaining workers (if any still active)
FOR EACH worker_name IN worker_map.values():
SendMessage(type: "shutdown_request", recipient: worker_name)
# 6. Cleanup team
TeamDelete
# 7. Remove remaining worktrees (PAUSED stories not cleaned by Phase 4a)
IF .worktrees/ directory exists:
FOR EACH story in worktree_map WHERE worktree_dir != null:
git worktree remove {worktree_dir} --force
rm -rf .worktrees/
# 8. Ensure on develop branch
git checkout develop
# 9. Remove pipeline state files
# 9a. Stop sleep prevention (Windows safety net — script should have self-terminated)
IF sleep_prevention_pid:
kill $sleep_prevention_pid 2>/dev/null || true
Delete .pipeline/ directory
# 10. Report results and report location to user
```
## Kanban as Single Source of Truth
- **Lead = single writer** to kanban_board.md. Workers report results via SendMessage; lead updates the board
- **Re-read board** after each stage completion for fresh state
- **Update algorithm:** Follow `shared/references/kanban_update_algorithm.md` for Epic grouping and indentation
## Error Handling
| Situation | Detection | Action |
|-----------|----------|--------|
| ln-300 task creation fails | Worker reports error | Escalate to user: "Cannot create tasks for Story {id}" |
| ln-310 NO-GO (Score <5) | Worker reports NO-GO | Retry once (ln-310 auto-fixes). If still NO-GO -> ask user |
| Task in To Rework 3+ times | Worker reports rework loop | Escalate: "Task X reworked 3 times, need input" |
| ln-500 FAIL | Worker reports FAIL verdict | Fix tasks auto-created by ln-500. Stage 2 re-entry. Max 2 quality cycles |
| Worker crash | TeammateIdle without completion msg | Re-spawn worker, resume from last stage |
| All Stories blocked | Empty actionable queue | Report to user, cleanup team |
| Business question mid-execution | Worker encounters ambiguity | Worker -> lead -> user -> lead -> worker (message chain) |
| Merge conflict | git merge --squash fails | Escalate to user, Story PAUSED, manual resolution required |
## Critical Rules
1. **Max 3 concurrent Stories.** Never spawn more than 3 story-workers simultaneously
2. **Delegate mode.** Lead coordinates only — never invoke ln-300/ln-310/ln-400/ln-500 directly. Workers do all execution
3. **Skills as-is.** Never modify or bypass existing skill logic. Workers call `Skill("ln-310-story-validator", args)` exactly as documented
4. **Kanban verification.** Workers update Linear/kanban via skills. Lead re-reads and ASSERTs expected state after each stage. In file mode, lead resolves merge conflicts
5. **Quality cycle limit.** Max 2 quality FAILs per Story (original + 1 rework cycle). After 2nd FAIL, escalate to user
6. **Squash per Story.** Each Story that passes quality gate gets squash-merged to develop separately. No batch merges
7. **Re-read kanban.** After every stage completion, re-read board for fresh state. Never cache
8. **Graceful shutdown.** Always shutdown workers via shutdown_request. Never force-kill
## Known Issues
**MANDATORY READ:** Load `references/known_issues.md` for production-discovered problems and self-recovery patterns.
## Anti-Patterns
- Running ln-300/ln-310/ln-400/ln-500 directly from lead instead of delegating to workers
- Spawning >3 workers simultaneously
- Lead skipping kanban verification after worker updates (workers write via skills, lead MUST re-read + ASSERT)
- Skipping quality gate after execution
- Merging to develop before quality gate PASS
- Caching kanban state instead of re-reading
- Reading `~/.claude/teams/*/inboxes/*.json` directly (messages arrive automatically)
- Using `sleep` + filesystem polling for message checking
- Parsing internal Claude Code JSON formats (permission_request, idle_notification)
- Reusing same worker across stages (context exhaustion — spawn fresh worker per stage)
- Processing messages without verifying sender matches worker_map (stale message confusion from old/dead workers)
## Plan Mode Support
When invoked in Plan Mode, generate execution plan without creating team:
1. Parse kanban board (Phase 1)
2. Build priority queue
3. Show pipeline plan table (which Stories, which stages)
4. Write plan to plan file, call ExitPlanMode
**Plan Output Format:**
```
## Pipeline Plan for {date}
| # | Story | Status | Stage | Skill | Expected Outcome |
|---|-------|--------|-------|-------|-----------------|
| 1 | {ID}: {Title} | To Review | 3 | ln-500 | Done + PR |
| 2 | {ID}: {Title} | Todo | 2 | ln-400 | To Review |
### Execution Sequence
1. TeamCreate("pipeline-{date}")
2. Spawn story-worker for {Story-1} -> Stage 3 (ln-500)
3. Spawn story-worker for {Story-2} -> Stage 2 (ln-400)
4. Wait for completions, advance stages, squash-merge to develop
5. Cleanup
```
## Definition of Done (self-verified in Phase 5)
| # | Criterion | Verified By |
|---|-----------|-------------|
| 1 | Kanban board parsed, priority queue built | `priority_queue` was populated |
| 2 | Business questions asked in single batch (or none found) | `business_answers` stored OR skip |
| 3 | Team created, workers spawned (max 3 concurrent) | `active_workers` never exceeded 3 |
| 4 | ALL Stories processed: state = DONE or PAUSED | `ALL story_state[id] IN ("DONE", "PAUSED")` |
| 4b | Cross-story health checked (if threshold met) | Warning logged or N/A |
| 5 | Every DONE Story squash-merged into develop | Feature branches merged, on develop branch |
| 6 | Pipeline summary shown to user | Phase 5 table output |
| 7 | Team cleaned up (workers shutdown, TeamDelete) | `active_workers == 0`, TeamDelete called |
## Reference Files
### Phase 4 Procedures (Progressive Disclosure)
- **Message handlers:** `references/phases/phase4_handlers.md` (Stage 0-3 ON handlers, crash detection)
- **Heartbeat & verification:** `references/phases/phase4_heartbeat.md` (Active done-flag checking, structured heartbeat output)
- **Git flow:** `references/phases/phase4a_git_merge.md` (Squash merge, worktree cleanup, sync verification)
### Core Infrastructure
- **Known issues:** `references/known_issues.md` (production-discovered problems and self-recovery)
- **Message protocol:** `references/message_protocol.md`
- **Worker health:** `references/worker_health_contract.md`
- **Checkpoint format:** `references/checkpoint_format.md`
- **Settings template:** `references/settings_template.json`
- **Hooks:** `references/hooks/pipeline-keepalive.sh`, `references/hooks/worker-keepalive.sh`
- **Kanban parsing:** `references/kanban_parser.md`
- **Pipeline states:** `references/pipeline_states.md`
- **Worker prompts:** `references/worker_prompts.md`
- **Kanban update algorithm:** `shared/references/kanban_update_algorithm.md`
- **Storage mode detection:** `shared/references/storage_mode_detection.md`
- **Auto-discovery patterns:** `shared/references/auto_discovery_pattern.md`
### Delegated Skills
- `../ln-300-task-coordinator/SKILL.md`
- `../ln-310-story-validator/SKILL.md`
- `../ln-400-story-executor/SKILL.md`
- `../ln-500-story-quality-gate/SKILL.md`
---
**Version:** 1.0.0
**Last Updated:** 2026-02-13