job-worker-orchestration skill

/skills/backend/job-worker-orchestration

This skill implements idempotent, observable background jobs using the repo's queue system with bounded retries and durable state.

npx playbooks add skill velcrafting/codex-skills --skill job-worker-orchestration

---
name: job-worker-orchestration
description: Add background jobs and orchestration with idempotency, retries, and observability aligned to repo conventions.
metadata:
  short-description: Jobs + orchestration
  layer: backend
  mode: write
  idempotent: false
---

# Skill: backend/job-worker-orchestration

## Purpose
Implement background work safely using the repo’s job system (queue/cron/worker) with:
- explicit idempotency guarantees
- bounded retries and backoff
- durable state and re-entrancy safety
- observability (logs/metrics/traces) sufficient to operate it

Use when work is async, scheduled, long-running, or must survive restarts.

---

## Inputs
- Job intent (what it does, why it’s async)
- Trigger type:
  - event-driven (enqueue)
  - scheduled (cron)
  - periodic poller
- Payload shape (fields + types)
- Idempotency key strategy (preferred) or invariants that ensure idempotency
- Failure expectations:
  - transient vs permanent errors
  - acceptable delay
- Repo profile (preferred): `<repo>/REPO_PROFILE.json`
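
As a rough illustration of "payload shape" and "idempotency key strategy", the sketch below defines a typed payload and derives a key from the fields that identify the unit of work. The job name `sync_invoice`, the field names, and the helper `parse_payload` are hypothetical; a real repo would use its own schema tooling and key conventions.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncInvoicePayload:
    """Hypothetical payload for an illustrative 'sync invoice' job."""
    invoice_id: str
    account_id: str
    requested_at: str  # ISO-8601 timestamp; informational, not part of the key

    def idempotency_key(self) -> str:
        # Build the key only from fields that identify the unit of work.
        # requested_at is excluded so re-enqueueing the same invoice
        # produces the same key.
        identity = {
            "job": "sync_invoice",
            "invoice_id": self.invoice_id,
            "account_id": self.account_id,
        }
        canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def parse_payload(raw: dict) -> SyncInvoicePayload:
    """Check field presence and types before the handler runs."""
    for name in ("invoice_id", "account_id", "requested_at"):
        value = raw.get(name)
        if not isinstance(value, str) or not value:
            raise ValueError(f"payload field {name!r} must be a non-empty string")
    return SyncInvoicePayload(raw["invoice_id"], raw["account_id"], raw["requested_at"])
```

Which fields belong in the key is a design decision: including a timestamp would make every enqueue unique and defeat deduplication, while omitting a field that distinguishes real work would wrongly merge separate jobs.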

---

## Outputs
- Job definition/handler
- Enqueue/schedule wiring
- Idempotency mechanism:
  - idempotency key + dedupe storage OR
  - safe “exactly-once-ish” strategy documented
- Retry/backoff policy (bounded)
- Dead-letter or quarantine path if repo supports it
- Tests:
  - unit tests for handler logic
  - integration tests for enqueue/execute when infrastructure exists
- Minimal operational notes (inline comments or docs per repo norms)
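
One possible shape for the "idempotency key + dedupe storage" output is sketched here. `InMemoryDedupeStore` and `run_once` are hypothetical names, and the in-memory dictionary only stands in for durable storage; a real job would likely rely on a database table with a unique constraint on the key.

```python
import threading
from typing import Callable, Dict


class InMemoryDedupeStore:
    """Stand-in for durable dedupe storage (an assumption for illustration)."""

    def __init__(self) -> None:
        self._status: Dict[str, str] = {}
        self._lock = threading.Lock()

    def claim(self, key: str) -> bool:
        """Atomically claim a key. Returns False if it is already claimed or done."""
        with self._lock:
            if key in self._status:
                return False
            self._status[key] = "in_progress"
            return True

    def complete(self, key: str) -> None:
        with self._lock:
            self._status[key] = "done"

    def release(self, key: str) -> None:
        """Drop the claim after a failure so a later retry can run again."""
        with self._lock:
            self._status.pop(key, None)


def run_once(store: InMemoryDedupeStore, key: str, side_effect: Callable[[], None]) -> str:
    """Run side_effect at most once per key while the store retains that key."""
    if not store.claim(key):
        return "skipped"
    try:
        side_effect()
    except Exception:
        store.release(key)  # allow the retry to claim the key again
        raise
    store.complete(key)
    return "executed"
```

This sketch does not handle a crash between the side effect and `complete`; a durable version would need a lease expiry on `in_progress` claims, or would record completion in the same transaction as the side effect.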

---

## Non-goals
- Implementing domain rules inside the job handler (use `backend/domain-logic-module`)
- External API integration details (use `backend/integration-adapter` for the adapter)
- Schema changes beyond what is needed for idempotency tracking (use `backend/persistence-layer-change` if substantial)

---

## Workflow
1) Identify job framework and conventions (prefer `REPO_PROFILE.json`).
2) Define the job contract:
   - payload schema
   - execution guarantees
   - expected side effects
3) Establish idempotency:
   - choose key, define dedupe boundary
   - ensure retries do not duplicate side effects
4) Implement handler as orchestrator:
   - call domain modules and adapters
   - keep handler thin
5) Define retry policy:
   - bounded attempts
   - backoff strategy
   - classify errors (retryable vs not)
6) Add dead-letter/quarantine behavior if supported:
   - after max retries, record failure and stop looping
7) Add observability:
   - log start/end + key fields
   - emit metrics counters/timers if repo uses them
   - propagate correlation ids if present
8) Add tests.
9) Run repo validations.

If this job introduces multi-step branching, retries, polling states, or backoff logic,
recommend `system/state-machine-mapper` unless explicitly waived.
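
Steps 5 and 6 of the workflow above can be sketched as a small retry loop. The exception classes, `DeadLetter`, and `execute_with_retries` are hypothetical names for illustration; the repo's job framework may already provide equivalents, in which case those should be configured instead.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


class RetryableError(Exception):
    """Transient failure (hypothetical class), e.g. a timeout."""


class PermanentError(Exception):
    """Failure that retrying cannot fix (hypothetical class), e.g. a bad payload."""


@dataclass
class DeadLetter:
    """In-memory stand-in for a dead-letter or quarantine path."""
    entries: List[dict] = field(default_factory=list)

    def record(self, job_id: str, reason: str, attempts: int) -> None:
        self.entries.append({"job_id": job_id, "reason": reason, "attempts": attempts})


def execute_with_retries(
    job_id: str,
    handler: Callable[[], None],
    dead_letter: DeadLetter,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run handler with bounded attempts and exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            handler()
            return "success"
        except PermanentError as exc:
            # Retrying cannot help: record the failure and stop immediately.
            dead_letter.record(job_id, f"permanent: {exc}", attempt)
            return "dead_lettered"
        except RetryableError as exc:
            if attempt == max_attempts:
                dead_letter.record(job_id, f"retries exhausted: {exc}", attempt)
                return "dead_lettered"
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Exceptions outside the two classes propagate unchanged in this sketch, which forces an explicit classification decision rather than silently retrying unknown failures. Injecting `sleep` keeps tests fast.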

---

## Checks
- Idempotency is explicit and correct for all side effects
- Retry/backoff is bounded and safe
- Permanent failures do not loop forever
- Observability exists to answer:
  - did it run?
  - did it succeed?
  - why did it fail?
  - will it retry?
- Tests cover:
  - happy path
  - one retryable failure path
  - one non-retryable failure path (or max-retry behavior)
- Typecheck/lint/tests pass if configured

---

## Failure modes
- Idempotency unclear → block until defined (do not ship “best effort”).
- Job framework unknown → consult repo docs/profile or recommend `personalize-repo`.
- Retry policy unsafe → default to no retry and document why.
- Side effects scattered → extract to domain modules/adapters.

---

## Telemetry
Log:
- skill: `backend/job-worker-orchestration`
- trigger: `event | cron | poller`
- idempotency: `keyed | invariant | unknown`
- retries: `none | bounded`
- tests_added: `unit | integration | none`
- files_touched
- outcome: `success | partial | blocked`
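
A minimal sketch of emitting the telemetry fields listed above as one JSON log line. The function name `build_telemetry` is hypothetical, and two details are assumptions not stated in this skill: `files_touched` is treated as a list of paths, and each enumerated field takes a single value.

```python
import json
from typing import List

# Allowed values copied from the Telemetry list above.
ALLOWED = {
    "trigger": {"event", "cron", "poller"},
    "idempotency": {"keyed", "invariant", "unknown"},
    "retries": {"none", "bounded"},
    "tests_added": {"unit", "integration", "none"},
    "outcome": {"success", "partial", "blocked"},
}


def build_telemetry(
    trigger: str,
    idempotency: str,
    retries: str,
    tests_added: str,
    files_touched: List[str],
    outcome: str,
) -> str:
    """Validate enumerated fields and return a JSON log line."""
    record = {
        "skill": "backend/job-worker-orchestration",
        "trigger": trigger,
        "idempotency": idempotency,
        "retries": retries,
        "tests_added": tests_added,
        "files_touched": list(files_touched),  # assumed shape: list of paths
        "outcome": outcome,
    }
    for name, allowed in ALLOWED.items():
        if record[name] not in allowed:
            raise ValueError(f"{name}={record[name]!r} is not one of {sorted(allowed)}")
    return json.dumps(record, sort_keys=True)
```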

Overview

This skill adds background job orchestration with explicit idempotency, bounded retries, and observability, all aligned to repo conventions. It provides job handlers, enqueue or schedule wiring, a deduplication strategy, a retry/backoff policy, dead-letter handling where the repo supports it, and tests. The goal is safe, durable async work that survives restarts and that your team can operate.

How this skill works

You supply the job intent, trigger type (event, cron, poller), payload schema, idempotency strategy, and failure expectations. The skill generates a thin orchestrator handler that calls domain modules or adapters, implements the chosen idempotency mechanism (an idempotency key backed by dedupe storage, or a documented invariant-based strategy), and enforces a bounded retry and backoff policy. It also emits logs, metrics, and traces as the repo supports them, adds unit tests plus integration tests where infrastructure exists, and writes minimal operational notes per repo norms.
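
To make the "thin orchestrator" idea concrete, here is a minimal sketch, assuming a hypothetical `sync_invoice` job whose domain logic (`compute_update`) and adapter call (`push_update`) are injected. The handler only sequences the calls, logs start and end, and carries a correlation id if one is present.

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("jobs.sync_invoice")  # hypothetical logger name


def handle_sync_invoice(
    payload: dict,
    compute_update: Callable[[dict], dict],   # domain module (injected)
    push_update: Callable[[dict], None],      # integration adapter (injected)
    correlation_id: Optional[str] = None,
) -> dict:
    """Orchestrate one job run; business rules live in the injected callables."""
    fields = {
        "job": "sync_invoice",
        "invoice_id": payload.get("invoice_id"),
        "correlation_id": correlation_id,
    }
    started = time.monotonic()
    logger.info("job started", extra={"job_fields": fields})
    try:
        update = compute_update(payload)
        push_update(update)
    except Exception:
        # Log with traceback, then re-raise so the retry policy can decide.
        logger.exception("job failed", extra={"job_fields": fields})
        raise
    duration_ms = round((time.monotonic() - started) * 1000, 1)
    logger.info("job succeeded", extra={"job_fields": {**fields, "duration_ms": duration_ms}})
    return update
```

Injecting the domain function and the adapter keeps the handler trivially unit-testable: a test can pass plain lambdas and assert on what was pushed.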

When to use it

- Work is asynchronous, long-running, or must survive process restarts
- Tasks must not be duplicated and require explicit idempotency guarantees
- You need bounded retries with a clear dead-letter path for failures
- Scheduled or periodic jobs (cron/poller) that need observability
- Orchestration that coordinates domain modules and external adapters

Best practices

- Define a clear payload schema and an idempotency key strategy before implementation
- Keep the job handler thin; delegate domain logic to domain modules/adapters
- Classify errors as retryable or permanent and implement bounded attempts with backoff
- Emit start/end logs and key metrics; propagate correlation IDs if available
- Add unit tests for logic and an integration test for enqueue/execute when infra exists

Example use cases

- Retryable external API synchronization with dedupe by resource and timestamp
- Periodic polling of a third-party service with state recorded to prevent overlap
- Scheduled report generation that must run once per period even after restarts
- Distributed task fan-out where each child job must be exactly-once-ish
- Quarantine handling for payments or webhook deliveries after max retries

FAQ

What idempotency strategies are supported?

Two strategies are supported: an explicit idempotency key persisted in dedupe storage, or a documented exactly-once-ish flow that relies on invariants. Choose the one that fits your persistence and side-effect model.

How are retries and dead-lettering handled?

Implement bounded attempts with a backoff strategy and classify errors; after max retries, record failure to a dead-letter or quarantine path if the repo supports it and stop retrying.
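
As one way to picture a bounded backoff strategy, the sketch below computes the delays between attempts using capped exponential growth with optional random jitter. The function name `backoff_delays` and its default values are assumptions for illustration, not settings this skill prescribes.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_attempts: int,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the waits between attempts: max_attempts - 1 gaps in total."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = []
    for retry in range(max_attempts - 1):
        # Exponential growth, never exceeding the cap.
        ceiling = min(cap, base * 2 ** retry)
        if rng is not None:
            # Jitter: pick a random wait up to the ceiling to spread out retries.
            delays.append(rng.uniform(0, ceiling))
        else:
            delays.append(ceiling)
    return delays


# Without jitter, five attempts with a 5-second cap wait 1, 2, 4, then 5 seconds.
print(backoff_delays(5, base=1.0, cap=5.0))  # [1.0, 2.0, 4.0, 5.0]
```

Because both the attempt count and the cap are fixed, the worst-case total delay is known up front, which helps when checking the schedule against the acceptable delay for the job.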