
This skill guides data engineers in applying orchestration standards for scalable, secure, and maintainable pipeline implementations.

npx playbooks add skill williamzujkowski/standards --skill orchestration

Review the files below or copy the command above to add this skill to your agents.

Files (4)
SKILL.md
---
name: orchestration
description: Orchestration standards for data engineering environments.
---

# Orchestration

> **Quick Navigation:**
> Level 1: [Quick Start](#level-1-quick-start) (5 min) → Level 2: [Implementation](#level-2-implementation) (30 min) → Level 3: [Mastery](#level-3-mastery-resources) (Extended)

---

## Level 1: Quick Start

### Core Principles

1. **Best Practices**: Apply proven orchestration patterns for scheduling, dependencies, and data flow
2. **Security First**: Implement secure defaults and validate all inputs
3. **Maintainability**: Write clean, documented, testable code
4. **Performance**: Optimize for common use cases

### Essential Checklist

- [ ] Follow established patterns for data engineering
- [ ] Implement proper error handling
- [ ] Add comprehensive logging (see the sketch after this checklist)
- [ ] Write unit and integration tests
- [ ] Document public interfaces
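
As a minimal illustration of the error-handling and logging items, here is a sketch in Python (the `run_task` helper and the `"pipeline"` logger name are illustrative, not part of any particular orchestrator):

```python
import logging

logger = logging.getLogger("pipeline")

def run_task(name, fn, *args, **kwargs):
    """Run one pipeline task with logging and error propagation."""
    logger.info("task started: %s", name)
    try:
        result = fn(*args, **kwargs)
    except Exception:
        # Log the full traceback, then re-raise so the orchestrator
        # can apply its own retry or alerting policy.
        logger.exception("task failed: %s", name)
        raise
    logger.info("task finished: %s", name)
    return result
```

Used as `run_task("extract", extract_fn)`, a failure lands in the logs with a traceback while still propagating to the scheduler.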

### Quick Links to Level 2

- [Core Concepts](#core-concepts)
- [Implementation Patterns](#implementation-patterns)
- [Common Pitfalls](#common-pitfalls)

---

## Level 2: Implementation

### Core Concepts

This skill covers the essential orchestration practices for data engineering pipelines.

**Key areas include:**

- Architecture patterns
- Implementation best practices
- Testing strategies
- Performance optimization

### Implementation Patterns

Apply these patterns when working with data engineering:

1. **Pattern Selection**: Choose appropriate patterns for your use case
2. **Error Handling**: Implement comprehensive error recovery (a retry sketch follows this list)
3. **Monitoring**: Add observability hooks for production
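
For the error-handling pattern, a common approach is retries with exponential backoff and jitter. A minimal sketch, assuming no orchestrator-provided retry mechanism (the `with_retries` helper is hypothetical; schedulers such as Airflow expose their own retry settings):

```python
import logging
import random
import time

logger = logging.getLogger("pipeline")

def with_retries(fn, max_attempts=3, base_delay=1.0):
    """Call fn, retrying on failure with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                # Out of attempts: surface the error to the orchestrator.
                raise
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5)
            logger.warning("attempt %d failed; retrying in %.1fs", attempt, delay)
            time.sleep(delay)
```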

### Common Pitfalls

Avoid these common mistakes:

- Skipping validation of inputs (a validation sketch follows this list)
- Ignoring edge cases
- Missing test coverage
- Poor documentation
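
To illustrate the first pitfall, a minimal input-validation sketch (the record shape and field names are hypothetical):

```python
def validate_record(record: dict) -> dict:
    """Reject malformed records before they enter the pipeline."""
    required = {"id", "timestamp", "payload"}
    missing = required - record.keys()
    if missing:
        raise ValueError(f"record missing fields: {sorted(missing)}")
    if not isinstance(record["id"], str) or not record["id"]:
        raise ValueError("record id must be a non-empty string")
    return record
```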

---

## Level 3: Mastery Resources

### Reference Materials

- [Related Standards](../../docs/standards/)
- [Best Practices Guide](../../docs/guides/)

### Templates

See the `templates/` directory for starter configurations.

### External Resources

Consult official documentation and community best practices for data engineering.

Overview

This skill codifies orchestration standards for data engineering environments to help teams start projects correctly and run reliable pipelines. It provides practical patterns, security-focused defaults, and testable templates so engineers can deliver maintainable, performant data workflows. The guidance is concise and aimed at real production scenarios.

How this skill works

The skill inspects orchestration design choices and recommends patterns for scheduling, dependency management, error handling, monitoring, and performance tuning. It surfaces a short checklist and implementation patterns, points out common pitfalls, and links to templates and reference materials for quick adoption. Use the provided patterns to standardize code, logging, testing, and observability across pipelines.
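
For instance, dependency management usually reduces to running tasks in a topologically sorted order. A minimal sketch using Python's standard-library `graphlib` (available in 3.9+; the task names are hypothetical):

```python
from graphlib import TopologicalSorter

# A hypothetical pipeline: extract feeds two transforms, which feed a load step.
dag = {
    "extract": set(),
    "clean": {"extract"},
    "enrich": {"extract"},
    "load": {"clean", "enrich"},
}

# static_order() yields tasks in an order that respects every dependency.
order = list(TopologicalSorter(dag).static_order())
print(order)  # e.g. ['extract', 'clean', 'enrich', 'load']
```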

When to use it

  • Bootstrapping a new data pipeline project
  • Choosing orchestration patterns for batch or streaming jobs
  • Hardening production workflows with security and observability
  • Writing tests and documentation for data pipelines
  • Performing a post-mortem or reliability review of orchestration failures

Best practices

  • Select patterns that match the workload: batch, micro-batch, or streaming
  • Enforce input validation and fail-safe error handling paths
  • Instrument pipelines with logs, metrics, and traces for observability (see the sketch after this list)
  • Write unit and integration tests for orchestration logic and edge cases
  • Document public interfaces and provide runbook steps for common incidents
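
A minimal sketch of the instrumentation bullet, assuming an in-process counter stands in for a real metrics client such as StatsD or Prometheus (the `observed` helper is hypothetical):

```python
import logging
import time
from collections import Counter

logger = logging.getLogger("pipeline")
metrics = Counter()  # stand-in for a real metrics client

def observed(name, fn, *args, **kwargs):
    """Run a step while emitting a log line, counters, and a duration."""
    start = time.monotonic()
    try:
        result = fn(*args, **kwargs)
        metrics[f"{name}.success"] += 1
        return result
    except Exception:
        metrics[f"{name}.failure"] += 1
        logger.exception("step failed: %s", name)
        raise
    finally:
        logger.info("step %s took %.2fs", name, time.monotonic() - start)
```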

Example use cases

  • Start a new ETL project with a tested scheduler template and secure defaults
  • Refactor an unreliable DAG to add retries, backoff, and compensating actions
  • Add monitoring hooks and SLIs to an existing pipeline for faster incident detection
  • Create CI pipelines that run orchestration unit and integration tests before deployment

FAQ

What parts of a pipeline should I test first?

Begin with unit tests for transformation logic, then integration tests for orchestration flows, and finally end-to-end smoke tests for the full pipeline with representative data.
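
For example, a unit test for a small transformation might look like this (the `normalize_amount` function is hypothetical):

```python
import unittest

def normalize_amount(raw: str) -> float:
    """Example transformation: parse a currency string into a float."""
    return float(raw.replace("$", "").replace(",", ""))

class TestNormalizeAmount(unittest.TestCase):
    def test_plain_number(self):
        self.assertEqual(normalize_amount("42"), 42.0)

    def test_currency_formatting(self):
        self.assertEqual(normalize_amount("$1,234.50"), 1234.5)

    def test_malformed_input_raises(self):
        with self.assertRaises(ValueError):
            normalize_amount("not-a-number")

if __name__ == "__main__":
    unittest.main()
```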

How do I choose between batch and streaming patterns?

Base the choice on latency requirements, data volume, and processing semantics: use batch for throughput and simplicity, streaming for low-latency or real-time needs, and hybrid patterns when both are required.