home / skills / athola / claude-night-market / architecture-paradigm-pipeline

architecture-paradigm-pipeline skill

/plugins/archetypes/skills/architecture-paradigm-pipeline

This skill guides you in designing data pipelines using the Pipes and Filters paradigm, composing fixed sequences of transformations from independent, observable stages.

npx playbooks add skill athola/claude-night-market --skill architecture-paradigm-pipeline

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
3.3 KB
---
name: architecture-paradigm-pipeline
description: 'Consult this skill when designing data pipelines or transformation workflows.
  Use when data flows through a fixed sequence of transformations, stages can be independently
  developed and tested, or parallel processing of stages is beneficial. Do not use when
  selecting from multiple paradigms - use architecture-paradigms first. DO NOT use
  when: data flow is not sequential or predictable. DO NOT use when: complex branching/merging
  logic dominates.'
category: architectural-pattern
tags:
- architecture
- pipeline
- pipes-filters
- ETL
- streaming
- data-processing
dependencies: []
tools:
- stream-processor
- message-queue
- data-validator
usage_patterns:
- paradigm-implementation
- data-transformation
- workflow-automation
complexity: medium
estimated_tokens: 700
---

# The Pipeline (Pipes and Filters) Paradigm

## When to Employ This Paradigm
- When data must flow through a fixed sequence of discrete transformations, such as in ETL jobs, streaming analytics, or CI/CD pipelines.
- When individual processing stages need to be reused elsewhere, or when bottleneck stages must be scaled independently of the rest of the pipeline.
- When failure isolation between stages is a critical requirement.

## Adoption Steps
1. **Define Filters**: Design each stage (filter) to perform a single, well-defined transformation. Each filter must have a clear input and output data schema.
2. **Connect via Pipes**: Connect the filters using "pipes," which can be implemented as streams, message queues, or in-memory channels. Validate that these pipes support back-pressure and buffering.
3. **Maintain Stateless Filters**: Where possible, design filters to be stateless. Any required state should be persisted externally or managed at the boundaries of the pipeline.
4. **Instrument Each Stage**: Implement monitoring for each filter to track key metrics such as latency, throughput, and error rates.
5. **Orchestrate Deployments**: Design the deployment strategy to allow each stage to be scaled horizontally and upgraded independently.
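
The steps above can be sketched as a minimal in-process pipeline, with Python generators as filters and lazy iterators as pipes (the filter names and record schema are illustrative, not prescribed by this skill):

```python
from typing import Callable, Iterable, Iterator

Filter = Callable[[Iterable], Iterator]

def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Filter 1: parse raw 'name,value' lines into records."""
    for line in lines:
        name, value = line.strip().split(",")
        yield {"name": name, "value": int(value)}

def validate(records: Iterable[dict]) -> Iterator[dict]:
    """Filter 2: drop records that fail a simple schema check."""
    for rec in records:
        if rec["value"] >= 0:
            yield rec

def enrich(records: Iterable[dict]) -> Iterator[dict]:
    """Filter 3: a stateless transformation with a clear output schema."""
    for rec in records:
        yield {**rec, "doubled": rec["value"] * 2}

def pipeline(source: Iterable, *filters: Filter) -> Iterator:
    """Connect filters with in-memory 'pipes' (lazy iterators)."""
    stream = source
    for f in filters:
        stream = f(stream)
    return stream

results = list(pipeline(["a,1", "b,-2", "c,3"], parse, validate, enrich))
```

In a production pipeline the in-memory iterators would be replaced by durable pipes (message queues or streams), but each filter keeps the same single-transformation contract.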

## Key Deliverables
- An Architecture Decision Record (ADR) documenting the filters, the chosen pipe technology, the error-handling strategy, and the tools for replaying data.
- A suite of contract tests for each filter, plus integration tests that cover representative end-to-end pipeline executions.
- Observability dashboards that visualize stage-level Key Performance Indicators (KPIs).

## Risks & Mitigations
- **Single-Stage Bottlenecks**:
  - **Mitigation**: Implement auto-scaling for individual filters. If a single filter remains a bottleneck, consider refactoring it into a more granular sub-pipeline.
- **Schema Drift Between Stages**:
  - **Mitigation**: Centralize schema definitions in a shared repository and enforce compatibility tests as part of the CI/CD process to prevent breaking changes.
- **Back-Pressure Failures**:
  - **Mitigation**: Conduct rigorous load testing to simulate high-volume scenarios. Validate that buffering, retry logic, and back-pressure mechanisms behave as expected under stress.
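
The back-pressure behavior described above can be exercised with a bounded in-memory pipe: `queue.Queue(maxsize=...)` blocks the producer when the buffer fills, so a slow downstream filter naturally throttles upstream stages (a minimal sketch; the stage names are illustrative):

```python
import queue
import threading

def producer(out_pipe: queue.Queue, items) -> None:
    # put() blocks when the bounded queue is full: that blocking IS the back-pressure.
    for item in items:
        out_pipe.put(item)
    out_pipe.put(None)  # sentinel: end of stream

def downstream_filter(in_pipe: queue.Queue, results: list) -> None:
    while True:
        item = in_pipe.get()
        if item is None:
            break
        results.append(item * 10)

pipe = queue.Queue(maxsize=2)  # deliberately small buffer to force back-pressure
results = []
consumer = threading.Thread(target=downstream_filter, args=(pipe, results))
consumer.start()
producer(pipe, range(5))
consumer.join()
```

A load test would drive this with realistic volumes and assert that the producer blocks rather than dropping data or exhausting memory.
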

Overview

This skill helps design and validate data pipelines using the Pipes-and-Filters (pipeline) architectural paradigm. It focuses on building sequences of discrete, testable transformations connected by reliable pipes so stages can be developed, scaled, and observed independently. Use this skill to produce ADRs, contract tests, and deployment guidance for sequential transformation workflows.

How this skill works

The skill guides you to define each processing stage as a single filter with a clear input/output schema, then connect filters using pipes implemented as streams, message queues, or in-memory channels. It prescribes stateless filter design where possible, instrumentation per stage for latency/throughput/error metrics, and an orchestration approach that enables independent scaling and deployment. It also recommends CI checks for schema compatibility and tools for replaying data to aid recovery and testing.
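
The per-stage instrumentation the skill prescribes can be sketched as a decorator that records latency, throughput, and error counts for each filter (a minimal sketch; the `metrics` store and stage names are illustrative stand-ins for a real metrics backend):

```python
import time
from collections import defaultdict

# Per-stage counters: throughput (count), errors, and cumulative latency.
metrics = defaultdict(lambda: {"count": 0, "errors": 0, "total_seconds": 0.0})

def instrumented(stage_name: str):
    """Wrap a per-record filter so every call updates stage-level metrics."""
    def decorator(filter_fn):
        def wrapper(stream):
            m = metrics[stage_name]
            for item in stream:
                start = time.perf_counter()
                try:
                    result = filter_fn(item)
                except Exception:
                    m["errors"] += 1
                    raise
                m["count"] += 1
                m["total_seconds"] += time.perf_counter() - start
                yield result
        return wrapper
    return decorator

@instrumented("enrich")
def enrich(record: dict) -> dict:
    return {**record, "tagged": True}

out = list(enrich([{"id": 1}, {"id": 2}]))
```

The same wrapper shape works whether the pipe is an iterator, a queue consumer loop, or a stream-processing callback.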

When to use it

  • When data flows through a fixed, predictable sequence of transformations (ETL, streaming analytics).
  • When you need to independently develop, test, or scale pipeline stages.
  • When failure isolation between stages is important for reliability and recovery.
  • When observable stage-level KPIs are required to diagnose performance and errors.

Best practices

  • Define each filter to perform a single, well-scoped transformation and publish clear input/output schemas.
  • Use pipes that support buffering and back-pressure (message queues, streams, channels) and validate them under load.
  • Keep filters stateless where possible; persist state externally or at pipeline boundaries when needed.
  • Implement contract tests per filter and end-to-end integration tests that exercise representative data flows.
  • Instrument every stage for latency, throughput, and error rates and build dashboards for stage-level KPIs.
  • Design deployment and scaling so individual stages can be upgraded or autoscaled without disrupting the whole pipeline.
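
A contract test per filter, as recommended above, can be as small as a schema assertion over sample inputs (a sketch; `normalize` and `OUTPUT_SCHEMA` are hypothetical names for one filter and its published contract):

```python
def normalize(record: dict) -> dict:
    """Example filter under test: lowercases the name field."""
    return {"name": record["name"].lower(), "value": record["value"]}

# The filter's published output contract: field names and types.
OUTPUT_SCHEMA = {"name": str, "value": int}

def check_contract(filter_fn, sample_inputs, schema) -> bool:
    """Assert every output record matches the declared schema exactly."""
    for record in sample_inputs:
        out = filter_fn(record)
        assert set(out) == set(schema), f"field mismatch: {set(out) ^ set(schema)}"
        for field, expected_type in schema.items():
            assert isinstance(out[field], expected_type), field
    return True

ok = check_contract(normalize, [{"name": "Alice", "value": 3}], OUTPUT_SCHEMA)
```

Running such a check for every filter in CI catches contract breaks before an incompatible stage reaches the pipeline.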

Example use cases

  • Batch ETL: extract -> transform -> validate -> load where each step is an independently testable filter.
  • Streaming analytics: sensor data flows through parsing, enrichment, aggregation, and alerting stages connected by a durable stream.
  • CI/CD artifact pipeline: lint -> test -> package -> sign -> publish, with stage-level retries and observability.
  • Media processing: ingest -> transcode -> thumbnail -> index, enabling per-stage scaling to handle encoding spikes.

FAQ

When should I avoid the pipeline paradigm?

Avoid it when data flow is unpredictable, non-sequential, or dominated by complex branching/merging logic. In those cases a different paradigm (event-driven, workflow/orchestration) is a better fit.

How do I handle schema changes between stages?

Centralize schema definitions in a shared repo, enforce compatibility via CI contract tests, and support versioning and backward-compatible transforms to prevent breaking consumers.
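
One way to enforce that compatibility rule in CI is a field-level check between schema versions (a sketch under the assumption that schemas are expressed as field-to-type maps; real deployments would more likely use a schema registry's compatibility modes):

```python
def is_backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    """A new schema is backward compatible if every field the old schema
    declared still exists with the same type; adding fields is allowed,
    removing or retyping fields is a breaking change."""
    return all(
        field in new_schema and new_schema[field] == ftype
        for field, ftype in old_schema.items()
    )

v1 = {"id": "int", "name": "string"}
v2 = {"id": "int", "name": "string", "email": "string"}  # adds a field: compatible
v3 = {"id": "string", "name": "string"}                  # retypes id: breaking
```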

How do I mitigate single-stage bottlenecks?

Enable autoscaling for the affected filter, split the filter into smaller sub-filters if possible, and use back-pressure and buffering strategies to smooth bursts.
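
In-process, the same idea looks like fanning one bottleneck filter out across a worker pool while the rest of the pipeline stays sequential (a sketch; `slow_enrich` is a hypothetical stand-in for an expensive per-record operation, and `pool.map` preserves input order):

```python
from concurrent.futures import ThreadPoolExecutor

def slow_enrich(record: dict) -> dict:
    """The bottleneck filter; imagine an expensive lookup per record."""
    return {**record, "enriched": True}

def scaled_stage(records, workers: int = 4):
    """Run one filter across a worker pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        yield from pool.map(slow_enrich, records)

out = list(scaled_stage([{"id": i} for i in range(3)]))
```

At deployment scale the analogous move is raising the replica count for just that stage's consumers on its queue.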