
This skill helps you configure Haystack pipelines for document processing and question answering, covering document stores, retrievers, readers, and preprocessing, with best practices built in.

npx playbooks add skill a5c-ai/babysitter --skill haystack-pipeline

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
1.2 KB
---
name: haystack-pipeline
description: Haystack NLP pipeline configuration for document processing and QA
allowed-tools:
  - Read
  - Write
  - Edit
  - Bash
  - Glob
  - Grep
---

# Haystack Pipeline Skill

## Capabilities

- Configure Haystack pipeline components
- Set up document stores and retrievers
- Implement reader/generator models
- Design custom pipeline graphs
- Configure preprocessing pipelines
- Implement evaluation pipelines

## Target Processes

- rag-pipeline-implementation
- intent-classification-system

## Implementation Details

### Core Components

1. **DocumentStores**: Elasticsearch, Weaviate, FAISS, etc.
2. **Retrievers**: BM25, Dense, Hybrid
3. **Readers/Generators**: Extractive and generative QA
4. **Preprocessors**: Document cleaning and splitting

### Pipeline Types

- Retrieval pipelines
- RAG pipelines
- Evaluation pipelines
- Indexing pipelines

### Configuration Options

- Component selection
- Pipeline graph design
- Document store backend
- Model selection
- Preprocessing settings

### Best Practices

- Modular pipeline design
- Proper preprocessing
- Evaluation integration
- Component versioning

### Dependencies

- haystack-ai
- farm-haystack (legacy)

Overview

This skill configures Haystack NLP pipelines for document processing and question-answering workflows. It helps assemble document stores, retrievers, and reader/generator models into modular, reproducible pipelines tailored for RAG, retrieval, and evaluation tasks. The skill emphasizes practical configuration, preprocessing, and evaluation integration for production-grade NLP stacks.

How this skill works

You define and connect core Haystack components: a document store backend, retriever(s), reader or generator, and optional preprocessors. The skill builds pipeline graphs for retrieval, RAG, indexing, or evaluation, and exposes configuration knobs for component selection, model choice, and preprocessing rules. It supports modular, versioned pipelines that can be evaluated and iterated deterministically.

When to use it

  • Building a retrieval-augmented generation (RAG) service for QA over internal documents
  • Indexing and searching large document collections with a chosen backend (FAISS, Weaviate, Elasticsearch)
  • Prototyping or deploying hybrid retrieval (BM25 + dense) systems
  • Creating evaluation pipelines to compare reader/generator model performance
  • Integrating robust preprocessing (cleaning, splitting) before indexing or retrieval

Best practices

  • Design modular pipeline graphs so components can be swapped without changing the whole workflow
  • Normalize and split documents during preprocessing to improve retrieval relevance
  • Use hybrid retrievers (BM25 + dense) when semantic and lexical signals matter
  • Version component configurations (models, retrievers, stores) to reproduce experiments
  • Integrate evaluation early using held-out queries and metrics to prevent regressions
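One common way to combine lexical and dense rankings is reciprocal rank fusion, shown here as a plain-Python sketch rather than any particular Haystack joiner API; the constant `k=60` is a conventional default:

```python
# Reciprocal rank fusion (RRF): merge ranked lists from BM25 and a dense
# retriever without comparing their incompatible raw scores.
def rrf_merge(rankings, k=60):
    """rankings: list of ranked doc-id lists, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]
dense_hits = ["doc1", "doc5", "doc3"]
merged = rrf_merge([bm25_hits, dense_hits])
print(merged[0])  # a doc ranked high in both lists wins
```

Because RRF uses only ranks, it sidesteps the score-normalization problem that arises when fusing BM25 scores with cosine similarities directly.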

Example use cases

  • Implement a RAG pipeline that retrieves passages from Weaviate and generates answers with a generative model
  • Set up an indexing pipeline that preprocesses PDFs, extracts text, and stores embeddings in FAISS
  • Create an intent-classification augmentation where retrieved context improves classification accuracy
  • Run evaluation pipelines to compare extractive readers versus generative answerers on a QA benchmark
  • Deploy a production retrieval service with Elasticsearch backend and BM25 fallback
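Evaluation comparisons like the last two items can start from simple answer metrics. Below is a self-contained sketch of exact match and token-level F1 using standard SQuAD-style normalization, independent of any Haystack evaluator API:

```python
# SQuAD-style answer metrics for comparing QA systems: exact match after
# light normalization, and token-overlap F1.
import re
import string
from collections import Counter

def normalize(text):
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)  # drop English articles
    return " ".join(text.split())

def exact_match(pred, gold):
    return float(normalize(pred) == normalize(gold))

def f1(pred, gold):
    p, g = normalize(pred).split(), normalize(gold).split()
    common = Counter(p) & Counter(g)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1.0
print(round(f1("Paris, France", "Paris"), 2))           # 0.67
```

Averaging these over a held-out query set gives a stable baseline for comparing extractive readers against generative answerers.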

FAQ

Which document stores are supported?

Common backends include Elasticsearch, Weaviate, FAISS, and other Haystack-compatible stores; choose based on scale and retrieval latency needs.

Should I use an extractive reader or a generative model?

Use extractive readers for precise span-based answers and generative models when fluency and synthesis across passages are required; evaluation should guide the final choice.