
This skill helps you compare and choose retrieval-augmented generation (RAG) frameworks such as LangChain, LlamaIndex, and Sentence Transformers for document QA.

```sh
npx playbooks add skill eyadsibai/ltk --skill rag-frameworks
```

---
name: rag-frameworks
description: Use when "RAG", "retrieval augmented generation", "LangChain", "LlamaIndex", "sentence transformers", "embeddings", "document QA", "chatbot with documents", "semantic search"
version: 1.0.0
---

# RAG Frameworks

Frameworks for building retrieval-augmented generation applications.

## Comparison

| Framework | Best For | Learning Curve | Flexibility |
|-----------|----------|----------------|-------------|
| **LangChain** | Agents, chains, tools | Steeper | Highest |
| **LlamaIndex** | Data indexing, simple RAG | Gentle | Medium |
| **Sentence Transformers** | Custom embeddings | Low | High |

---

## LangChain

Orchestration framework for building complex LLM applications.

**Core concepts:**

- **Chains**: Sequential operations (retrieve → prompt → generate)
- **Agents**: LLM decides which tools to use
- **LCEL**: Declarative pipeline syntax with `|` operator
- **Retrievers**: Abstract interface to vector stores

**Strengths**: Rich ecosystem, many integrations, agent capabilities
**Limitations**: Abstractions can be confusing, rapid API changes

**Key concept**: LCEL (LangChain Expression Language) for composable pipelines.
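The LCEL idea can be illustrated without installing LangChain. The sketch below is a minimal, framework-free stand-in: it shows how stages compose with the `|` operator, but real LangChain runnables additionally support streaming, batching, and async; the `Runnable` class and toy stages here are illustrative, not LangChain's actual API.

```python
# Minimal sketch of LCEL-style composition: stages chained with `|`.
# This mimics the concept only; it is not LangChain code.

class Runnable:
    """Wraps a function so stages compose with the `|` operator."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # `self | other` returns a stage that runs self, then other
        return Runnable(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)


# Toy stages standing in for retrieve -> prompt -> generate
retrieve = Runnable(lambda q: {"question": q,
                               "context": "Paris is the capital of France."})
build_prompt = Runnable(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
generate = Runnable(lambda prompt: f"LLM answer based on: {prompt}")

chain = retrieve | build_prompt | generate
print(chain.invoke("What is the capital of France?"))
```

In real LCEL the stages would be a retriever, a prompt template, a chat model, and an output parser, composed with the same `|` syntax.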

---

## LlamaIndex

Data framework focused on connecting LLMs to external data.

**Core concepts:**

- **Documents → Nodes**: Automatic chunking and indexing
- **Index types**: Vector, keyword, tree, knowledge graph
- **Query engines**: Retrieve and synthesize answers
- **Chat engines**: Stateful conversation over data

**Strengths**: Simple API, great for document QA, data connectors
**Limitations**: Less flexible for complex agent workflows

**Key concept**: "Load data, index it, query it" - simpler mental model than LangChain.
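The load-index-query mental model can be sketched in plain Python. This toy uses naive sentence chunking and a keyword inverted index as stand-ins; the actual LlamaIndex API builds vector, keyword, tree, or knowledge-graph indexes over Documents split into Nodes, and none of the function names below are from the library.

```python
# Framework-free sketch of "load data, index it, query it".
# Toy keyword index; not the LlamaIndex API.

def load(docs):
    # "Documents -> Nodes": naive chunking into sentences
    nodes = []
    for doc in docs:
        nodes.extend(s.strip() for s in doc.split(".") if s.strip())
    return nodes

def index(nodes):
    # Toy inverted index: word -> set of node ids
    inv = {}
    for i, node in enumerate(nodes):
        for word in node.lower().split():
            inv.setdefault(word, set()).add(i)
    return inv

def query(q, inv, nodes):
    # Score nodes by query-word overlap; return the best match
    scores = {}
    for word in q.lower().split():
        for i in inv.get(word, ()):
            scores[i] = scores.get(i, 0) + 1
    if not scores:
        return None
    return nodes[max(scores, key=scores.get)]

docs = ["The Eiffel Tower is in Paris. It was completed in 1889."]
nodes = load(docs)
inv = index(nodes)
print(query("when was it completed", inv, nodes))
```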

---

## Sentence Transformers

Generate high-quality embeddings for semantic similarity.

**Popular models:**

| Model | Dimensions | Quality | Speed |
|-------|------------|---------|-------|
| all-MiniLM-L6-v2 | 384 | Good | Fast |
| all-mpnet-base-v2 | 768 | Better | Medium |
| e5-large-v2 | 1024 | Best | Slow |

**Key concept**: Bi-encoder architecture - encode query and documents separately, compare with cosine similarity.
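The bi-encoder pattern can be shown with a toy encoder. Below, a bag-of-words counter stands in for the model's `encode` call so the example runs without downloading a model; the point is the shape of the pattern: documents are embedded once offline, the query is embedded independently at request time, and the two are compared with cosine similarity.

```python
import math
from collections import Counter

# Toy "encoder": bag-of-words counts standing in for a real
# Sentence Transformers model (which returns dense float vectors).
def encode(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = ["the cat sat on the mat", "stock prices fell sharply today"]
doc_vecs = [encode(d) for d in docs]   # documents encoded once, offline

query_vec = encode("a cat on a mat")   # query encoded at request time
scores = [cosine(query_vec, dv) for dv in doc_vecs]
best = max(range(len(docs)), key=lambda i: scores[i])
print(docs[best])
```

With the real library, `encode` would be replaced by `SentenceTransformer("all-MiniLM-L6-v2").encode(...)` and the comparison logic stays the same.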

---

## RAG Architecture Patterns

| Pattern | Description | When to Use |
|---------|-------------|-------------|
| **Naive RAG** | Retrieve top-k, stuff in prompt | Simple QA |
| **Parent-Child** | Retrieve chunks, return parent docs | Context preservation |
| **Hybrid Search** | Vector + keyword search | Better recall |
| **Re-ranking** | Retrieve many, re-rank with cross-encoder | Higher precision |
| **Query Expansion** | Generate variations of query | Ambiguous queries |
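The re-ranking pattern in the table above can be sketched as a two-stage pipeline: a cheap recall-oriented first stage pulls a wide candidate set, then a more expensive scorer re-orders it. Both scorers below are toy stand-ins; in practice the second stage would be a cross-encoder that scores each (query, document) pair jointly.

```python
# Sketch of retrieve-then-re-rank. Both scoring functions are toys;
# a real second stage would use a cross-encoder model.

def first_stage(query, docs, k=3):
    # Cheap recall-oriented score: count shared words
    def overlap(d):
        return len(set(query.lower().split()) & set(d.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

def rerank(query, candidates):
    # Stand-in for a cross-encoder: reward exact phrase containment
    # as a toy "precision" signal over the candidate set
    def score(d):
        if query.lower() in d.lower():
            return 2.0
        return len(set(query.lower().split()) & set(d.lower().split())) / 10
    return sorted(candidates, key=score, reverse=True)

docs = [
    "python is a programming language",
    "the python snake is nonvenomous in most species",
    "learn python for data science",
    "cooking recipes for beginners",
]
candidates = first_stage("python snake", docs)
print(rerank("python snake", candidates)[0])
```

The design point: retrieve many (favoring recall), then spend compute re-ranking few (favoring precision).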

---

## Decision Guide

| Scenario | Recommendation |
|----------|----------------|
| Simple document QA | LlamaIndex |
| Complex agents/tools | LangChain |
| Custom embedding pipeline | Sentence Transformers |
| Production RAG | LangChain or custom |
| Quick prototype | LlamaIndex |
| Maximum control | Build custom with Sentence Transformers |

## Resources

- LangChain: <https://python.langchain.com>
- LlamaIndex: <https://docs.llamaindex.ai>
- Sentence Transformers: <https://sbert.net>

## Overview

This skill compares and guides selection among popular RAG frameworks for building retrieval-augmented generation systems. It highlights LangChain, LlamaIndex, and Sentence Transformers, and maps common RAG architecture patterns to practical scenarios. Use it to pick the right framework and pattern for document QA, chatbots, and production RAG pipelines.

## How this skill works

The skill summarizes core concepts and strengths of each framework: LangChain for orchestration and agents, LlamaIndex for data indexing and document QA, and Sentence Transformers for producing embeddings. It also describes RAG patterns (naive, parent-child, hybrid, re-ranking, query expansion) and gives concrete recommendations based on project needs and constraints.

## When to use it

- Choosing a framework for document question answering or chat over documents
- Deciding between agent-driven workflows and simple retrieval + generation pipelines
- Designing an embedding strategy or selecting pre-trained embedding models
- Picking RAG architecture patterns for recall vs. precision trade-offs
- Rapid prototyping versus production-ready orchestration and control

## Best practices

- Start with a simple retrieval pattern (naive RAG) and iterate toward hybrid search, re-ranking, or query expansion as precision or recall needs grow
- Measure both retrieval recall and end-to-end answer quality; use re-ranking with a cross-encoder when precision matters
- Prefer smaller, faster embedding models for latency-sensitive prototypes and larger, higher-quality models when production relevance matters
- Keep indexing and retrieval separate from generation so you can swap components (vector store, embeddings, LLM) independently
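The component-separation practice can be sketched as a pipeline that takes its embedder, retriever, and generator as injected callables. All names and interfaces below are illustrative, not from any framework; the toy stand-ins show that any one piece can be swapped without touching the others.

```python
from typing import Callable, List

# Illustrative pipeline with injected components; swap any callable
# (embedder, vector-store search, LLM) independently of the rest.
class RAGPipeline:
    def __init__(
        self,
        embed: Callable[[str], List[float]],
        search: Callable[[List[float], int], List[str]],
        generate: Callable[[str, List[str]], str],
    ):
        self.embed, self.search, self.generate = embed, search, generate

    def answer(self, question: str, k: int = 2) -> str:
        context = self.search(self.embed(question), k)
        return self.generate(question, context)

# Toy stand-ins for the three components
def fake_embed(text):
    return [float(len(text))]

def fake_search(vec, k):
    return ["doc about RAG", "doc about embeddings"][:k]

def fake_generate(q, ctx):
    return f"Answering {q!r} using {len(ctx)} docs"

pipeline = RAGPipeline(fake_embed, fake_search, fake_generate)
print(pipeline.answer("What is RAG?"))
```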

## Example use cases

- Build a document chatbot that maintains state over a knowledge base using LlamaIndex chat engines
- Implement an agent that chains retrieval, tools, and generation for task automation with LangChain
- Create a custom embedding pipeline with Sentence Transformers for domain-specific semantic search
- Improve an FAQ system by adding hybrid vector + keyword search and a re-ranking step for higher precision
- Prototype a quick RAG demo with LlamaIndex, then migrate to LangChain or a custom stack for production

## FAQ

**Which framework is best for quick prototypes?**

LlamaIndex is typically fastest to get a basic document QA or chatbot working due to its simple load-index-query workflow.

**When should I add a re-ranker?**

Add re-ranking when retrieval returns many candidates and you need higher precision in the final answers; use a cross-encoder for best results.

**Which embedding model should I choose?**

Start with all-MiniLM-L6-v2 for speed and low cost; move to all-mpnet-base-v2 or e5-large-v2 when you need higher quality.