
rag-implementation skill

/skills/rag-implementation

This skill helps you implement retrieval-augmented generation patterns with robust chunking, hybrid search, and reranking to deliver relevant information to LLMs.

npx playbooks add skill omer-metin/skills-for-antigravity --skill rag-implementation

Review the files below or copy the command above to add this skill to your agents.

Files (4)
SKILL.md
---
name: rag-implementation
description: Retrieval-Augmented Generation patterns including chunking, embeddings, vector stores, and retrieval optimization. Use when "rag, retrieval augmented, vector search, embeddings, semantic search, document qa, retrieval, vector, search, llm" is mentioned.
---

# RAG Implementation

## Identity

You're a RAG specialist who has built systems serving millions of queries over
terabytes of documents. You've seen the naive "chunk and embed" approach fail
and have developed sophisticated chunking, retrieval, and reranking strategies.

You understand that RAG is not just vector search—it's about getting the right
information to the LLM at the right time. You know when RAG helps and when
it's unnecessary overhead.

Your core principles:
1. Chunking is critical—bad chunks mean bad retrieval
2. Hybrid search wins—combine dense and sparse retrieval
3. Rerank for quality—top-k isn't top-relevance
4. Evaluate continuously—retrieval quality degrades silently
5. Consider the alternative—sometimes caching beats RAG


## Reference System Usage

You must ground your responses in the provided reference files, treating them as the source of truth for this domain:

* **For Creation:** Always consult **`references/patterns.md`**. This file dictates *how* things should be built. Ignore generic approaches if a specific pattern exists here.
* **For Diagnosis:** Always consult **`references/sharp_edges.md`**. This file lists the critical failures and "why" they happen. Use it to explain risks to the user.
* **For Review:** Always consult **`references/validations.md`**. This contains the strict rules and constraints. Use it to validate user inputs objectively.

**Note:** If a user's request conflicts with the guidance in these files, politely correct them using the information provided in the references.

Overview

This skill captures production-grade Retrieval-Augmented Generation (RAG) patterns for building reliable document search and QA systems. It focuses on chunking, embedding strategy, vector stores, hybrid retrieval, reranking, and continuous evaluation to keep retrieval accurate at scale. Use it to design, diagnose, or validate RAG pipelines with practical trade-offs and failure modes in mind.

How this skill works

The implementation prescribes principled chunking (content-aware, overlapping windows), embedding choices tied to chunk semantics, and storage in vector databases with metadata for fast filtering. Retrieval combines dense vector search with sparse signals for recall, followed by reranking to prioritize precision. Continuous monitoring and validation are built in to detect drift, stale embeddings, and retrieval decay.
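
To make the chunking step concrete, here is a minimal sketch of content-aware chunking with overlapping windows. It splits on paragraph boundaries and packs them into size-bounded chunks; sizes are counted in characters for brevity (a production system would count tokens), and `chunk_document` is an illustrative helper, not a prescribed API.

```python
def chunk_document(text: str, max_chars: int = 1200, overlap_chars: int = 150) -> list[str]:
    """Pack paragraphs into size-bounded chunks, carrying a small overlap forward."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            # Carry the tail of the finished chunk forward so context
            # spans chunk boundaries instead of being cut mid-thought.
            current = current[-overlap_chars:]
        current = (current + "\n\n" + para).strip()
    if current:
        chunks.append(current)
    # Note: a single paragraph longer than max_chars still yields one
    # oversized chunk; such paragraphs need a second, finer split.
    return chunks
```

Splitting on paragraph boundaries rather than fixed character offsets is what keeps each chunk semantically coherent, which is the property the retrieval stage depends on.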

When to use it

  • Building document QA or semantic search over large, heterogeneous corpora
  • When naive chunk-and-embed leads to low precision or hallucinations
  • Scaling to millions of queries or terabytes of content
  • When you need reproducible retrieval quality and explainability
  • To benchmark trade-offs between caching and live retrieval

Best practices

  • Chunk by semantic boundaries, include small overlaps, and keep chunks coherent to preserve context
  • Use hybrid retrieval: combine sparse (BM25) for recall and dense vectors for semantic match
  • Rerank top candidates with a cross-encoder or lightweight scoring model before feeding the LLM (a combined sketch follows this list)
  • Version embeddings and run continuous validation to detect silent quality degradation
  • Prefer metadata filters and provenance tracking to reduce hallucination risk
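
The hybrid-plus-rerank flow from the bullets above can be sketched as follows. Reciprocal rank fusion (RRF) is used here as one common way to merge sparse and dense rankings; `sparse_search`, `dense_search`, and `rerank_score` are hypothetical hooks for your own BM25 index, vector store, and cross-encoder, not a specific library API.

```python
from typing import Callable

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: merge ranked doc-id lists from multiple retrievers."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # k=60 is a common default that damps the dominance of top ranks.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

def hybrid_retrieve(
    query: str,
    sparse_search: Callable[[str, int], list[str]],  # e.g. BM25 index, for recall
    dense_search: Callable[[str, int], list[str]],   # e.g. vector store, for semantics
    rerank_score: Callable[[str, str], float],       # scores (query, doc_id), e.g. via a cross-encoder
    top_k: int = 5,
) -> list[str]:
    # Over-fetch from both retrievers, fuse for recall, then rerank a
    # short candidate list for precision before prompting the LLM.
    fused = rrf_fuse([sparse_search(query, 50), dense_search(query, 50)])
    candidates = fused[:20]
    candidates.sort(key=lambda doc_id: rerank_score(query, doc_id), reverse=True)
    return candidates[:top_k]
```

Over-fetching from each retriever and then reranking only a short fused list keeps the expensive cross-encoder call off the hot path for most documents.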

Example use cases

  • Customer support: semantic search over knowledge bases with reranked answers to reduce incorrect responses
  • Legal and compliance: precise retrieval across contracts using hybrid search and strict filtering
  • Product documentation: scalable document QA that maintains context with smart chunking
  • Search augmentation: blend vector results with keyword matches to improve recall
  • Analytics: evaluate retrieval impact on end-to-end LLM answer quality with A/B tests

FAQ

Is RAG always the right choice?

No. For low-latency, highly repetitive queries, caching or precomputed responses can outperform RAG. Use RAG when freshness, scale, or semantic flexibility are required.
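
A minimal sketch of that caching alternative, assuming a hypothetical `answer_with_rag(query)` pipeline function: repeated queries are served from a TTL cache, and only misses pay the retrieval cost.

```python
import time
from typing import Callable

class QueryCache:
    """Serve repeated queries from a TTL cache; fall back to RAG on a miss."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    def get_or_compute(self, query: str, answer_with_rag: Callable[[str], str]) -> str:
        # Naive normalization; a real system might cluster paraphrases instead.
        key = query.strip().lower()
        hit = self._store.get(key)
        if hit is not None and time.monotonic() - hit[0] < self.ttl:
            return hit[1]  # cache hit: no retrieval, no LLM call
        answer = answer_with_rag(query)  # hypothetical full RAG pipeline
        self._store[key] = (time.monotonic(), answer)
        return answer
```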

How large should chunks be?

Chunk size depends on document structure and model context window. Aim for semantically coherent units that fit comfortably in the model input with some overlap; validate empirically.
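
One way to "validate empirically" is a small recall@k harness run over labeled (query, relevant chunk) pairs for each candidate chunk size; `build_retriever` in the commented sweep below is a hypothetical factory for your own pipeline.

```python
from typing import Callable

def recall_at_k(
    retrieve: Callable[[str], list[str]],
    labeled_queries: list[tuple[str, str]],  # (query, id of the relevant chunk/source)
    k: int = 5,
) -> float:
    """Fraction of queries whose relevant item appears in the top-k results."""
    hits = sum(relevant_id in retrieve(query)[:k] for query, relevant_id in labeled_queries)
    return hits / len(labeled_queries)

# Sweep candidate chunk sizes and keep the best-scoring configuration:
# for size in (400, 800, 1200, 1600):
#     retrieve = build_retriever(chunk_chars=size)  # hypothetical factory
#     print(size, recall_at_k(retrieve, labeled_queries))
```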