
This skill enables cross-encoder reranking and MMR diversity filtering to improve retrieval quality and relevance.

npx playbooks add skill a5c-ai/babysitter --skill rag-reranking


Files (2)
SKILL.md
---
name: rag-reranking
description: Cross-encoder reranking and MMR diversity filtering for improved retrieval quality
allowed-tools:
  - Read
  - Write
  - Edit
  - Bash
  - Glob
  - Grep
---

# RAG Reranking Skill

## Capabilities

- Implement cross-encoder reranking models
- Configure Maximal Marginal Relevance (MMR) filtering
- Set up Cohere Rerank integration
- Design multi-stage retrieval pipelines
- Implement diversity-aware reranking
- Configure score normalization and thresholds

## Target Processes

- advanced-rag-patterns
- rag-pipeline-implementation

## Implementation Details

### Reranking Methods

1. **Cross-Encoder Reranking**: Sentence-transformer cross-encoders
2. **Cohere Rerank**: Cohere rerank-v3 API
3. **MMR Reranking**: Diversity-aware result filtering
4. **LLM Reranking**: Using an LLM for relevance scoring
5. **Reciprocal Rank Fusion**: Combining multiple retrievers
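
Reciprocal Rank Fusion from the list above can be sketched in a few lines of plain Python. This is a generic illustration, not this skill's internal code; the function name and document IDs are made up for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists that
    contain it; k=60 is the commonly used default constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Fuse a lexical (BM25-style) ranking with a dense-retriever ranking.
bm25 = ["d1", "d2", "d3"]
dense = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25, dense])
```

Documents ranked highly by both retrievers (here `d1`) rise to the top even if neither list put them first by a wide margin.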

### Configuration Options

- Reranking model selection
- Top-k after reranking
- MMR lambda (relevance vs diversity)
- Score threshold filtering
- Batch size for reranking
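
The options above might be expressed as a single configuration object. The key names and values below are illustrative assumptions for this sketch, not a fixed schema defined by the skill.

```python
# Illustrative reranking configuration (key names are assumptions).
rerank_config = {
    "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",  # reranking model selection
    "top_k": 5,               # results kept after reranking
    "mmr_lambda": 0.7,        # 1.0 = pure relevance, 0.0 = pure diversity
    "score_threshold": 0.2,   # drop candidates below this normalized score
    "batch_size": 32,         # query-passage pairs scored per batch
}
```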

### Best Practices

- Use cross-encoders for quality
- Balance relevance and diversity
- Set appropriate thresholds
- Monitor reranking latency

### Dependencies

- sentence-transformers
- cohere (optional)

Overview

This skill implements cross-encoder reranking and MMR diversity filtering to improve retrieval precision and answer variety in RAG pipelines. It provides configurable reranker selection, score normalization, and thresholds so you can tune relevance versus diversity. The design supports multi-stage pipelines and optional Cohere rerank integration for API-based reranking.

How this skill works

The skill first retrieves candidate passages with one or more retrievers, then applies a cross-encoder or API-based reranker to score each candidate for query relevance. It can normalize scores, apply a top-k cutoff and score thresholds, and then run Maximal Marginal Relevance (MMR) to introduce diversity. Reciprocal Rank Fusion and LLM-based reranking are available to combine signals from multiple retrievers when needed.
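
The MMR stage of that pipeline can be sketched as a greedy loop over precomputed similarities. This is a minimal illustration assuming similarities are already computed elsewhere; the function and variable names are not part of the skill's API.

```python
def mmr(query_sim, doc_sims, lam=0.7, top_k=3):
    """Greedy Maximal Marginal Relevance selection.

    query_sim[i]   : similarity of candidate i to the query
    doc_sims[i][j] : similarity between candidates i and j
    lam            : trade-off, 1.0 = pure relevance, 0.0 = pure diversity
    Returns the indices of the selected candidates, in pick order.
    """
    selected, remaining = [], list(range(len(query_sim)))
    while remaining and len(selected) < top_k:
        def mmr_score(i):
            # Redundancy = highest similarity to anything already picked.
            redundancy = max((doc_sims[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Candidates 0 and 1 are near-duplicates; 2 is different but less relevant.
sims = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
order = mmr([0.9, 0.85, 0.5], sims, lam=0.5, top_k=3)
```

With `lam=0.5`, the dissimilar candidate 2 is promoted ahead of the near-duplicate candidate 1, even though 1 is more relevant in isolation.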

When to use it

  • When initial retrieval returns many near-duplicate or overly similar passages
  • When you need higher per-passage relevance for answer grounding in RAG
  • To explicitly balance relevance and diversity in multi-document answers
  • When combining results from multiple retrievers or modalities
  • When you want to add an API-based rerank step (e.g., Cohere) without replacing local models

Best practices

  • Use a cross-encoder for the final relevance scoring when latency allows; it offers the highest quality
  • Tune MMR lambda to balance relevance (lambda→1) and diversity (lambda→0) for your use case
  • Apply score normalization and a minimum threshold to filter low-confidence candidates
  • Monitor end-to-end latency and set batch sizes to avoid reranking bottlenecks
  • Combine rerankers with Reciprocal Rank Fusion if you need robustness across retrievers
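
The normalization-and-threshold practice above can be sketched as a small post-processing step. This is an assumed min-max scheme for illustration; the skill's actual normalization method may differ.

```python
def normalize_and_filter(scored, threshold=0.2):
    """Min-max normalize (doc, score) pairs to [0, 1], then drop
    candidates whose normalized score falls below the threshold."""
    scores = [s for _, s in scored]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # guard against all-equal scores
    normalized = [(doc, (s - lo) / span) for doc, s in scored]
    return [(doc, s) for doc, s in normalized if s >= threshold]

# Raw cross-encoder logits are unbounded; normalize before thresholding.
kept = normalize_and_filter([("a", -3.1), ("b", 4.2), ("c", 0.5)],
                            threshold=0.2)
```

Note that the lowest-scoring candidate always maps to 0.0 under min-max scaling, so a positive threshold drops it regardless of its raw score; that is a deliberate property of this simple scheme.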

Example use cases

  • RAG-based customer support: rerank retrieved knowledge snippets to reduce hallucination risk
  • Long-document QA: select diverse, highly relevant passages before answer generation
  • Multi-retriever systems: fuse lexical and dense results and rerank for best evidence
  • Search relevance tuning: experiment with MMR to surface varied yet relevant results
  • API-enriched pipelines: call Cohere rerank for large-scale reranking while keeping local fallbacks

FAQ

Do I need Cohere to use this skill?

No. Cohere integration is optional. Local sentence-transformer cross-encoders can be used for high-quality reranking.

How do I balance relevance and diversity?

Adjust the MMR lambda: higher values favor relevance, lower values favor diversity. Experiment on held-out queries and measure downstream answer quality.
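
A toy example makes the lambda trade-off concrete. The two-item selector below is an illustrative reduction of MMR, not the skill's implementation: candidates 0 and 1 are relevant but nearly identical, while 2 is distinct but less relevant.

```python
def pick_two(query_sim, pair_sim, lam):
    """Greedy 2-item MMR: take the most relevant item first, then
    choose the second by lam*relevance - (1-lam)*redundancy."""
    first = max(range(len(query_sim)), key=query_sim.__getitem__)
    def score(i):
        return lam * query_sim[i] - (1 - lam) * pair_sim[first][i]
    rest = [i for i in range(len(query_sim)) if i != first]
    return [first, max(rest, key=score)]

relevance = [0.9, 0.85, 0.4]        # similarity to the query
pairwise = [[1.0, 0.9, 0.1],        # 0 and 1 are near-duplicates
            [0.9, 1.0, 0.3],
            [0.1, 0.3, 1.0]]

relevance_heavy = pick_two(relevance, pairwise, lam=0.9)  # favors relevance
diversity_heavy = pick_two(relevance, pairwise, lam=0.3)  # favors diversity
```

With a high lambda the near-duplicate pair (0, 1) wins; with a low lambda the distinct candidate 2 displaces the duplicate, which is exactly the behavior to probe on held-out queries.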