
cohere-rerank skill

/skills/cohere-rerank

This skill helps optimize two-stage retrieval with Cohere Rerank, improving semantic search and RAG pipelines over document collections.

npx playbooks add skill rshvr/unofficial-cohere-best-practices --skill cohere-rerank


---
name: cohere-rerank
description: Cohere reranking reference for two-stage retrieval, semantic search improvement, and RAG pipelines. Covers Rerank v4 models, structured data reranking, and LangChain integration.
---

# Cohere Rerank Reference

## Official Resources

- **Docs & Cookbooks**: https://github.com/cohere-ai/cohere-developer-experience
- **API Reference**: https://docs.cohere.com/reference/about

## Models Overview

| Model | Context | Languages | Notes |
|-------|---------|-----------|-------|
| `rerank-v4.0-pro` | 32K tokens | 100+ | Best quality, slower |
| `rerank-v4.0-fast` | 32K tokens | 100+ | Optimized for speed |
| `rerank-v3.5` | 4K tokens | 100+ | Good balance |

## Two-Stage Retrieval Pattern (Recommended)

The proven pattern for production search:
1. **Stage 1**: Fast retrieval (embeddings/BM25) for top 30 candidates
2. **Stage 2**: Precise reranking for final top 10 results

```python
import cohere
co = cohere.ClientV2()  # reads the CO_API_KEY environment variable

# Stage 1: Cast a wide net with embeddings (vectorstore assumed defined elsewhere)
candidates = vectorstore.similarity_search(query, k=30)

# Stage 2: Precise reranking narrows to best results
reranked = co.rerank(
    model="rerank-v4.0-fast",
    query=query,
    documents=[doc.page_content for doc in candidates],
    top_n=10
)

final_docs = [candidates[r.index] for r in reranked.results]
```

## Native SDK Reranking

### Basic Reranking
```python
query = "What is machine learning?"
documents = [
    "Machine learning is a subset of AI that enables systems to learn from data.",
    "The weather today is sunny with clear skies.",
    "Deep learning uses neural networks with many layers.",
]

response = co.rerank(
    model="rerank-v3.5",
    query=query,
    documents=documents,
    top_n=3
)

for result in response.results:
    print(f"Index: {result.index}, Score: {result.relevance_score:.4f}")
    print(f"Document: {documents[result.index]}\n")
```

### With Return Documents
```python
response = co.rerank(
    model="rerank-v3.5",
    query=query,
    documents=documents,
    top_n=3,
    return_documents=True
)

for result in response.results:
    print(f"Score: {result.relevance_score:.4f}")
    print(f"Text: {result.document.text}\n")
```

## Structured Data Reranking

### JSON/Dict Documents
```python
import yaml

documents = [
    {"title": "ML Guide", "author": "John", "content": "Machine learning basics..."},
    {"title": "Weather Report", "author": "Jane", "content": "Today's forecast..."},
]

yaml_docs = [yaml.dump(doc, sort_keys=False) for doc in documents]

response = co.rerank(
    model="rerank-v3.5",
    query="machine learning tutorial",
    documents=yaml_docs,
    top_n=2
)
```
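The indices in the response refer to positions in `yaml_docs`, which shares its ordering with `documents`, so the original dicts can be recovered by position. A minimal sketch (the helper name is illustrative):

```python
def originals_from_results(documents, results):
    """Map reranker results (each exposing .index) back to the
    original structured records, in reranked order."""
    return [documents[r.index] for r in results]

# top_records = originals_from_results(documents, response.results)
```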

### Specify Rank Fields
```python
response = co.rerank(
    model="rerank-v3.5",
    query="machine learning",
    documents=[
        {"title": "ML Guide", "author": "John", "text": "Introduction to ML..."},
        {"title": "Weather", "author": "Jane", "text": "Sunny skies..."}
    ],
    rank_fields=["title", "text"]  # Only consider these fields
)
```

## LangChain Integration

### Basic Usage
```python
from langchain_cohere import CohereRerank
from langchain_core.documents import Document

reranker = CohereRerank(model="rerank-v3.5", top_n=3)

documents = [
    Document(page_content="Machine learning is a subset of AI..."),
    Document(page_content="The weather is sunny today..."),
]

reranked = reranker.compress_documents(
    documents=documents,
    query="What is machine learning?"
)

for doc in reranked:
    print(f"Score: {doc.metadata['relevance_score']:.4f}")
```

### With Contextual Compression Retriever
```python
from langchain_cohere import CohereEmbeddings, CohereRerank
from langchain_community.vectorstores import FAISS
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever

embeddings = CohereEmbeddings(model="embed-english-v3.0")
vectorstore = FAISS.from_documents(docs, embeddings)  # docs: your source Documents
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 20})

reranker = CohereRerank(model="rerank-v3.5", top_n=5)
retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=base_retriever
)

results = retriever.invoke("Your query here")
```

## Score Interpretation

Relevance scores are normalized to [0, 1]:
- **0.9+**: Highly relevant
- **0.5-0.9**: Moderately relevant
- **<0.5**: Low relevance

```python
threshold = 0.5
relevant = [r for r in response.results if r.relevance_score >= threshold]
```

## Best Practices

1. **Use two-stage retrieval**: Embeddings for recall, rerank for precision
2. **Batch large requests**: Max 10,000 documents per request
3. **Use YAML for structured data**: `yaml.dump(doc, sort_keys=False)` preserves field order
4. **Filter by score threshold**: Don't use low-relevance results
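The batching advice above can be sketched as a helper that splits a large corpus into sub-10,000 chunks and merges results by score. The 1,000-document batch size is an illustrative choice, and merging across batches assumes relevance scores are comparable between requests (reasonable for per-pair cross-encoder scoring, but worth validating on your data):

```python
def rerank_in_batches(co, query, documents, batch_size=1000, top_n=10):
    """Rerank a large document list batch by batch, then merge
    all (global_index, score) pairs and keep the best top_n."""
    scored = []
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        response = co.rerank(
            model="rerank-v3.5",
            query=query,
            documents=batch,
            top_n=len(batch),  # keep every score so batches merge fairly
        )
        for r in response.results:
            scored.append((start + r.index, r.relevance_score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# co = cohere.ClientV2()  # requires CO_API_KEY in the environment
# top = rerank_in_batches(co, "machine learning", all_docs)
```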

Overview

This skill provides a concise reference and best practices for using Cohere reranking in two-stage retrieval, semantic search improvement, and RAG pipelines. It covers Rerank v4 models, structured-data reranking, and practical LangChain integration patterns. The content focuses on actionable patterns, score interpretation, and performance tuning for production use.

How this skill works

The skill explains a two-stage retrieval pattern: use fast retrieval (embeddings or BM25) to gather a wide set of candidates, then apply Cohere rerank models to precisely reorder and select the top results. It describes model choices (rerank-v4.0-pro, rerank-v4.0-fast, rerank-v3.5), structured-data handling via YAML or selected fields, and how to map reranker outputs back to original documents. Examples include native SDK calls and LangChain components.

When to use it

  • Improve precision after an embedding or BM25-based recall stage in search or RAG.
  • Rerank structured JSON/dictionary documents where field-level relevance matters.
  • Replace or augment similarity scores when final ranking must reflect query-document relevance.
  • Integrate with LangChain retrievers or contextual compression pipelines.
  • Optimize for speed or quality by choosing between fast and pro rerank models.

Best practices

  • Always use a two-stage pipeline: cast a wide net with recall (k ~ 20–50), then rerank to the final top N.
  • Choose model by tradeoff: rerank-v4.0-pro for highest quality, rerank-v4.0-fast for latency-sensitive cases.
  • Batch and paginate large workloads; adhere to documented limits (up to 10,000 documents per request when batching).
  • Serialize structured records to YAML or supply rank_fields to focus the reranker on relevant fields.
  • Apply a score threshold (for example 0.5) to filter low-relevance results and avoid noisy outputs.

Example use cases

  • Semantic site search: embeddings retrieve 30 candidates and rerank to surface the best 10 results.
  • RAG pipelines: rerank retrieved passages before prompt construction to improve answer accuracy.
  • Product search: rerank product records using title and description fields for better relevance.
  • Customer support: rank knowledge base articles and filter by relevance score to return top solutions.
  • Contextual compression: use Cohere reranker as a compressor in LangChain to reduce context while preserving relevance.
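The RAG use case above (rerank retrieved passages before prompt construction) can be sketched as a small helper. The prompt template and the 0.5 threshold are illustrative assumptions, and a live rerank client is assumed to be passed in:

```python
def build_rag_context(co, query, passages, top_n=3, threshold=0.5):
    """Rerank retrieved passages, drop low-relevance ones, and
    assemble a numbered context block for the generation prompt."""
    response = co.rerank(
        model="rerank-v3.5",
        query=query,
        documents=passages,
        top_n=top_n,
    )
    kept = [passages[r.index] for r in response.results
            if r.relevance_score >= threshold]
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(kept))
    return f"Answer using only these passages:\n{context}\n\nQuestion: {query}"
```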

FAQ

Which rerank model should I pick for production?

Use rerank-v4.0-pro when quality is critical and latency is acceptable; choose rerank-v4.0-fast when you need lower latency. Use rerank-v3.5 for balanced cost and performance.

How do I rerank structured JSON documents?

Serialize records to YAML or pass objects with rank_fields specified so the reranker only considers relevant fields like title and text.