
cohere-rerank skill

/skills/cohere-rerank

This skill helps optimize two-stage retrieval with Cohere Rerank, improving semantic search and RAG pipelines over document collections.

npx playbooks add skill rshvr/unofficial-cohere-best-practices --skill cohere-rerank


---
name: cohere-rerank
description: Cohere reranking reference for two-stage retrieval, semantic search improvement, and RAG pipelines. Covers Rerank v4 models, structured data reranking, and LangChain integration.
---

# Cohere Rerank Reference

## Official Resources

- **Docs & Cookbooks**: https://github.com/cohere-ai/cohere-developer-experience
- **API Reference**: https://docs.cohere.com/reference/about

## Models Overview

| Model | Context | Languages | Notes |
|-------|---------|-----------|-------|
| `rerank-v4.0-pro` | 32K tokens | 100+ | Best quality, slower |
| `rerank-v4.0-fast` | 32K tokens | 100+ | Optimized for speed |
| `rerank-v3.5` | 4K tokens | 100+ | Good balance |

## Two-Stage Retrieval Pattern (Recommended)

The proven pattern for production search:
1. **Stage 1**: Fast retrieval (embeddings/BM25) for top 30 candidates
2. **Stage 2**: Precise reranking for final top 10 results

```python
import cohere
co = cohere.ClientV2()  # reads the CO_API_KEY environment variable

# Stage 1: Cast a wide net with embeddings (vectorstore assumed defined elsewhere)
candidates = vectorstore.similarity_search(query, k=30)

# Stage 2: Precise reranking narrows to best results
reranked = co.rerank(
    model="rerank-v4.0-fast",
    query=query,
    documents=[doc.page_content for doc in candidates],
    top_n=10
)

final_docs = [candidates[r.index] for r in reranked.results]
```

## Native SDK Reranking

### Basic Reranking
```python
query = "What is machine learning?"
documents = [
    "Machine learning is a subset of AI that enables systems to learn from data.",
    "The weather today is sunny with clear skies.",
    "Deep learning uses neural networks with many layers.",
]

response = co.rerank(
    model="rerank-v3.5",
    query=query,
    documents=documents,
    top_n=3
)

for result in response.results:
    print(f"Index: {result.index}, Score: {result.relevance_score:.4f}")
    print(f"Document: {documents[result.index]}\n")
```

### With Return Documents
```python
response = co.rerank(
    model="rerank-v3.5",
    query=query,
    documents=documents,
    top_n=3,
    return_documents=True
)

for result in response.results:
    print(f"Score: {result.relevance_score:.4f}")
    print(f"Text: {result.document.text}\n")
```

## Structured Data Reranking

### JSON/Dict Documents
```python
import yaml

documents = [
    {"title": "ML Guide", "author": "John", "content": "Machine learning basics..."},
    {"title": "Weather Report", "author": "Jane", "content": "Today's forecast..."},
]

yaml_docs = [yaml.dump(doc, sort_keys=False) for doc in documents]

response = co.rerank(
    model="rerank-v3.5",
    query="machine learning tutorial",
    documents=yaml_docs,
    top_n=2
)
```
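The indices in the response refer to positions in `yaml_docs`, which shares its ordering with `documents`, so the original dicts can be recovered by position. A minimal sketch (the helper name is illustrative):

```python
def originals_from_results(documents, results):
    """Map reranker results (each exposing .index) back to the
    original structured records, in reranked order."""
    return [documents[r.index] for r in results]

# top_records = originals_from_results(documents, response.results)
```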

### Specify Rank Fields
```python
response = co.rerank(
    model="rerank-v3.5",
    query="machine learning",
    documents=[
        {"title": "ML Guide", "author": "John", "text": "Introduction to ML..."},
        {"title": "Weather", "author": "Jane", "text": "Sunny skies..."}
    ],
    rank_fields=["title", "text"]  # Only consider these fields
)
```

## LangChain Integration

### Basic Usage
```python
from langchain_cohere import CohereRerank
from langchain_core.documents import Document

reranker = CohereRerank(model="rerank-v3.5", top_n=3)

documents = [
    Document(page_content="Machine learning is a subset of AI..."),
    Document(page_content="The weather is sunny today..."),
]

reranked = reranker.compress_documents(
    documents=documents,
    query="What is machine learning?"
)

for doc in reranked:
    print(f"Score: {doc.metadata['relevance_score']:.4f}")
```

### With Contextual Compression Retriever
```python
from langchain_cohere import CohereEmbeddings, CohereRerank
from langchain_community.vectorstores import FAISS
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever

embeddings = CohereEmbeddings(model="embed-english-v3.0")
vectorstore = FAISS.from_documents(docs, embeddings)  # docs: your source Documents
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 20})

reranker = CohereRerank(model="rerank-v3.5", top_n=5)
retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=base_retriever
)

results = retriever.invoke("Your query here")
```

## Score Interpretation

Relevance scores are normalized to [0, 1]:
- **0.9+**: Highly relevant
- **0.5-0.9**: Moderately relevant
- **<0.5**: Low relevance

```python
threshold = 0.5
relevant = [r for r in response.results if r.relevance_score >= threshold]
```

## Best Practices

1. **Use two-stage retrieval**: Embeddings for recall, rerank for precision
2. **Batch large requests**: Max 10,000 documents per request
3. **Use YAML for structured data**: `yaml.dump(doc, sort_keys=False)` preserves field order
4. **Filter by score threshold**: Don't use low-relevance results
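The batching advice above can be sketched as a helper that splits a large corpus into sub-10,000 chunks and merges results by score. The 1,000-document batch size is an illustrative choice, and merging across batches assumes relevance scores are comparable between requests (reasonable for per-pair cross-encoder scoring, but worth validating on your data):

```python
def rerank_in_batches(co, query, documents, batch_size=1000, top_n=10):
    """Rerank a large document list batch by batch, then merge
    all (global_index, score) pairs and keep the best top_n."""
    scored = []
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        response = co.rerank(
            model="rerank-v3.5",
            query=query,
            documents=batch,
            top_n=len(batch),  # keep every score so batches merge fairly
        )
        for r in response.results:
            scored.append((start + r.index, r.relevance_score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# co = cohere.ClientV2()  # requires CO_API_KEY in the environment
# top = rerank_in_batches(co, "machine learning", all_docs)
```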

Overview

This skill provides a concise reference and best practices for using Cohere reranking in two-stage retrieval, semantic search improvement, and RAG pipelines. It covers Rerank v4 models, structured-data reranking, and practical LangChain integration patterns. The content focuses on actionable patterns, score interpretation, and performance tuning for production use.

How this skill works

The skill explains a two-stage retrieval pattern: use fast retrieval (embeddings or BM25) to gather a wide set of candidates, then apply Cohere rerank models to precisely reorder and select the top results. It describes model choices (rerank-v4.0-pro, rerank-v4.0-fast, rerank-v3.5), structured-data handling via YAML or selected fields, and how to map reranker outputs back to original documents. Examples include native SDK calls and LangChain components.

When to use it

  • Improve precision after an embedding or BM25-based recall stage in search or RAG.
  • Rerank structured JSON/dictionary documents where field-level relevance matters.
  • Replace or augment similarity scores when final ranking must reflect query-document relevance.
  • Integrate with LangChain retrievers or contextual compression pipelines.
  • Optimize for speed or quality by choosing between fast and pro rerank models.

Best practices

  • Always use a two-stage pipeline: cast a wide net with recall (k ~ 20–50), then rerank to the final top N.
  • Choose model by tradeoff: rerank-v4.0-pro for highest quality, rerank-v4.0-fast for latency-sensitive cases.
  • Batch and paginate large workloads; adhere to documented limits (up to 10,000 documents per request when batching).
  • Serialize structured records to YAML or supply rank_fields to focus the reranker on relevant fields.
  • Apply a score threshold (for example 0.5) to filter low-relevance results and avoid noisy outputs.

Example use cases

  • Semantic site search: embeddings retrieve 30 candidates and rerank to surface the best 10 results.
  • RAG pipelines: rerank retrieved passages before prompt construction to improve answer accuracy.
  • Product search: rerank product records using title and description fields for better relevance.
  • Customer support: rank knowledge base articles and filter by relevance score to return top solutions.
  • Contextual compression: use Cohere reranker as a compressor in LangChain to reduce context while preserving relevance.
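The RAG use case above (rerank retrieved passages before prompt construction) can be sketched as a small helper. The prompt template and the 0.5 threshold are illustrative assumptions, and a live rerank client is assumed to be passed in:

```python
def build_rag_context(co, query, passages, top_n=3, threshold=0.5):
    """Rerank retrieved passages, drop low-relevance ones, and
    assemble a numbered context block for the generation prompt."""
    response = co.rerank(
        model="rerank-v3.5",
        query=query,
        documents=passages,
        top_n=top_n,
    )
    kept = [passages[r.index] for r in response.results
            if r.relevance_score >= threshold]
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(kept))
    return f"Answer using only these passages:\n{context}\n\nQuestion: {query}"
```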

FAQ

Which rerank model should I pick for production?

Use rerank-v4.0-pro when quality is critical and latency is acceptable; choose rerank-v4.0-fast when you need lower latency. Use rerank-v3.5 for balanced cost and performance.

How do I rerank structured JSON documents?

Serialize records to YAML or pass objects with rank_fields specified so the reranker only considers relevant fields like title and text.