
cohere-embeddings skill


This skill helps you optimize vector search and RAG with Cohere embeddings, covering Embed v4 and input_type best practices for accurate retrieval.

npx playbooks add skill rshvr/unofficial-cohere-best-practices --skill cohere-embeddings

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: cohere-embeddings
description: Cohere embeddings reference for vector search, semantic similarity, and RAG. Covers Embed v4 (multimodal, Matryoshka dimensions), input types (CRITICAL for search quality), batch processing, and LangChain integration.
---

# Cohere Embeddings Reference

## Official Resources

- **Docs & Cookbooks**: https://github.com/cohere-ai/cohere-developer-experience
- **API Reference**: https://docs.cohere.com/reference/about

## Models Overview

| Model | Context | Dimensions | Features |
|-------|---------|------------|----------|
| `embed-v4.0` | 128K tokens | 256/512/1024/1536 | Multimodal (text+image), Matryoshka |
| `embed-english-v3.0` | 512 tokens | 1024 | English-only, fast |
| `embed-multilingual-v3.0` | 512 tokens | 1024 | 100+ languages |
| `embed-english-light-v3.0` | 512 tokens | 384 | Lightweight, fastest |

## Input Types (CRITICAL)

> **Using the wrong `input_type` will silently degrade search quality.** Cohere uses asymmetric embeddings where documents and queries are embedded differently.

| Input Type | Use Case |
|------------|----------|
| `search_document` | Documents stored in vector DB for retrieval |
| `search_query` | User queries searching against documents |
| `classification` | Text classification tasks |
| `clustering` | Clustering similar documents |
| `image` | Image inputs (Embed v4 only) |

### Example: Search Pipeline
```python
import cohere
co = cohere.ClientV2()  # or cohere.ClientV2(api_key="...")

# INDEXING: Use search_document for docs you're storing
doc_response = co.embed(
    model="embed-english-v3.0",
    texts=documents,
    input_type="search_document",  # MUST use for storage
    embedding_types=["float"]      # required by the v2 API
)

# QUERYING: Use search_query for user queries
query_response = co.embed(
    model="embed-english-v3.0",
    texts=[user_query],
    input_type="search_query",  # MUST use for retrieval
    embedding_types=["float"]
)
```
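To close the loop, here is a minimal brute-force retrieval step for the pipeline above. A sketch assuming numpy and the `doc_response`/`query_response` objects from the snippet; cosine similarity works well with Cohere embeddings:

```python
import numpy as np

# Stack stored document vectors and the query vector from above.
doc_vecs = np.array(doc_response.embeddings.float_)
query_vec = np.array(query_response.embeddings.float_[0])

# Normalize so the dot product equals cosine similarity.
doc_vecs = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
query_vec = query_vec / np.linalg.norm(query_vec)

scores = doc_vecs @ query_vec
for i in np.argsort(scores)[::-1][:3]:  # top-3 matches
    print(f"{scores[i]:.3f}  {documents[i]}")
```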

## Native SDK Embeddings

### Basic Text Embedding
```python
response = co.embed(
    model="embed-english-v3.0",
    texts=["Hello world", "Machine learning is cool"],
    input_type="search_document",
    embedding_types=["float"]
)

embeddings = response.embeddings.float_
print(f"Embedding shape: {len(embeddings)} x {len(embeddings[0])}")
```

### Embed v4 with Matryoshka Dimensions
```python
# High precision (default)
response = co.embed(
    model="embed-v4.0",
    texts=["text"],
    input_type="search_document",
    output_dimension=1536,
    embedding_types=["float"]
)

# Balanced (~3x smaller vectors, faster search)
response = co.embed(
    model="embed-v4.0",
    texts=["text"],
    input_type="search_document",
    output_dimension=512,
    embedding_types=["float"]
)

# Compact (~6x smaller vectors, fastest search)
response = co.embed(
    model="embed-v4.0",
    texts=["text"],
    input_type="search_document",
    output_dimension=256,
    embedding_types=["float"]
)
```
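Because v4 embeddings are Matryoshka-trained, the leading dimensions of a large vector approximate a smaller one. A sketch (assuming numpy, and the 1536-dimension response from the first call above) of storing full vectors once and truncating client-side; re-normalizing keeps cosine scores comparable:

```python
import numpy as np

full = np.array(response.embeddings.float_)  # (n, 1536) from the first call

def truncate(vecs: np.ndarray, dim: int) -> np.ndarray:
    """Keep the leading `dim` Matryoshka dimensions and re-normalize."""
    cut = vecs[:, :dim]
    return cut / np.linalg.norm(cut, axis=1, keepdims=True)

compact = truncate(full, 256)  # same documents, 6x smaller index
```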

### Different Embedding Types
```python
response = co.embed(
    model="embed-english-v3.0",
    texts=["Hello"],
    input_type="search_document",
    embedding_types=["float", "int8", "uint8", "binary", "ubinary"]
)

float_emb = response.embeddings.float_
int8_emb = response.embeddings.int8
binary_emb = response.embeddings.binary
```
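Binary embeddings cut storage roughly 32x versus float32 and can be compared with Hamming distance. A sketch assuming numpy and the `response` above; `ubinary` packs 8 bits per byte, so a 1024-dim vector becomes 128 bytes:

```python
import numpy as np

# Packed binary embeddings from the call above: 8 bits per uint8.
doc_bits = np.array(response.embeddings.ubinary, dtype=np.uint8)

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Count differing bits between two packed binary embeddings."""
    return int(np.unpackbits(a ^ b).sum())

# Lower Hamming distance means more similar; identical vectors score 0.
print(hamming(doc_bits[0], doc_bits[0]))
```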

## Multimodal Embeddings (Embed v4)

### Image Embeddings
```python
import base64

with open("image.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode()

image_uri = f"data:image/jpeg;base64,{image_base64}"

response = co.embed(
    model="embed-v4.0",
    images=[image_uri],
    input_type="image",
    embedding_types=["float"]
)
```
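Embed v4 places text and images in one vector space, so a text query can score directly against stored image vectors. A sketch assuming numpy, the `response` above, and a hypothetical query string:

```python
import numpy as np

query = co.embed(
    model="embed-v4.0",
    texts=["a red running shoe"],  # hypothetical query
    input_type="search_query",
    embedding_types=["float"]
)

img_vec = np.array(response.embeddings.float_[0])
q_vec = np.array(query.embeddings.float_[0])
score = (img_vec @ q_vec) / (np.linalg.norm(img_vec) * np.linalg.norm(q_vec))
print(f"text-to-image similarity: {score:.3f}")
```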

### Mixed Content
```python
response = co.embed(
    model="embed-v4.0",
    inputs=[
        # One fused document combining text and an image
        {"content": [
            {"type": "text", "text": "A description of the product"},
            {"type": "image_url", "image_url": {"url": image_uri}},
        ]},
        # A text-only document
        {"content": [{"type": "text", "text": "Another text chunk"}]},
    ],
    input_type="search_document",
    embedding_types=["float"]
)
```

## Batch Processing

### Hard Limit: 96 Items Per Request
```python
def embed_in_batches(texts: list, batch_size: int = 96):
    """Embed texts in batches of 96 (Cohere API limit)."""
    all_embeddings = []

    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        response = co.embed(
            model="embed-english-v3.0",
            texts=batch,
            input_type="search_document",
            embedding_types=["float"]
        )
        all_embeddings.extend(response.embeddings.float_)

    return all_embeddings
```
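Usage is then a single call regardless of corpus size (`corpus_texts` below is a stand-in for your real data):

```python
corpus_texts = ["doc one", "doc two"]  # stand-in corpus
vectors = embed_in_batches(corpus_texts)
assert len(vectors) == len(corpus_texts)
```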

### Embed Jobs API (Large Datasets)
```python
job = co.embed_jobs.create(
    model="embed-english-v3.0",
    dataset_id="your-dataset-id",
    input_type="search_document"
)

status = co.embed_jobs.get(job.job_id)
print(status.status)  # "processing", "complete", "failed"
```
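Jobs run asynchronously, so poll until a terminal status. A minimal sketch; the 30-second interval is illustrative, not a documented value:

```python
import time

while True:
    status = co.embed_jobs.get(job.job_id)
    if status.status in ("complete", "failed"):
        break
    time.sleep(30)  # illustrative polling interval

print(f"Job finished with status: {status.status}")
```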

## LangChain Integration

### Basic Usage
```python
from langchain_cohere import CohereEmbeddings

embeddings = CohereEmbeddings(model="embed-english-v3.0")

vector = embeddings.embed_query("What is machine learning?")
vectors = embeddings.embed_documents(["Document 1", "Document 2"])
```

### With Vector Store
```python
from langchain_cohere import CohereEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

embeddings = CohereEmbeddings(model="embed-english-v3.0")

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(your_documents)  # your_documents: list of LangChain Documents

vectorstore = FAISS.from_documents(docs, embeddings)
results = vectorstore.similarity_search("your query", k=5)
```

## Best Practices

1. **Match input types**: Always use `search_document` for stored docs and `search_query` for queries
2. **Batch efficiently**: Hard limit of 96 texts per request
3. **Choose dimensions wisely**: Lower dimensions = faster search but slightly less precision
4. **Chunk long texts**: Consider chunking at ~6000 chars; the API truncates over-length inputs by default (`truncate="END"`)

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=6000,
    chunk_overlap=200
)
chunks = splitter.split_text(long_document)
```

Overview

This skill explains Cohere embeddings for vector search, semantic similarity, and retrieval-augmented generation (RAG). It covers Embed v4 (multimodal with Matryoshka dimensions), correct input_type usage, batch processing limits, and practical LangChain integration. The guidance focuses on getting reliable search quality and efficient embedding pipelines.

How this skill works

The skill walks through model choices, input_type semantics, output dimensions, and batching behavior so you can produce embeddings suited to each task. It stresses that Cohere embeddings are asymmetric: documents and queries must use different input_type values, or search quality silently degrades. It also covers Embed v4's multimodal capabilities and how Matryoshka dimensions trade precision for speed.

When to use it

  • Building a document search or semantic retrieval system
  • Implementing RAG for a chatbot or knowledge assistant
  • Comparing embeddings for multilingual vs. English-only workloads
  • Embedding images or mixed text+image content with embed-v4
  • Processing large datasets with batch limits or embed jobs API

Best practices

  • Always set input_type: use search_document for stored docs and search_query for queries
  • Respect the 96-items-per-request hard limit; batch or use embed jobs for large datasets
  • Choose output_dimension based on precision vs. latency: 1536 (high), 512 (balanced), 256 (compact)
  • Chunk long texts (~6000 chars recommended) to avoid truncation and preserve context
  • Use appropriate embedding_types (float, int8, binary) to optimize storage and indexing
  • Integrate with LangChain vectorstores and splitters to streamline ingestion and retrieval

Example use cases

  • Indexing a knowledge base: embed documents with input_type=search_document, store in FAISS, then query with search_query embeddings
  • Multimodal product search: embed product images and descriptions together using embed-v4 to enable image+text retrieval
  • High-throughput pipelines: embed in batches of 96 or submit an embed_jobs job for millions of items
  • RAG for customer support: chunk long manuals, embed chunks, and use similarity search to supply context to a generative model
  • Language-aware applications: pick embed-english-v3.0 for English-only or embed-multilingual-v3.0 for global content

FAQ

What happens if I use the wrong input_type?

Using the wrong input_type will silently reduce search quality because document and query embeddings are asymmetric. Always use search_document for indexed content and search_query for queries.

How do I choose embed-v4 dimensions?

Pick 1536 for highest precision, 512 for a balance of speed and accuracy, and 256 for compact, fastest search. Lower dimensions shrink the index and reduce latency at a small cost in accuracy.