# cohere-langchain skill

This skill helps you build and optimize RAG pipelines and tool-calling workflows with Cohere models in LangChain, providing integrated chat, embeddings, reranking, and retriever examples.

```bash
npx playbooks add skill rshvr/unofficial-cohere-best-practices --skill cohere-langchain
```

## SKILL.md
---
name: cohere-langchain
description: Cohere LangChain integration reference for ChatCohere, CohereEmbeddings, CohereRerank, and CohereRagRetriever. Use for building RAG pipelines, chains, and tool-calling workflows with LangChain.
---

# Cohere LangChain Integration Reference

## Official Resources

- **Docs & Cookbooks**: https://github.com/cohere-ai/cohere-developer-experience
- **API Reference**: https://docs.cohere.com/reference/about

> **Model Compatibility**: Command A Reasoning and Command A Vision are **not supported** in LangChain. Use the native Cohere SDK for these models.

## Installation

```bash
pip install langchain-cohere langchain langchain-core
```
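
The wrappers read your Cohere API key from the `COHERE_API_KEY` environment variable; each class also accepts an explicit `cohere_api_key` parameter. A minimal setup sketch:

```python
import os

# Either export COHERE_API_KEY in your shell, or set it in code:
os.environ["COHERE_API_KEY"] = "your-api-key"

# Or pass the key explicitly per instance:
# llm = ChatCohere(model="command-a-03-2025", cohere_api_key="your-api-key")
```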

## Import Map (v0.5+)
```python
from langchain_cohere import (
    ChatCohere,
    CohereEmbeddings,
    CohereRerank,
    CohereRagRetriever,
    create_cohere_react_agent
)
# NOT from langchain_community (deprecated)
```

## ChatCohere

### Basic Usage
```python
from langchain_cohere import ChatCohere
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatCohere(model="command-a-03-2025")

response = llm.invoke([
    SystemMessage(content="You are a concise technical assistant."),
    HumanMessage(content="What is machine learning?")
])
print(response.content)
```

### Streaming
```python
for chunk in llm.stream([HumanMessage(content="Write a poem")]):
    print(chunk.content, end="", flush=True)
```
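
For async consumers (e.g., a web handler), chat models expose `astream` with the same semantics; a minimal sketch reusing the `llm` and imports from the examples above:

```python
import asyncio

async def stream_poem():
    async for chunk in llm.astream([HumanMessage(content="Write a poem")]):
        print(chunk.content, end="", flush=True)

asyncio.run(stream_poem())
```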

### For Agents (Recommended Settings)
```python
llm = ChatCohere(
    model="command-a-03-2025",
    temperature=0.3,  # Critical for reliable tool calling
    max_tokens=4096
)
```
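
### ReAct Agent

The import map exposes `create_cohere_react_agent`, which is not demonstrated elsewhere in this reference. A hedged sketch, assuming the `create_cohere_react_agent(llm, tools, prompt)` signature and `AgentExecutor` from the core `langchain` package (the weather tool is a stub):

```python
from langchain.agents import AgentExecutor
from langchain_cohere import ChatCohere, create_cohere_react_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool

@tool
def get_weather(location: str) -> str:
    """Get weather for a location (stub)."""
    return f"Weather in {location}: 20°C, sunny"

llm = ChatCohere(model="command-a-03-2025", temperature=0.3)
prompt = ChatPromptTemplate.from_template("{input}")

agent = create_cohere_react_agent(llm=llm, tools=[get_weather], prompt=prompt)
executor = AgentExecutor(agent=agent, tools=[get_weather], verbose=True)

print(executor.invoke({"input": "What's the weather in Toronto?"})["output"])
```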

### With Prompt Templates
```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a {role}."),
    ("human", "{input}")
])

chain = prompt | llm | StrOutputParser()
result = chain.invoke({"role": "helpful assistant", "input": "What is Python?"})
```

## CohereEmbeddings

```python
from langchain_cohere import CohereEmbeddings

embeddings = CohereEmbeddings(model="embed-english-v3.0")

query_vector = embeddings.embed_query("What is AI?")
doc_vectors = embeddings.embed_documents(["First document", "Second document"])
```
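
To sanity-check the vectors, cosine similarity can be computed directly on the returned lists (numpy is used here for brevity and is not a langchain-cohere dependency):

```python
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Higher score = closer in embedding space
for text, vec in zip(["First document", "Second document"], doc_vectors):
    print(f"{cosine(query_vector, vec):.3f}  {text}")
```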

### With Vector Store
```python
from langchain_community.vectorstores import FAISS  # requires: pip install faiss-cpu langchain-community

texts = ["First document", "Second document"]  # your corpus
vectorstore = FAISS.from_texts(texts, embeddings)
results = vectorstore.similarity_search("query", k=5)
```

## CohereRerank

```python
from langchain_cohere import CohereRerank
from langchain_core.documents import Document

reranker = CohereRerank(model="rerank-v3.5", top_n=3)

docs = [
    Document(page_content="ML is a subset of AI..."),
    Document(page_content="Weather is sunny..."),
]

reranked = reranker.compress_documents(docs, query="What is ML?")
```
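
The documents come back ordered by relevance; langchain-cohere attaches the score to each document's metadata (the `relevance_score` key below is what current versions use, but verify against your installed version):

```python
for doc in reranked:
    score = doc.metadata.get("relevance_score")  # key name assumed; check your version
    print(f"{score}  {doc.page_content[:50]}")
```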

### With Contextual Compression Retriever
```python
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever

base_retriever = vectorstore.as_retriever(search_kwargs={"k": 20})
reranker = CohereRerank(model="rerank-v3.5", top_n=5)

retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=base_retriever
)

results = retriever.invoke("Your query")
```
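
## CohereRagRetriever

`CohereRagRetriever` is named in the frontmatter and import map but not demonstrated above. A minimal sketch, assuming the langchain-cohere API in which the retriever wraps a `ChatCohere` instance and can ground generation on documents supplied at query time:

```python
from langchain_cohere import ChatCohere, CohereRagRetriever
from langchain_core.documents import Document

rag = CohereRagRetriever(llm=ChatCohere(model="command-a-03-2025"))

# documents= is assumed to be accepted per-invocation; verify for your version
docs = rag.invoke(
    "What is ML?",
    documents=[
        Document(page_content="ML is a subset of AI..."),
        Document(page_content="Weather is sunny..."),
    ],
)

# Citation details, where present, live in the document metadata
for doc in docs:
    print(doc.page_content[:60], doc.metadata)
```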

## Tool Calling

```python
from langchain_core.tools import tool

@tool
def get_weather(location: str) -> str:
    """Get weather for a location."""
    return f"Weather in {location}: 20°C, sunny"

llm = ChatCohere(model="command-a-03-2025")
llm_with_tools = llm.bind_tools([get_weather])

response = llm_with_tools.invoke("What's the weather in Toronto?")

if response.tool_calls:
    for tc in response.tool_calls:
        print(f"Tool: {tc['name']}, Args: {tc['args']}")
```
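
`bind_tools` only produces the tool *request*; to complete the loop, execute the tool and return its result as a `ToolMessage` before asking the model for a final answer:

```python
from langchain_core.messages import HumanMessage, ToolMessage

messages = [HumanMessage(content="What's the weather in Toronto?")]
ai_msg = llm_with_tools.invoke(messages)
messages.append(ai_msg)

# Run each requested tool and feed the result back to the model
for tc in ai_msg.tool_calls:
    output = get_weather.invoke(tc["args"])
    messages.append(ToolMessage(content=output, tool_call_id=tc["id"]))

final = llm_with_tools.invoke(messages)
print(final.content)
```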

## Structured Output

```python
from pydantic import BaseModel, Field

class Person(BaseModel):
    name: str = Field(description="Person's name")
    age: int = Field(description="Person's age")

llm = ChatCohere(model="command-a-03-2025")
structured_llm = llm.with_structured_output(Person)

result = structured_llm.invoke("John is 30 years old")
print(result)  # Person(name='John', age=30)
```
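
The same mechanism extracts multiple entities if you wrap the schema in a list-bearing model (the `People` wrapper here is a hypothetical extension, not part of the original skill):

```python
from typing import List

class People(BaseModel):
    people: List[Person] = Field(description="All people mentioned in the text")

structured_llm = llm.with_structured_output(People)
result = structured_llm.invoke("John is 30 years old and Mary is 25.")
print(result.people)  # [Person(name='John', age=30), Person(name='Mary', age=25)]
```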

## Full RAG Chain Example

```python
from langchain_cohere import ChatCohere, CohereEmbeddings, CohereRerank
from langchain_community.vectorstores import FAISS
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Setup
embeddings = CohereEmbeddings(model="embed-english-v3.0")
vectorstore = FAISS.from_texts(your_texts, embeddings)  # your_texts: your list of document strings
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 20})

reranker = CohereRerank(model="rerank-v3.5", top_n=5)
retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=base_retriever
)

llm = ChatCohere(model="command-a-03-2025")

prompt = ChatPromptTemplate.from_template("""
Answer based on context:
Context: {context}
Question: {question}
""")

def format_docs(docs):
    return "\n\n".join(d.page_content for d in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

answer = chain.invoke("Your question here")
```
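
### Returning Source Documents

The example use cases below mention answering with citations. One common LangChain pattern, sketched here on top of the chain above (not part of the original skill), returns the retrieved documents alongside the answer:

```python
from langchain_core.runnables import RunnableParallel

answer_from_docs = (
    RunnablePassthrough.assign(context=lambda x: format_docs(x["context"]))
    | prompt
    | llm
    | StrOutputParser()
)

chain_with_sources = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=answer_from_docs)

result = chain_with_sources.invoke("Your question here")
print(result["answer"])
for doc in result["context"]:
    print("-", doc.page_content[:60])
```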

Overview

This skill provides a practical Cohere + LangChain integration reference for ChatCohere, CohereEmbeddings, CohereRerank, and CohereRagRetriever. It shows how to build RAG pipelines, agent tool-calling workflows, streaming responses, structured outputs, and best-practice settings for reliable tool use. Use it to accelerate building production-ready retrieval-augmented and agent-based apps with Cohere models in LangChain.

How this skill works

The skill documents the key LangChain wrappers for Cohere: ChatCohere for chat and streaming, CohereEmbeddings for vectorization, CohereRerank for candidate re-ranking and compression, and CohereRagRetriever for RAG retrieval patterns. It demonstrates how to compose these components into chains, bind tools to the LLM, and produce structured outputs via Pydantic models. Examples include vector store integration, contextual compression retrievers, streaming invocation, and a full RAG chain example wired end-to-end.

When to use it

  • Building retrieval-augmented generation pipelines that combine embeddings + vector store + LLM
  • Creating agents that call external tools with reliable tool-calling behavior
  • Streaming outputs to UIs or progressive consumers
  • Producing structured, type-safe outputs (Pydantic) from LLM responses
  • Improving answer relevance by reranking candidate documents with CohereRerank

Best practices

  • Set temperature low (e.g., 0.3) for predictable tool calling and factual outputs
  • Use embed-english-v3.0 (or newer) for document and query embeddings to maximize vector search quality
  • Combine FAISS (or another vector store) with CohereEmbeddings and a reranker for high-precision retrieval
  • Use ContextualCompressionRetriever with CohereRerank to reduce prompt size while retaining relevance
  • Leverage streaming for large outputs to improve UX and to handle long generations incrementally
  • Wrap outputs with Pydantic models for deterministic structured parsing and downstream processing

Example use cases

  • Customer support assistant that fetches relevant KB articles, reranks them, and answers with citations
  • Agent that calls external tools (APIs) like weather or booking systems with predictable tool-call parsing
  • RAG-powered search UI: embed documents, run vector search, compress with reranker, and answer user queries
  • Content summarization pipeline that retrieves related docs, reranks, and produces structured summaries
  • Internal knowledge assistant that returns typed entities (e.g., policy, date, owner) using structured output models

FAQ

Are all Cohere models supported via LangChain wrappers?

No, not all of them. Specialized models such as Command A Reasoning and Command A Vision are not supported in LangChain; use the native Cohere SDK for those models.

What settings improve tool-calling reliability?

Lower the temperature (around 0.3), use explicit prompt templates, and give the model clear tool definitions. Define tools with LangChain's @tool decorator, attach them with bind_tools, and inspect response.tool_calls to execute the requested calls.