
cohere-python-sdk skill

/skills/cohere-python-sdk

This skill helps you use Cohere's Python SDK for chat, tool use, streaming, and RAG when building Python applications.

npx playbooks add skill rshvr/unofficial-cohere-best-practices --skill cohere-python-sdk


Files (1)
SKILL.md
---
name: cohere-python-sdk
description: Cohere Python SDK reference for chat, streaming, tool use, structured outputs, and RAG. Use when building Python applications with Cohere's Command models, embeddings, or reranking APIs.
---

# Cohere Native Python SDK Reference

## Official Resources

- **Docs & Cookbooks**: https://github.com/cohere-ai/cohere-developer-experience
- **API Reference**: https://docs.cohere.com/reference/about

## Table of Contents
1. [Client Setup](#client-setup)
2. [Chat API](#chat-api)
3. [Reasoning Model (Command A Reasoning)](#reasoning-model-command-a-reasoning)
4. [Streaming](#streaming)
5. [Tool Use / Function Calling](#tool-use--function-calling)
6. [Multi-step Tool Use (Agents)](#multi-step-tool-use-agents)
7. [Structured Outputs](#structured-outputs)
8. [RAG with Documents](#rag-with-documents)
9. [Embeddings and Reranking](#embeddings-and-reranking)
10. [Safety Modes](#safety-modes)
11. [Error Handling](#error-handling)

## Client Setup

### Basic Setup
```python
import cohere

# Option 1: Auto-read from CO_API_KEY env var
co = cohere.ClientV2()

# Option 2: Explicit API key
co = cohere.ClientV2(api_key="your-api-key")

# Option 3: Custom endpoint (private deployment)
co = cohere.ClientV2(
    api_key="your-api-key",
    base_url="https://your-deployment.com"
)
```

### Async Client
```python
import asyncio
import cohere

async_co = cohere.AsyncClientV2()  # Also reads CO_API_KEY from the environment

async def main():
    response = await async_co.chat(
        model="command-a-03-2025",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.message.content[0].text)

asyncio.run(main())
```
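
The async client shines for concurrent requests. A minimal sketch (the prompts are illustrative) that fans out several chats with `asyncio.gather`:

```python
import asyncio
import cohere

async_co = cohere.AsyncClientV2()

async def ask(prompt: str) -> str:
    response = await async_co.chat(
        model="command-a-03-2025",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.message.content[0].text

async def main():
    prompts = ["Define embeddings.", "Define reranking.", "Define RAG."]
    # gather schedules all requests concurrently instead of one at a time
    answers = await asyncio.gather(*(ask(p) for p in prompts))
    for prompt, answer in zip(prompts, answers):
        print(f"{prompt} -> {answer[:60]}")

asyncio.run(main())
```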

## Chat API

### Basic Chat
```python
response = co.chat(
    model="command-a-03-2025",
    messages=[
        {"role": "user", "content": "What is machine learning?"}
    ]
)
print(response.message.content[0].text)
```

### With System Message
```python
response = co.chat(
    model="command-a-03-2025",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python hello world"}
    ]
)
```

### Multi-turn Conversation
```python
messages = [
    {"role": "user", "content": "My name is Veer"},
    {"role": "assistant", "content": "Nice to meet you, Veer!"},
    {"role": "user", "content": "What's my name?"}
]
response = co.chat(model="command-a-03-2025", messages=messages)
```
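
To continue the conversation, append the assistant's reply to `messages` before adding the next user turn:

```python
# Carry the reply forward so the model sees the full history
messages.append({"role": "assistant", "content": response.message.content[0].text})
messages.append({"role": "user", "content": "Spell my name backwards"})
response = co.chat(model="command-a-03-2025", messages=messages)
print(response.message.content[0].text)
```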

### Parameters
```python
response = co.chat(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "Write a story"}],
    temperature=0.7,           # 0.0-1.0, higher = more creative
    max_tokens=500,            # Max response length
    p=0.9,                     # Top-p sampling
    k=50,                      # Top-k sampling
    seed=42,                   # For reproducibility
    stop_sequences=["END"],    # Stop generation at these
)
```

## Reasoning Model (Command A Reasoning)

The `command-a-reasoning-08-2025` model supports extended thinking with a controllable token budget:

### Basic Usage
```python
response = co.chat(
    model="command-a-reasoning-2025",
    messages=[{"role": "user", "content": "Solve this step by step: What is 15% of 340?"}],
    thinking={
        "type": "enabled",
        "budget_tokens": 5000  # Max tokens for internal reasoning
    }
)
print(response.message.content[0].text)
```

### Disable Reasoning (Lower Latency)
```python
response = co.chat(
    model="command-a-reasoning-2025",
    messages=[{"role": "user", "content": "Quick question: capital of France?"}],
    thinking={"type": "disabled"}  # Skip reasoning for simple queries
)
```

## Streaming

### Basic Streaming
```python
response = co.chat_stream(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "Write a poem about AI"}]
)

for event in response:
    if event.type == "content-delta":
        print(event.delta.message.content.text, end="", flush=True)
```
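
To keep the full text while streaming, accumulate the deltas as they arrive:

```python
chunks = []
for event in co.chat_stream(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "Write a poem about AI"}]
):
    if event.type == "content-delta":
        text = event.delta.message.content.text
        chunks.append(text)  # Save each delta for the final transcript
        print(text, end="", flush=True)

full_text = "".join(chunks)  # Complete response once the stream ends
```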

### Streaming Event Types
```python
for event in co.chat_stream(model="command-a-03-2025", messages=messages):
    match event.type:  # Requires Python 3.10+ for match/case
        case "message-start":
            print("Generation started")
        case "content-delta":
            print(event.delta.message.content.text, end="")
        case "message-end":
            print("Generation complete")
        case "tool-plan-delta":
            print(f"Tool plan: {event.delta.message.tool_plan}")
        case "tool-call-start":
            print(f"Tool call started: {event.delta.message.tool_calls}")
```

## Tool Use / Function Calling

### Step 1: Define Tools
```python
def get_weather(location: str) -> dict:
    # Stub: replace with a real weather API lookup for `location`
    return {"temperature": "20°C", "condition": "sunny"}

functions_map = {"get_weather": get_weather}

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g., 'Toronto'"
                    }
                },
                "required": ["location"]
            }
        }
    }
]
```

### Step 2: Generate and Execute Tool Calls
```python
import json

messages = [{"role": "user", "content": "What's the weather in Toronto?"}]

response = co.chat(model="command-a-03-2025", messages=messages, tools=tools)

if response.message.tool_calls:
    messages.append({
        "role": "assistant",
        "tool_plan": response.message.tool_plan,
        "tool_calls": response.message.tool_calls
    })

    for tc in response.message.tool_calls:
        args = json.loads(tc.function.arguments)
        result = functions_map[tc.function.name](**args)
        messages.append({
            "role": "tool",
            "tool_call_id": tc.id,
            "content": [{"type": "document", "document": {"data": json.dumps(result)}}]
        })

final_response = co.chat(model="command-a-03-2025", messages=messages, tools=tools)
print(final_response.message.content[0].text)
```

### Controlling Tool Behavior
```python
response = co.chat(
    model="command-a-03-2025",
    messages=messages,
    tools=tools,
    tool_choice="REQUIRED"  # Must call tool. Options: AUTO, REQUIRED, NONE
)
```
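
## Multi-step Tool Use (Agents)

For agent-style tasks the model may need several rounds of tool calls before it can answer. A minimal sketch, reusing `tools` and `functions_map` from above, that loops until the model stops requesting tools (in production, also cap the number of iterations):

```python
import json

messages = [{"role": "user", "content": "What's the weather in Toronto and in Paris?"}]

while True:
    response = co.chat(model="command-a-03-2025", messages=messages, tools=tools)
    if not response.message.tool_calls:
        break  # The model answered directly instead of calling more tools

    messages.append({
        "role": "assistant",
        "tool_plan": response.message.tool_plan,
        "tool_calls": response.message.tool_calls
    })
    for tc in response.message.tool_calls:
        args = json.loads(tc.function.arguments)
        result = functions_map[tc.function.name](**args)
        messages.append({
            "role": "tool",
            "tool_call_id": tc.id,
            "content": [{"type": "document", "document": {"data": json.dumps(result)}}]
        })

print(response.message.content[0].text)
```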

## Structured Outputs

### JSON Mode
```python
response = co.chat(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "List 3 fruits as JSON"}],
    response_format={"type": "json_object"}
)
```

### JSON Schema
```python
response = co.chat(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "Extract person info from: John is 30"}],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"}
            },
            "required": ["name", "age"]
        }
    }
)
```

### Strict Tool Parameters
```python
response = co.chat(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "..."}],
    tools=tools,
    strict_tools=True  # Beta: guarantees tool names and parameters match the schema
)
```

## RAG with Documents

```python
documents = [
    {"id": "doc1", "data": {"title": "Report", "text": "Q3 revenue was $10M"}},
    {"id": "doc2", "data": {"title": "Summary", "text": "Growth rate: 15%"}}
]

response = co.chat(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "What was Q3 revenue?"}],
    documents=documents
)

for citation in response.message.citations or []:
    print(f"'{citation.text}' cited from {citation.sources}")
```
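
## Embeddings and Reranking

The SDK also covers the embed and rerank endpoints used for retrieval. A brief sketch of both; the model names shown are common choices and may need updating (check the docs for current versions):

```python
# Embed texts for semantic search (v2 embed requires embedding_types)
emb = co.embed(
    model="embed-english-v3.0",
    texts=["Q3 revenue was $10M", "Growth rate: 15%"],
    input_type="search_document",
    embedding_types=["float"],
)
print(len(emb.embeddings.float_[0]))  # Embedding dimensionality

# Rerank candidate documents against a query
rer = co.rerank(
    model="rerank-english-v3.0",
    query="What was Q3 revenue?",
    documents=["Q3 revenue was $10M", "Growth rate: 15%"],
    top_n=1,
)
print(rer.results[0].index, rer.results[0].relevance_score)
```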

## Safety Modes

```python
response = co.chat(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "..."}],
    safety_mode="CONTEXTUAL"  # Default, or "STRICT" or "OFF"
)
```

## Error Handling

```python
from cohere.core import ApiError

try:
    response = co.chat(model="command-a-03-2025", messages=messages)
except ApiError as e:
    print(f"API Error: {e.status_code} - {e.body}")
except Exception as e:
    print(f"Error: {e}")
```
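
For transient failures such as rate limits, a simple retry with exponential backoff is a common pattern. A minimal sketch (the status codes and delays are illustrative choices, not SDK defaults):

```python
import time
from cohere.core import ApiError

def chat_with_retry(co, messages, retries=3):
    for attempt in range(retries):
        try:
            return co.chat(model="command-a-03-2025", messages=messages)
        except ApiError as e:
            # Retry only on rate limiting or server-side errors
            if e.status_code in (429, 500, 503) and attempt < retries - 1:
                time.sleep(2 ** attempt)  # Back off: 1s, 2s, 4s, ...
            else:
                raise
```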

Overview

This skill is an unofficial reference and best-practices guide for using Cohere's native Python SDK with Command models, embeddings, reranking, streaming, structured outputs, RAG, and tool-based agents. It focuses on practical examples, client setup, and patterns for robust production usage. Use it to speed up development when building Python applications that call Cohere APIs for chat, tool calling, or retrieval-augmented generation.

How this skill works

The skill documents common SDK flows: initializing sync and async clients, issuing chat requests, enabling streaming, and configuring parameters like temperature, top-p, and max tokens. It explains how to define and execute tools (function calling), integrate multi-step agent patterns, enforce structured JSON outputs or JSON Schema, and pass documents for RAG. Also covered are reasoning controls, safety modes, streaming event handling, and basic error handling patterns.

When to use it

  • Building conversational assistants with Cohere Command models in Python.
  • Implementing tool use or function-calling workflows where the model triggers external code.
  • Streaming large model responses to a UI or CLI with incremental updates.
  • Producing machine-readable outputs using JSON mode or JSON Schema enforcement.
  • Implementing retrieval-augmented generation (RAG) with document citations and embeddings.

Best practices

  • Initialize clients via environment variables for secure key management; use custom base_url for private deployments.
  • Prefer streaming for lower perceived latency and to surface intermediate tool plans or calls.
  • Wrap the tool execution loop: detect tool_calls, run the matching local functions, append tool outputs as tool messages, then call chat again.
  • Use response_format with a JSON schema for strict outputs, and strict_tools to avoid parameter hallucinations.
  • Control reasoning (thinking) to balance latency vs. complex multi-step problem solving.
  • Catch ApiError and general exceptions; surface status_code and body for observability.

Example use cases

  • Chatbot that calls external APIs (weather, calendar) via defined tools and returns structured results.
  • Document Q&A system where user queries are answered from ingested documents with citations (RAG).
  • Interactive code assistant that streams incremental code suggestions and uses tool plans for test execution.
  • Data extraction pipeline that enforces JSON Schema to populate structured records from free text.
  • High-throughput embedding and reranking pipeline for semantic search and relevance tuning.

FAQ

How do I force the model to call a tool?

Set tool_choice to "REQUIRED" when calling chat so the model must emit a tool_call rather than answering directly.

How can I ensure machine-readable output?

Use response_format with type "json_object" and optionally provide a JSON schema; combine with strict_tools to prevent hallucinated parameters.