---
name: ollama-local
description: Manage and use local Ollama models. Use for model management (list/pull/remove), chat/completions, embeddings, and tool-use with local LLMs. Covers OpenClaw sub-agent integration and model selection guidance.
---
# Ollama Local
Work with local Ollama models for inference, embeddings, and tool use.
## Configuration
Set your Ollama host (defaults to `http://localhost:11434`):
```bash
export OLLAMA_HOST="http://localhost:11434"
# Or for remote server:
export OLLAMA_HOST="http://192.168.1.100:11434"
```
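In a script, the same fallback can be applied when the variable is unset. A minimal sketch (the helper name `resolve_host` is illustrative, not part of the skill's scripts):

```python
import os

def resolve_host() -> str:
    # Resolve the Ollama host, falling back to the local default
    # when OLLAMA_HOST is not set in the environment.
    return os.environ.get("OLLAMA_HOST", "http://localhost:11434").rstrip("/")
```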
## Quick Reference
```bash
# List models
python3 scripts/ollama.py list
# Pull a model
python3 scripts/ollama.py pull llama3.1:8b
# Remove a model
python3 scripts/ollama.py rm modelname
# Show model details
python3 scripts/ollama.py show qwen3:4b
# Chat with a model
python3 scripts/ollama.py chat qwen3:4b "What is the capital of France?"
# Chat with system prompt
python3 scripts/ollama.py chat llama3.1:8b "Review this code" -s "You are a code reviewer"
# Generate completion (non-chat)
python3 scripts/ollama.py generate qwen3:4b "Once upon a time"
# Get embeddings
python3 scripts/ollama.py embed bge-m3 "Text to embed"
```
## Model Selection
See [references/models.md](references/models.md) for full model list and selection guide.
**Quick picks:**
- Fast answers: `qwen3:4b`
- Coding: `qwen2.5-coder:7b`
- General: `llama3.1:8b`
- Reasoning: `deepseek-r1:8b`
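The quick picks above can be encoded as a small lookup helper (a sketch; the mapping simply mirrors the list and is not an official API):

```python
# Task-type to model mapping, taken directly from the quick picks list.
QUICK_PICKS = {
    "fast": "qwen3:4b",
    "coding": "qwen2.5-coder:7b",
    "general": "llama3.1:8b",
    "reasoning": "deepseek-r1:8b",
}

def pick_model(task: str) -> str:
    # Fall back to the general-purpose model for unknown task types.
    return QUICK_PICKS.get(task, QUICK_PICKS["general"])
```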
## Tool Use
Some local models support function calling. Use `ollama_tools.py`:
```bash
# Single request with tools
python3 scripts/ollama_tools.py single qwen2.5-coder:7b "What's the weather in Amsterdam?"
# Full tool loop (model calls tools, gets results, responds)
python3 scripts/ollama_tools.py loop qwen3:4b "Search for Python tutorials and summarize"
# Show available example tools
python3 scripts/ollama_tools.py tools
```
**Tool-capable models:** qwen2.5-coder, qwen3, llama3.1, mistral
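Tools are passed to the chat API as JSON-schema function definitions. A sketch of one definition in that format (the weather tool here is illustrative, not part of the skill's example tools):

```python
# A tool definition in the JSON-schema style Ollama's chat API accepts.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}
```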
## OpenClaw Sub-Agents
Spawn local model sub-agents with `sessions_spawn`:
```python
# Example: spawn a coding agent
sessions_spawn(
    task="Review this Python code for bugs",
    model="ollama/qwen2.5-coder:7b",
    label="code-review",
)
```
Model path format: `ollama/<model-name>`
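A malformed model path is one cause of the fallback behavior described under Troubleshooting, so it can be worth validating before spawning. A hypothetical validator for the `ollama/<model-name>[:tag]` format (not part of the OpenClaw API):

```python
import re

# Matches "ollama/" followed by a model name and an optional ":tag";
# names and tags may contain letters, digits, dots, hyphens, underscores.
MODEL_PATH_RE = re.compile(r"^ollama/[\w.\-]+(:[\w.\-]+)?$")

def is_valid_model_path(path: str) -> bool:
    return MODEL_PATH_RE.fullmatch(path) is not None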
### Parallel Agents (Think Tank Pattern)
Spawn multiple local agents for collaborative tasks:
```python
agents = [
    {"label": "architect", "model": "ollama/gemma3:12b", "task": "Design the system architecture"},
    {"label": "coder", "model": "ollama/qwen2.5-coder:7b", "task": "Implement the core logic"},
    {"label": "reviewer", "model": "ollama/llama3.1:8b", "task": "Review for bugs and improvements"},
]
for a in agents:
    sessions_spawn(task=a["task"], model=a["model"], label=a["label"])
```
## Direct API
For custom integrations, use the Ollama API directly:
```bash
# Chat
curl $OLLAMA_HOST/api/chat -d '{
  "model": "qwen3:4b",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}'
# Generate
curl $OLLAMA_HOST/api/generate -d '{
  "model": "qwen3:4b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
# List models
curl $OLLAMA_HOST/api/tags
# Pull model
curl $OLLAMA_HOST/api/pull -d '{"name": "phi3:mini"}'
```
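With `"stream": false`, the assistant's reply arrives in the `message.content` field of the chat response. A minimal parsing sketch (the sample response below is illustrative, trimmed to the relevant fields):

```python
def extract_reply(chat_response: dict) -> str:
    """Pull the assistant text out of a non-streaming /api/chat response."""
    return chat_response["message"]["content"]

# Illustrative non-streaming response shape.
sample = {
    "model": "qwen3:4b",
    "message": {"role": "assistant", "content": "Hello! How can I help?"},
    "done": True,
}
```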
## Troubleshooting
**Connection refused?**
- Check that Ollama is running (start it with `ollama serve`)
- Verify `OLLAMA_HOST` points to the right server
- For remote servers, ensure firewall allows port 11434
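The checks above can be automated with a quick TCP probe (a sketch; it only confirms the port is reachable, not that the server is actually Ollama):

```python
import socket
from urllib.parse import urlparse

def can_reach(host_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the Ollama host succeeds."""
    parsed = urlparse(host_url)
    port = parsed.port or 11434
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```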
**Model not loading?**
- Check VRAM: larger models may need CPU offload
- Try a smaller model first
**Slow responses?**
- Model may be running on CPU
- Use a smaller model variant (e.g., `:7b` instead of `:30b`) or a lower-precision quantization
**OpenClaw sub-agent falls back to default model?**
- Ensure `ollama:default` auth profile exists in OpenClaw config
- Check model path format: `ollama/modelname:tag`
## FAQ
**How do I change the Ollama host?**
Set the `OLLAMA_HOST` environment variable to the server URL, for example `export OLLAMA_HOST="http://192.168.1.100:11434"`.
**Which models support tool/function calling?**
Tool-capable models include qwen2.5-coder, qwen3, llama3.1, and mistral; check each model's documentation for specifics.
**What if a model fails to load or is very slow?**
Confirm available VRAM, try a smaller quantized model, or allow CPU offload; large models need more resources.