---
name: grepai-ollama-setup
description: Install and configure Ollama for local embeddings with GrepAI. Use this skill when setting up private, local embedding generation.
---
# Ollama Setup for GrepAI
This skill covers installing and configuring Ollama as the local embedding provider for GrepAI. Ollama enables 100% private code search where your code never leaves your machine.
## When to Use This Skill
- Setting up GrepAI with local, private embeddings
- Installing Ollama for the first time
- Choosing and downloading embedding models
- Troubleshooting Ollama connection issues
## Why Ollama?
| Benefit | Description |
|---------|-------------|
| **Privacy** | Code never leaves your machine |
| **Free** | No API costs |
| **Fast** | Local processing, no network latency |
| **Offline** | Works without internet |
## Installation
### macOS (Homebrew)
```bash
# Install Ollama
brew install ollama
# Start the Ollama service
ollama serve
```
### macOS (Direct Download)
1. Download from [ollama.com](https://ollama.com)
2. Open the `.dmg` and drag to Applications
3. Launch Ollama from Applications
### Linux
```bash
# One-line installer
curl -fsSL https://ollama.com/install.sh | sh
# Start the service
ollama serve
```
### Windows
1. Download installer from [ollama.com](https://ollama.com/download/windows)
2. Run the installer
3. Ollama starts automatically as a Windows service (a quick check is shown below)
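To confirm the service is responding after installation, query the local API from a terminal (recent Windows releases ship `curl.exe`); any JSON response, even an empty model list, means Ollama is up:

```bash
# Verify the Ollama Windows service is responding
curl.exe http://localhost:11434/api/tags
```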
## Downloading Embedding Models
GrepAI requires an embedding model to convert code into vectors.
### Recommended Model: nomic-embed-text
```bash
# Download the recommended model (768 dimensions)
ollama pull nomic-embed-text
```
**Specifications:**
- Dimensions: 768
- Size: ~274 MB
- Performance: Excellent for code search
- Language: English-optimized
### Alternative Models
```bash
# Multilingual support (better for non-English code/comments)
ollama pull nomic-embed-text-v2-moe
# Larger, more accurate
ollama pull bge-m3
# Maximum quality
ollama pull mxbai-embed-large
```
| Model | Dimensions | Size | Best For |
|-------|------------|------|----------|
| `nomic-embed-text` | 768 | 274 MB | General code search |
| `nomic-embed-text-v2-moe` | 768 | 500 MB | Multilingual codebases |
| `bge-m3` | 1024 | 1.2 GB | Large codebases |
| `mxbai-embed-large` | 1024 | 670 MB | Maximum accuracy |
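To double-check a downloaded model's vector size locally, recent Ollama versions print the model card with `ollama show`; the exact fields can vary between releases, but embedding models report an embedding length:

```bash
# Inspect model details; look for the "embedding length" line (768 for nomic-embed-text)
ollama show nomic-embed-text
```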
## Verifying Installation
### Check Ollama is Running
```bash
# Check if Ollama server is responding
curl http://localhost:11434/api/tags
# Expected output: JSON with available models
```
### List Downloaded Models
```bash
ollama list
# Output:
# NAME                       ID              SIZE      MODIFIED
# nomic-embed-text:latest    abc123...       274 MB    2 hours ago
```
### Test Embedding Generation
```bash
# Quick test (should return embedding vector)
curl http://localhost:11434/api/embeddings -d '{
"model": "nomic-embed-text",
"prompt": "function hello() { return world; }"
}'
```
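If `jq` is installed, you can verify the vector length instead of eyeballing raw numbers; the `.embedding` field below assumes the legacy `/api/embeddings` response shape used above, and should report 768 for `nomic-embed-text`:

```bash
# Count the dimensions of the returned vector (expect 768 for nomic-embed-text)
curl -s http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "function hello() { return world; }"
}' | jq '.embedding | length'
```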
## Configuring GrepAI for Ollama
After installing Ollama, configure GrepAI to use it:
```yaml
# .grepai/config.yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434
```
This is the **default configuration** when you run `grepai init`, so no changes are needed if using `nomic-embed-text`.
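If you pulled a different model or run Ollama on another machine, adjust the same keys. The sketch below reuses the `embedder` keys shown above (the host address is a hypothetical example); note that switching models changes the vector dimensions, so the existing GrepAI index typically has to be rebuilt afterwards:

```yaml
# .grepai/config.yaml (example: larger model, non-default endpoint)
embedder:
  provider: ollama
  model: bge-m3
  endpoint: http://192.168.1.50:11434  # hypothetical remote Ollama host
```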
## Running Ollama
### Foreground (Development)
```bash
# Run in current terminal (see logs)
ollama serve
```
### Background (macOS/Linux)
```bash
# Using nohup
nohup ollama serve &
# Or as a systemd service (Linux)
sudo systemctl enable ollama
sudo systemctl start ollama
```
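If you installed Ollama with Homebrew on macOS, you can also let Homebrew manage it as a background service so it restarts at login:

```bash
# macOS (Homebrew install): run Ollama as a managed background service
brew services start ollama
# Stop it later with: brew services stop ollama
```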
### Check Status
```bash
# Check if running
pgrep -f ollama
# Or test the API
curl -s http://localhost:11434/api/tags | head -1
```
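When scripting your setup, a small wait loop keeps GrepAI from starting before Ollama is ready; a minimal sketch using the same `/api/tags` check:

```bash
# Wait up to ~30 seconds for the Ollama API to come up
for i in $(seq 1 30); do
  if curl -sf http://localhost:11434/api/tags > /dev/null; then
    echo "Ollama is ready"
    break
  fi
  sleep 1
done
```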
## Resource Considerations
### Memory Usage
Embedding models load into RAM:
- `nomic-embed-text`: ~500 MB RAM
- `bge-m3`: ~1.5 GB RAM
- `mxbai-embed-large`: ~1 GB RAM
### CPU vs GPU
Ollama uses CPU by default. For faster embeddings:
- **macOS:** Uses Metal (Apple Silicon) automatically
- **Linux/Windows:** Install CUDA for NVIDIA GPU support (a quick check is shown below)
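To confirm which processor a loaded model is actually using, `ollama ps` lists loaded models along with a processor column (column names may vary slightly between Ollama versions):

```bash
# Load the model by requesting one embedding, then check where it runs
curl -s http://localhost:11434/api/embeddings -d '{"model": "nomic-embed-text", "prompt": "test"}' > /dev/null
ollama ps
# The PROCESSOR column shows e.g. "100% GPU" or "100% CPU"
```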
## Common Issues
**Problem:** `connection refused` to localhost:11434

**Solution:** Start Ollama:
```bash
ollama serve
```
**Problem:** Model not found

**Solution:** Pull the model first:
```bash
ollama pull nomic-embed-text
```
**Problem:** Slow embedding generation

**Solution:**
- Use a smaller model
- Ensure Ollama is using GPU (check `ollama ps`)
- Close other memory-intensive applications
**Problem:** Out of memory

**Solution:** Use a smaller model or increase system RAM.
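If memory stays tight, Ollama's own environment variables can help: `OLLAMA_KEEP_ALIVE` controls how long a model stays loaded after a request, and `OLLAMA_MAX_LOADED_MODELS` caps how many models sit in RAM at once. Set them in the environment of the `ollama serve` process (the values below are illustrative):

```bash
# Unload idle models after 1 minute and keep at most one model loaded
OLLAMA_KEEP_ALIVE=1m OLLAMA_MAX_LOADED_MODELS=1 ollama serve
```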
## Best Practices
1. **Start Ollama before GrepAI:** Ensure `ollama serve` is running
2. **Use the recommended model:** `nomic-embed-text` offers the best balance of size and accuracy
3. **Keep Ollama running:** Leave it as a background service
4. **Update periodically:** re-run `ollama pull nomic-embed-text` to pick up model updates
## Output Format
After successful setup:
```
Ollama Setup Complete
Ollama Version: 0.1.x
Endpoint: http://localhost:11434
Model: nomic-embed-text (768 dimensions)
Status: Running
GrepAI is ready to use with local embeddings.
Your code will never leave your machine.
```