grepai-embeddings-ollama skill

This skill configures Ollama as a private embedding provider for GrepAI, enabling fast, fully local code search that works offline and keeps your code on your machine.

npx playbooks add skill yoanbernabeu/grepai-skills --skill grepai-embeddings-ollama

Review the file below or copy the command above to add this skill to your agents.

Files (1): SKILL.md (6.4 KB)
---
name: grepai-embeddings-ollama
description: Configure Ollama as the embedding provider for GrepAI. Use this skill for local, private embedding generation.
---

# GrepAI Embeddings with Ollama

This skill covers using Ollama as the embedding provider for GrepAI, enabling 100% private, local code search.

## When to Use This Skill

- Setting up private, local embeddings
- Choosing the right Ollama model
- Optimizing Ollama performance
- Troubleshooting Ollama connection issues

## Why Ollama?

| Advantage | Description |
|-----------|-------------|
| šŸ”’ **Privacy** | Code never leaves your machine |
| šŸ’° **Free** | No API costs or usage limits |
| ⚔ **Speed** | No network latency |
| šŸ”Œ **Offline** | Works without internet |
| šŸ”§ **Control** | Choose your model |

## Prerequisites

1. Ollama installed and running
2. An embedding model downloaded

```bash
# Install Ollama
brew install ollama  # macOS
# or
curl -fsSL https://ollama.com/install.sh | sh  # Linux

# Start Ollama
ollama serve

# Download model
ollama pull nomic-embed-text
```
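
Once the model is pulled, a quick sanity check confirms both the CLI and the model are available (this assumes the default local install):

```bash
# Confirm the Ollama CLI is installed
ollama --version

# Confirm the embedding model is available locally
ollama list | grep nomic-embed-text
```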

## Configuration

### Basic Configuration

```yaml
# .grepai/config.yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434
```
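
If the project has no configuration yet, one way to create it is to write the file directly. This is a minimal sketch; adjust the path if your GrepAI setup keeps its config elsewhere:

```bash
# Create the GrepAI config directory and a basic Ollama embedder config
mkdir -p .grepai
cat > .grepai/config.yaml <<'EOF'
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434
EOF
```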

### With Custom Endpoint

```yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://192.168.1.100:11434  # Remote Ollama server
```

### With Explicit Dimensions

```yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434
  dimensions: 768  # Usually auto-detected
```
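
If you are unsure what value to use for `dimensions`, you can ask Ollama directly by counting the length of a returned embedding (assumes `jq` is installed):

```bash
# Request one embedding and count its elements
curl -s http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "dimension check"
}' | jq '.embedding | length'
# Prints 768 for nomic-embed-text
```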

## Available Models

### Recommended: nomic-embed-text

```bash
ollama pull nomic-embed-text
```

| Property | Value |
|----------|-------|
| Dimensions | 768 |
| Size | ~274 MB |
| Speed | Fast |
| Quality | Excellent for code |
| Language | English-optimized |

**Configuration:**
```yaml
embedder:
  provider: ollama
  model: nomic-embed-text
```

### Multilingual: nomic-embed-text-v2-moe

```bash
ollama pull nomic-embed-text-v2-moe
```

| Property | Value |
|----------|-------|
| Dimensions | 768 |
| Size | ~500 MB |
| Speed | Medium |
| Quality | Excellent |
| Language | Multilingual |

Best for codebases with non-English comments/documentation.

**Configuration:**
```yaml
embedder:
  provider: ollama
  model: nomic-embed-text-v2-moe
```

### High Quality: bge-m3

```bash
ollama pull bge-m3
```

| Property | Value |
|----------|-------|
| Dimensions | 1024 |
| Size | ~1.2 GB |
| Speed | Slower |
| Quality | Very high |
| Language | Multilingual |

Best for large, complex codebases where accuracy is critical.

**Configuration:**
```yaml
embedder:
  provider: ollama
  model: bge-m3
  dimensions: 1024
```
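
Note that moving from a 768-dimension model to a 1024-dimension one invalidates the existing index; re-index as described under Common Issues:

```bash
# Rebuild the index after changing embedding dimensions
rm .grepai/index.gob
grepai watch
```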

### Maximum Quality: mxbai-embed-large

```bash
ollama pull mxbai-embed-large
```

| Property | Value |
|----------|-------|
| Dimensions | 1024 |
| Size | ~670 MB |
| Speed | Medium |
| Quality | Highest |
| Language | English |

**Configuration:**
```yaml
embedder:
  provider: ollama
  model: mxbai-embed-large
  dimensions: 1024
```

## Model Comparison

| Model | Dims | Size | Speed | Quality | Use Case |
|-------|------|------|-------|---------|----------|
| `nomic-embed-text` | 768 | 274 MB | ⚔⚔⚔ | ⭐⭐⭐ | General use |
| `nomic-embed-text-v2-moe` | 768 | 500 MB | ⚔⚔ | ⭐⭐⭐⭐ | Multilingual |
| `bge-m3` | 1024 | 1.2 GB | ⚔ | ⭐⭐⭐⭐⭐ | Large codebases |
| `mxbai-embed-large` | 1024 | 670 MB | ⚔⚔ | ⭐⭐⭐⭐⭐ | Maximum accuracy |
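
To compare speed on your own hardware, a rough benchmark is to time a single embedding request per model you have pulled. This is a sketch; substitute whichever models are actually installed:

```bash
# Time one embedding request per model
for model in nomic-embed-text mxbai-embed-large; do
  echo "== $model =="
  time curl -s http://localhost:11434/api/embeddings -d "{
    \"model\": \"$model\",
    \"prompt\": \"function authenticate(user, password)\"
  }" > /dev/null
done
```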

## Performance Optimization

### Memory Management

Models load into RAM. Ensure sufficient memory:

| Model | RAM Required |
|-------|--------------|
| `nomic-embed-text` | ~500 MB |
| `nomic-embed-text-v2-moe` | ~800 MB |
| `bge-m3` | ~1.5 GB |
| `mxbai-embed-large` | ~1 GB |

### GPU Acceleration

Ollama uses GPU acceleration automatically where available:
- **macOS:** Metal (Apple Silicon)
- **Linux/Windows:** CUDA (NVIDIA GPUs)

Check GPU usage:
```bash
ollama ps
```
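
The output lists each loaded model along with whether it is running on the GPU or falling back to the CPU.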

### Keeping Model Loaded

By default, Ollama unloads models after 5 minutes of inactivity. To keep the model loaded:

```bash
# Keep the embedding model loaded indefinitely
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "warm up",
  "keep_alive": -1
}'
```
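
Alternatively, if you start the server yourself, the default keep-alive can be set for the whole instance with the `OLLAMA_KEEP_ALIVE` environment variable:

```bash
# Keep all models loaded indefinitely for this Ollama instance
OLLAMA_KEEP_ALIVE=-1 ollama serve
```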

## Verifying Connection

### Check Ollama is Running

```bash
curl http://localhost:11434/api/tags
```

### List Available Models

```bash
ollama list
```

### Test Embedding

```bash
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "function authenticate(user, password)"
}'
```
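
A successful response is a JSON object containing an `embedding` array of floating-point values.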

## Running Ollama as a Service

### macOS (launchd)

The Ollama macOS app runs automatically on login, so no additional setup is needed.

### Linux (systemd)

```bash
# Enable service
sudo systemctl enable ollama

# Start service
sudo systemctl start ollama

# Check status
sudo systemctl status ollama
```

### Manual Background

```bash
nohup ollama serve > /dev/null 2>&1 &
```

## Remote Ollama Server

Run Ollama on a powerful server and connect remotely:

### On the Server

```bash
# Allow remote connections
OLLAMA_HOST=0.0.0.0 ollama serve
```
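
If Ollama runs as a systemd service on the server, the same variable can be set through a service override instead. This is a sketch assuming the unit is named `ollama`:

```bash
# Open an override for the Ollama unit, then add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl edit ollama

# Apply the change
sudo systemctl restart ollama
```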

### On the Client

```yaml
# .grepai/config.yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://server-ip:11434
```
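
Before indexing, confirm the client can actually reach the remote instance:

```bash
# From the client machine; replace server-ip with the actual host
curl http://server-ip:11434/api/tags
```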

## Common Issues

āŒ **Problem:** Connection refused
āœ… **Solution:**
```bash
# Start Ollama
ollama serve
```

āŒ **Problem:** Model not found
āœ… **Solution:**
```bash
# Pull the model
ollama pull nomic-embed-text
```

āŒ **Problem:** Slow embedding generation
āœ… **Solutions:**
- Use a smaller model (`nomic-embed-text`)
- Ensure GPU is being used (`ollama ps`)
- Close memory-intensive applications
- Consider a remote server with better hardware

āŒ **Problem:** Out of memory
āœ… **Solutions:**
- Use a smaller model
- Close other applications
- Upgrade RAM
- Use remote Ollama server

āŒ **Problem:** Embeddings differ after model update
āœ… **Solution:** Re-index after model updates:
```bash
rm .grepai/index.gob
grepai watch
```

## Best Practices

1. **Start with `nomic-embed-text`:** Best balance of speed/quality
2. **Keep Ollama running:** Background service recommended
3. **Match dimensions:** Don't mix models with different dimensions
4. **Re-index on model change:** Delete index and re-run watch
5. **Monitor memory:** Embedding models use significant RAM

## Output Format

Successful Ollama configuration:

```
✅ Ollama Embedding Provider Configured

   Provider: Ollama
   Model: nomic-embed-text
   Endpoint: http://localhost:11434
   Dimensions: 768 (auto-detected)
   Status: Connected

   Model Info:
   - Size: 274 MB
   - Loaded: Yes
   - GPU: Apple Metal
```

Overview

This skill configures Ollama as the embedding provider for GrepAI to enable fully local, private embedding generation for semantic code search. It guides you through choosing models, configuring endpoints, and optimizing performance so code never leaves your machine. Use it to run fast, offline embeddings with control over model selection and resource trade-offs.

How this skill works

The skill instructs GrepAI to call an Ollama server endpoint for embedding requests and shows how to set provider, model, endpoint, and optional dimensions in .grepai/config.yaml. It explains pulling models into Ollama, verifying the server, testing embeddings, and keeping models loaded for reduced latency. It also covers running Ollama as a background service or on a remote host.

When to use it

  • You need fully private, on-premise embeddings for code search.
  • You want to avoid API costs and network latency.
  • You need to choose or tune an embedding model for a specific codebase.
  • You want to run Ollama locally or on a dedicated server and connect GrepAI to it.
  • You need to optimize memory, GPU usage, and model persistence.

Best practices

  • Start with nomic-embed-text for a good balance of speed and quality.
  • Run Ollama as a background service (systemd, launchd, or nohup) to keep models available.
  • Match embedding dimensions across models; re-index when switching models.
  • Use GPU acceleration where available and monitor with ollama ps.
  • Prefer smaller models on limited RAM or use a remote Ollama server for heavy workloads.

Example use cases

  • Local semantic code search across private repositories without exposing code to external APIs.
  • Running GrepAI in an air-gapped environment with offline embedding generation.
  • Using a high-quality model (bge-m3) for large, complex codebases where accuracy matters.
  • Running Ollama on a powerful remote server and pointing developer machines to that endpoint.
  • Automating re-indexing after model updates to keep search results consistent.

FAQ

How do I test my Ollama connection?

Call the tags endpoint (curl http://localhost:11434/api/tags) or request embeddings via the API to confirm the server responds.

What model should I start with?

Begin with nomic-embed-text for most codebases; use multilingual or higher-quality models only when needed.

How do I keep a model loaded to reduce latency?

Send an embeddings request with keep_alive set to -1 (or set the OLLAMA_KEEP_ALIVE environment variable), or run Ollama as a persistent service so the model remains in memory.