
grepai-ollama-setup skill


This skill guides you through installing and configuring Ollama as GrepAI's local embedding provider, so code search runs fully offline and your code never leaves your machine.

npx playbooks add skill yoanbernabeu/grepai-skills --skill grepai-ollama-setup

---
name: grepai-ollama-setup
description: Install and configure Ollama for local embeddings with GrepAI. Use this skill when setting up private, local embedding generation.
---

# Ollama Setup for GrepAI

This skill covers installing and configuring Ollama as the local embedding provider for GrepAI. Ollama enables 100% private code search where your code never leaves your machine.

## When to Use This Skill

- Setting up GrepAI with local, private embeddings
- Installing Ollama for the first time
- Choosing and downloading embedding models
- Troubleshooting Ollama connection issues

## Why Ollama?

| Benefit | Description |
|---------|-------------|
| πŸ”’ **Privacy** | Code never leaves your machine |
| πŸ’° **Free** | No API costs |
| ⚑ **Fast** | Local processing, no network latency |
| πŸ”Œ **Offline** | Works without internet |

## Installation

### macOS (Homebrew)

```bash
# Install Ollama
brew install ollama

# Start the Ollama service
ollama serve
```
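
Alternatively, Homebrew can run Ollama as a managed background service, so you don't need to keep `ollama serve` open in a terminal:

```bash
# Run Ollama as a launchd-managed background service via Homebrew
brew services start ollama
```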

### macOS (Direct Download)

1. Download from [ollama.com](https://ollama.com)
2. Open the `.dmg` and drag to Applications
3. Launch Ollama from Applications
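
After installation, a quick sanity check confirms the CLI is on your PATH and the app's background server is up:

```bash
# Verify the CLI is installed
ollama --version

# Verify the local API is responding
curl http://localhost:11434/api/tags
```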

### Linux

```bash
# One-line installer
curl -fsSL https://ollama.com/install.sh | sh

# Start the service
ollama serve
```

### Windows

1. Download installer from [ollama.com](https://ollama.com/download/windows)
2. Run the installer
3. Ollama starts automatically as a service
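
Once the installer finishes, you can confirm everything works from a terminal (PowerShell, Git Bash, or WSL):

```bash
# List installed models; an empty table still confirms the service is running
ollama list
```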

## Downloading Embedding Models

GrepAI requires an embedding model to convert code into vectors.

### Recommended Model: nomic-embed-text

```bash
# Download the recommended model (768 dimensions)
ollama pull nomic-embed-text
```

**Specifications:**
- Dimensions: 768
- Size: ~274 MB
- Performance: Excellent for code search
- Language: English-optimized

### Alternative Models

```bash
# Multilingual support (better for non-English code/comments)
ollama pull nomic-embed-text-v2-moe

# Larger, more accurate
ollama pull bge-m3

# Maximum quality
ollama pull mxbai-embed-large
```

| Model | Dimensions | Size | Best For |
|-------|------------|------|----------|
| `nomic-embed-text` | 768 | 274 MB | General code search |
| `nomic-embed-text-v2-moe` | 768 | 500 MB | Multilingual codebases |
| `bge-m3` | 1024 | 1.2 GB | Large codebases |
| `mxbai-embed-large` | 1024 | 670 MB | Maximum accuracy |
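
You can inspect any downloaded model's details, including its embedding length, and remove models you no longer need:

```bash
# Show model details (architecture, parameters, embedding length)
ollama show nomic-embed-text

# Free disk space by removing an unused model
ollama rm bge-m3
```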

## Verifying Installation

### Check Ollama is Running

```bash
# Check if Ollama server is responding
curl http://localhost:11434/api/tags

# Expected output: JSON with available models
```

### List Downloaded Models

```bash
ollama list

# Output:
# NAME                     ID           SIZE    MODIFIED
# nomic-embed-text:latest  abc123...    274 MB  2 hours ago
```

### Test Embedding Generation

```bash
# Quick test (should return embedding vector)
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "function hello() { return world; }"
}'
```
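
To check that the vector length matches the model's advertised dimensions, pipe the response through `jq` (assuming `jq` is installed):

```bash
# Count the dimensions of the returned embedding (expect 768 for nomic-embed-text)
curl -s http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "function hello() { return world; }"
}' | jq '.embedding | length'
```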

## Configuring GrepAI for Ollama

After installing Ollama, configure GrepAI to use it:

```yaml
# .grepai/config.yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434
```

This is the **default configuration** when you run `grepai init`, so no changes are needed if using `nomic-embed-text`.
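
Before indexing, it's worth confirming that the model named in the config has actually been pulled; a one-liner like the following (assuming the default model name) pulls it only if it's missing:

```bash
# Pull the configured model only if it isn't already available locally
ollama list | grep -q nomic-embed-text || ollama pull nomic-embed-text
```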

## Running Ollama

### Foreground (Development)

```bash
# Run in current terminal (see logs)
ollama serve
```

### Background (macOS/Linux)

```bash
# Using nohup
nohup ollama serve &

# Or as a systemd service (Linux)
sudo systemctl enable ollama
sudo systemctl start ollama
```
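
When running under systemd, the service logs are available through journalctl, which helps when diagnosing startup or model-loading problems:

```bash
# Follow Ollama's service logs (Linux, systemd)
journalctl -u ollama -f
```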

### Check Status

```bash
# Check if running
pgrep -f ollama

# Or test the API
curl -s http://localhost:11434/api/tags | head -1
```

## Resource Considerations

### Memory Usage

Embedding models load into RAM:
- `nomic-embed-text`: ~500 MB RAM
- `bge-m3`: ~1.5 GB RAM
- `mxbai-embed-large`: ~1 GB RAM
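
To see which models are currently loaded and how much memory they actually occupy, use `ollama ps`:

```bash
# Show loaded models, their memory footprint, and CPU/GPU placement
ollama ps
```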

### CPU vs GPU

Ollama falls back to the CPU when no supported GPU is detected. For faster embeddings:
- **macOS:** Uses Metal (Apple Silicon) automatically
- **Linux/Windows:** Install CUDA for NVIDIA GPU support
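
On Linux or Windows with an NVIDIA card, confirm the GPU is visible to the system before expecting Ollama to use it:

```bash
# Check that the NVIDIA driver and GPU are detected
nvidia-smi

# Then confirm Ollama offloads work to the GPU (PROCESSOR column)
ollama ps
```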

## Common Issues

❌ **Problem:** `connection refused` to localhost:11434
βœ… **Solution:** Start Ollama:
```bash
ollama serve
```

❌ **Problem:** Model not found
βœ… **Solution:** Pull the model first:
```bash
ollama pull nomic-embed-text
```

❌ **Problem:** Slow embedding generation
βœ… **Solution:**
- Use a smaller model
- Ensure Ollama is using GPU (check `ollama ps`)
- Close other memory-intensive applications

❌ **Problem:** Out of memory
βœ… **Solution:** Use a smaller model or increase system RAM

## Best Practices

1. **Start Ollama before GrepAI:** Ensure `ollama serve` is running (see the pre-flight sketch below)
2. **Use recommended model:** `nomic-embed-text` offers best balance
3. **Keep Ollama running:** Leave it as a background service
4. **Update periodically:** `ollama pull nomic-embed-text` for updates
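
A minimal pre-flight sketch that scripts the first of these checks before launching GrepAI (the messages and exit codes are illustrative, not part of GrepAI itself):

```bash
#!/usr/bin/env bash
# Pre-flight: make sure Ollama is up and the embedding model is present
if ! curl -sf http://localhost:11434/api/tags > /dev/null; then
  echo "Ollama is not running; start it with 'ollama serve'" >&2
  exit 1
fi

if ! ollama list | grep -q nomic-embed-text; then
  echo "Embedding model missing; run 'ollama pull nomic-embed-text'" >&2
  exit 1
fi

echo "Ollama is ready for GrepAI."
```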

## Output Format

After successful setup:

```
βœ… Ollama Setup Complete

   Ollama Version: 0.1.x
   Endpoint: http://localhost:11434
   Model: nomic-embed-text (768 dimensions)
   Status: Running

   GrepAI is ready to use with local embeddings.
   Your code will never leave your machine.
```

Overview

This skill installs and configures Ollama as the local embedding provider for GrepAI so you can generate private embeddings on your machine. It guides you through installation on macOS, Linux, and Windows, model downloads, verification steps, and GrepAI configuration. The goal is a fast, private setup where code never leaves your device.

How this skill works

The skill walks you through installing the Ollama service, pulling an embedding model (recommended: nomic-embed-text), verifying the local API is responding, and updating GrepAI’s config to point at the local endpoint. It includes commands to test embedding generation and common troubleshooting steps for connection, model availability, and resource limits.

When to use it

  • Setting up GrepAI for private, local embeddings
  • Installing Ollama for the first time on macOS, Linux, or Windows
  • Choosing and downloading an embedding model for code search
  • Troubleshooting Ollama connectivity or model issues
  • Optimizing local embedding performance and memory usage

Best practices

  • Start Ollama before launching GrepAI (run ollama serve or enable the service)
  • Use nomic-embed-text for a balance of size and performance unless you need multilingual or higher-accuracy models
  • Run Ollama in the background as a service for continuous availability
  • Monitor memory usage and pick a smaller model if you hit RAM limits
  • Keep models updated with ollama pull to benefit from fixes and improvements

Example use cases

  • Private code search in a local repo where code must never leave the machine
  • Setting up an offline development environment that generates embeddings without network access
  • Switching to a multilingual embedding model for repositories with mixed-language comments
  • Testing embedding generation locally during CI or pre-commit checks
  • Troubleshooting slow or failing GrepAI searches by validating the Ollama endpoint and model

FAQ

How do I verify Ollama is running?

Curl the API tags endpoint (curl http://localhost:11434/api/tags) or run ollama list to see downloaded models.

Which model should I pull first?

Start with nomic-embed-text (768 dims) for general code search; pull larger or multilingual models only if you need higher accuracy or language coverage.