
grepai-embeddings-lmstudio skill


This skill configures LM Studio as the embedding provider for GrepAI, enabling local, GUI-driven embeddings with simple model switching.

npx playbooks add skill yoanbernabeu/grepai-skills --skill grepai-embeddings-lmstudio

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
---
name: grepai-embeddings-lmstudio
description: Configure LM Studio as the embedding provider for GrepAI. Use this skill for local embeddings managed through a GUI.
---

# GrepAI Embeddings with LM Studio

This skill covers using LM Studio as the embedding provider for GrepAI, offering a user-friendly GUI for managing local models.

## When to Use This Skill

- Want local embeddings with a graphical interface
- Already using LM Studio for other AI tasks
- Prefer visual model management over CLI
- Need to easily switch between models

## What is LM Studio?

LM Studio is a desktop application for running local LLMs with:
- šŸ–„ļø Graphical user interface
- šŸ“¦ Easy model downloading
- šŸ”Œ OpenAI-compatible API
- šŸ”’ 100% private, local processing

## Prerequisites

1. Download LM Studio from [lmstudio.ai](https://lmstudio.ai)
2. Install and launch the application
3. Download an embedding model

## Installation

### Step 1: Download LM Studio

Visit [lmstudio.ai](https://lmstudio.ai) and download for your platform:
- macOS (Apple Silicon)
- Windows
- Linux

### Step 2: Launch and Download a Model

1. Open LM Studio
2. Go to the **Search** tab
3. Search for an embedding model:
   - `nomic-embed-text-v1.5`
   - `bge-small-en-v1.5`
   - `bge-large-en-v1.5`
4. Click **Download**
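
If you prefer the terminal, LM Studio's companion CLI (`lms`) can also download models. A minimal sketch, assuming a recent LM Studio version where `lms get` is available:

```bash
# Download an embedding model without opening the GUI
lms get nomic-embed-text-v1.5
```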

### Step 3: Start the Local Server

1. Go to the **Local Server** tab
2. Select your embedding model
3. Click **Start Server**
4. Note the endpoint (default: `http://localhost:1234`)

## Configuration

### Basic Configuration

```yaml
# .grepai/config.yaml
embedder:
  provider: lmstudio
  model: nomic-embed-text-v1.5
  endpoint: http://localhost:1234
```

### With Custom Port

```yaml
embedder:
  provider: lmstudio
  model: nomic-embed-text-v1.5
  endpoint: http://localhost:8080
```

### With Explicit Dimensions

```yaml
embedder:
  provider: lmstudio
  model: nomic-embed-text-v1.5
  endpoint: http://localhost:1234
  dimensions: 768
```
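
If you are unsure which dimension a model produces, you can ask the running server directly. A quick check, assuming the server is on the default port and `jq` is installed:

```bash
# Request one embedding and count its length; the result should match `dimensions`
curl -s http://localhost:1234/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text-v1.5", "input": "dimension check"}' \
  | jq '.data[0].embedding | length'
```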

## Available Models

### nomic-embed-text-v1.5 (Recommended)

| Property | Value |
|----------|-------|
| Dimensions | 768 |
| Size | ~260 MB |
| Quality | Excellent |
| Speed | Fast |

```yaml
embedder:
  provider: lmstudio
  model: nomic-embed-text-v1.5
```

### bge-small-en-v1.5

| Property | Value |
|----------|-------|
| Dimensions | 384 |
| Size | ~130 MB |
| Quality | Good |
| Speed | Very fast |

Best for: Smaller codebases, faster indexing.

```yaml
embedder:
  provider: lmstudio
  model: bge-small-en-v1.5
  dimensions: 384
```

### bge-large-en-v1.5

| Property | Value |
|----------|-------|
| Dimensions | 1024 |
| Size | ~1.3 GB |
| Quality | Very high |
| Speed | Slower |

Best for: Maximum accuracy.

```yaml
embedder:
  provider: lmstudio
  model: bge-large-en-v1.5
  dimensions: 1024
```

## Model Comparison

| Model | Dims | Size | Speed | Quality |
|-------|------|------|-------|---------|
| `bge-small-en-v1.5` | 384 | 130MB | ⚔⚔⚔ | ⭐⭐⭐ |
| `nomic-embed-text-v1.5` | 768 | 260MB | ⚔⚔ | ⭐⭐⭐⭐ |
| `bge-large-en-v1.5` | 1024 | 1.3GB | ⚔ | ⭐⭐⭐⭐⭐ |
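
Speed ratings vary with hardware, so it is worth timing the models on your own machine. A rough sketch, assuming the server is running on the default port with the model under test loaded:

```bash
# Time 20 sequential embedding requests; swap the model name and re-run to compare
time for i in $(seq 1 20); do
  curl -s http://localhost:1234/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{"model": "nomic-embed-text-v1.5", "input": "function authenticate(user)"}' \
    > /dev/null
done
```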

## LM Studio Server Setup

### Starting the Server

1. Open LM Studio
2. Navigate to **Local Server** tab (left sidebar)
3. Select an embedding model from the dropdown
4. Configure settings:
   - Port: `1234` (default)
   - Enable **Embedding Endpoint**
5. Click **Start Server**

### Server Status

Look for the green indicator showing the server is running.

### Verifying the Server

```bash
# Check server is responding
curl http://localhost:1234/v1/models

# Test embedding
curl http://localhost:1234/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text-v1.5",
    "input": "function authenticate(user)"
  }'
```
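
A successful response follows the OpenAI-compatible shape, roughly as below (truncated; exact metadata fields vary by LM Studio version):

```
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0123, -0.0456, ...]
    }
  ],
  "model": "nomic-embed-text-v1.5"
}
```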

## LM Studio Settings

### Recommended Settings

In LM Studio's Local Server tab:

| Setting | Recommended Value |
|---------|-------------------|
| Port | 1234 |
| Enable CORS | Yes |
| Context Length | Auto |
| GPU Layers | Max (for speed) |

### GPU Acceleration

LM Studio automatically uses:
- **macOS:** Metal (Apple Silicon)
- **Windows/Linux:** CUDA (NVIDIA)

Adjust GPU layers in settings for memory/speed balance.

## Running LM Studio Headless

For server environments, LM Studio ships a companion CLI (`lms`) that can run the server without the GUI:

```bash
# Start the server, then load the embedding model
# (command names and flags vary by version; check the LM Studio docs for exact syntax)
lms server start
lms load nomic-embed-text-v1.5
```

## Common Issues

āŒ **Problem:** Connection refused
āœ… **Solution:** Ensure LM Studio server is running:
1. Open LM Studio
2. Go to Local Server tab
3. Click Start Server

āŒ **Problem:** Model not found
āœ… **Solution:**
1. Download the model in LM Studio's Search tab
2. Select it in the Local Server dropdown

āŒ **Problem:** Slow embedding generation
āœ… **Solutions:**
- Enable GPU acceleration in LM Studio settings
- Use a smaller model (bge-small-en-v1.5)
- Close other GPU-intensive applications

āŒ **Problem:** Port already in use
āœ… **Solution:** Change the port in LM Studio's server settings, then update the GrepAI endpoint to match:
```yaml
embedder:
  endpoint: http://localhost:8080  # Different port
```
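
To see which process is occupying the default port before picking a new one (standard macOS/Linux tooling):

```bash
# Show the process listening on port 1234
lsof -i :1234
```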

āŒ **Problem:** LM Studio closes and server stops
āœ… **Solution:** Keep LM Studio running in the background, or consider Ollama, which runs as a background system service.

## LM Studio vs Ollama

| Feature | LM Studio | Ollama |
|---------|-----------|--------|
| GUI | āœ… Yes | āŒ CLI only |
| System service | āŒ App must run | āœ… Background service |
| Model management | āœ… Visual | āœ… CLI |
| Ease of use | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Server reliability | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |

**Recommendation:** Use LM Studio if you prefer a GUI; use Ollama for an always-on background service.

## Migrating from LM Studio to Ollama

If you need a more reliable background service:

1. Install Ollama:
```bash
# macOS via Homebrew; see https://ollama.com for other platforms
brew install ollama
# Run the Ollama server in the background
ollama serve &
# Pull the embedding model
ollama pull nomic-embed-text
```

2. Update config:
```yaml
embedder:
  provider: ollama
  model: nomic-embed-text
  endpoint: http://localhost:11434
```

3. Re-index:
```bash
# Remove the old index (embedding dimensions differ between models)
rm .grepai/index.gob
grepai watch
```
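
4. Optionally, confirm Ollama is serving embeddings before re-indexing. This uses Ollama's native embeddings endpoint on its default port:

```bash
# Request a test embedding from Ollama
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "test"}'
```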

## Best Practices

1. **Keep LM Studio running:** the server stops when the app closes
2. **Use recommended model:** `nomic-embed-text-v1.5` for best balance
3. **Enable GPU:** Faster embeddings with hardware acceleration
4. **Check server before indexing:** Ensure green status indicator
5. **Consider Ollama for production:** More reliable as background service

## Output Format

Successful LM Studio configuration:

```
āœ… LM Studio Embedding Provider Configured

   Provider: LM Studio
   Model: nomic-embed-text-v1.5
   Endpoint: http://localhost:1234
   Dimensions: 768 (auto-detected)
   Status: Connected

   Note: Keep LM Studio running for embeddings to work.
```

Overview

This skill configures LM Studio as the embedding provider for GrepAI so you can generate local embeddings through a GUI-managed local server. It guides model selection, server setup, and GrepAI configuration for fast, private semantic search and code analysis. Use it to run embeddings locally with easy model management and optional GPU acceleration.

How this skill works

The skill walks you through installing LM Studio, downloading an embedding model, and starting its Local Server to expose an OpenAI-compatible embeddings endpoint (default http://localhost:1234). It provides the GrepAI YAML snippets you need to point the embedder at that endpoint, plus model dimension examples and verification commands. Troubleshooting tips cover common connection, model, and performance issues.

When to use it

  • You want local, private embeddings with a graphical interface for model management.
  • You already use LM Studio for other local LLM tasks and want to reuse models.
  • You need to switch models frequently or try different embedding sizes quickly.
  • You want GPU-accelerated embeddings on your desktop for faster indexing.
  • You prefer a GUI over a headless service for experimenting before production.

Best practices

  • Keep LM Studio running while GrepAI is indexing or serving embeddings.
  • Use nomic-embed-text-v1.5 for the best balance of quality and speed; use bge-small-en-v1.5 for speed or bge-large-en-v1.5 for maximum accuracy.
  • Enable GPU acceleration and adjust GPU layers for memory/speed tradeoffs.
  • Verify the local server endpoint with a curl request before reindexing.
  • Consider migrating to a system service (like Ollama) for always-on production deployments.

Example use cases

  • Index and semantically search a medium-sized codebase locally without sending data to external APIs.
  • Compare embedding quality and speed by swapping between bge-small, nomic-embed-text, and bge-large models via LM Studio UI.
  • Run experiments on embedding dimensions and re-index quickly using the LM Studio GUI to change models.
  • Use GPU-accelerated embeddings during heavy indexing runs to reduce processing time.
  • Prepare a local proof-of-concept with LM Studio, then migrate to Ollama for a production service.

FAQ

What endpoint should I point GrepAI to?

Point GrepAI to the LM Studio Local Server endpoint, typically http://localhost:1234 (change the port if you configured a different one).

What if the server is not responding?

Open LM Studio, ensure the Local Server is started and shows a green status indicator, enable the embedding endpoint, and confirm the port is free.