grepai-storage-postgres skill

This skill helps teams configure PostgreSQL with pgvector for GrepAI, enabling scalable, concurrent, and persistent semantic code search across large codebases.

Install it with:

```bash
npx playbooks add skill yoanbernabeu/grepai-skills --skill grepai-storage-postgres
```

---
name: grepai-storage-postgres
description: Configure PostgreSQL with pgvector for GrepAI. Use this skill for team environments and large codebases.
---

# GrepAI Storage with PostgreSQL

This skill covers using PostgreSQL with the pgvector extension as the storage backend for GrepAI.

## When to Use This Skill

- Team environments with shared index
- Large codebases (10K+ files)
- Need concurrent access
- Integration with existing PostgreSQL infrastructure

## Prerequisites

1. PostgreSQL 14+ with the pgvector extension
2. A database user with permission to create tables
3. Network access to PostgreSQL server
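
A quick way to check these prerequisites from a shell (assuming the `grepai` user and database used in the examples below already exist):

```bash
# Server reachable and running PostgreSQL 14 or newer?
pg_isready -h localhost -p 5432
psql -U grepai -d grepai -c "SHOW server_version;"

# Is pgvector available on this server?
psql -U grepai -d grepai -c \
  "SELECT name, default_version, installed_version FROM pg_available_extensions WHERE name = 'vector';"
```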

## Advantages

| Benefit | Description |
|---------|-------------|
| 👥 **Team sharing** | Multiple users can access the same index |
| 📏 **Scalable** | Handles large codebases |
| 🔄 **Concurrent** | Multiple simultaneous searches |
| 💾 **Persistent** | Data survives machine restarts |
| 🔧 **Familiar** | Standard database tooling |

## Setting Up PostgreSQL with pgvector

### Option 1: Docker (Recommended for Development)

```bash
# Run PostgreSQL with pgvector
docker run -d \
  --name grepai-postgres \
  -e POSTGRES_USER=grepai \
  -e POSTGRES_PASSWORD=grepai \
  -e POSTGRES_DB=grepai \
  -p 5432:5432 \
  pgvector/pgvector:pg16
```
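
If you prefer Docker Compose, an equivalent service with a persistent volume might look like this (a sketch; the volume name and credentials are assumptions to adjust):

```yaml
# docker-compose.yml
services:
  grepai-postgres:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_USER: grepai
      POSTGRES_PASSWORD: grepai
      POSTGRES_DB: grepai
    ports:
      - "5432:5432"
    volumes:
      - grepai_pgdata:/var/lib/postgresql/data

volumes:
  grepai_pgdata:
```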

### Option 2: Install on Existing PostgreSQL

```bash
# Install pgvector extension (Ubuntu/Debian)
sudo apt install postgresql-16-pgvector

# Or compile from source
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
```

Then enable the extension:

```sql
-- Connect to your database
CREATE EXTENSION IF NOT EXISTS vector;
```

### Option 3: Managed Services

- **Supabase:** pgvector included by default
- **Neon:** pgvector available
- **AWS RDS:** Install pgvector extension
- **Azure Database:** pgvector available

## Configuration

### Basic Configuration

```yaml
# .grepai/config.yaml
store:
  backend: postgres
  postgres:
    dsn: postgres://user:password@localhost:5432/grepai
```

### With Environment Variable

```yaml
store:
  backend: postgres
  postgres:
    dsn: ${DATABASE_URL}
```

Set the environment variable:
```bash
export DATABASE_URL="postgres://user:password@localhost:5432/grepai"
```
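
To avoid exporting the variable by hand in every shell, one option is a local `.env` file kept out of version control (a sketch; the file name and loading approach are your choice):

```bash
# Keep the DSN out of version control (add .env to .gitignore)
cat > .env <<'EOF'
DATABASE_URL="postgres://user:password@localhost:5432/grepai"
EOF

# Load it into the current shell before running GrepAI commands
set -a; source .env; set +a
```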

### Full DSN Options

```yaml
store:
  backend: postgres
  postgres:
    dsn: postgres://user:password@host:5432/database?sslmode=require
```

DSN components:
- `user`: Database username
- `password`: Database password
- `host`: Server hostname or IP
- `5432`: Port (5432 is PostgreSQL's default)
- `database`: Database name
- `sslmode`: SSL mode (disable, require, verify-full)
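
Note that special characters in the user or password must be percent-encoded in the DSN; the credentials below are purely illustrative:

```bash
# A password like "p@ss/word" becomes "p%40ss%2Fword" in the DSN
export DATABASE_URL="postgres://grepai:p%40ss%2Fword@localhost:5432/grepai"
```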

## SSL Modes

| Mode | Description | Use Case |
|------|-------------|----------|
| `disable` | No SSL | Local development |
| `require` | SSL required | Production |
| `verify-full` | SSL + verify certificate | High security |

```yaml
# Production with SSL
store:
  backend: postgres
  postgres:
    dsn: postgres://user:password@db.example.com:5432/grepai?sslmode=require
```
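
For `verify-full`, the client also needs the CA certificate that signed the server's certificate. Assuming GrepAI passes DSN parameters through to the PostgreSQL driver, a libpq-style `sslrootcert` parameter can point at it (the certificate path is a placeholder):

```yaml
# Full certificate verification (sketch; adjust the certificate path)
store:
  backend: postgres
  postgres:
    dsn: postgres://user:password@db.example.com:5432/grepai?sslmode=verify-full&sslrootcert=/etc/ssl/certs/db-root.crt
```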

## Database Schema

GrepAI automatically creates these tables:

```sql
-- Vector embeddings table
CREATE TABLE IF NOT EXISTS embeddings (
    id SERIAL PRIMARY KEY,
    file_path TEXT NOT NULL,
    chunk_index INTEGER NOT NULL,
    content TEXT NOT NULL,
    start_line INTEGER,
    end_line INTEGER,
    embedding vector(768),  -- Dimension matches your model
    created_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(file_path, chunk_index)
);

-- Index for vector similarity search
CREATE INDEX ON embeddings USING ivfflat (embedding vector_cosine_ops);
```
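
For reference, this is roughly what a similarity search against that schema looks like; `<=>` is pgvector's cosine distance operator, and `$1` stands for a 768-dimension query vector supplied by the client:

```sql
-- Find the 5 chunks closest to a query embedding
SELECT file_path, start_line, end_line,
       1 - (embedding <=> $1::vector) AS cosine_similarity
FROM embeddings
ORDER BY embedding <=> $1::vector
LIMIT 5;
```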

## Verifying Setup

### Check pgvector Extension

```bash
# Connect to the database
psql -U grepai -d grepai
```

```sql
-- Check the extension is installed
SELECT * FROM pg_extension WHERE extname = 'vector';

-- Check GrepAI tables exist (after the first grepai watch)
\dt
```

### Test Connection from GrepAI

```bash
# Check status
grepai status

# Should show PostgreSQL backend info
```
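
If the status output does not show PostgreSQL details, it helps to rule out connectivity problems with psql directly (assuming `DATABASE_URL` is set as above and at least one indexing run has completed):

```bash
# Direct connectivity check, independent of GrepAI
psql "$DATABASE_URL" -c "SELECT version();"

# After the first indexing run, the embeddings table should contain rows
psql "$DATABASE_URL" -c "SELECT count(*) FROM embeddings;"
```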

## Performance Tuning

### PostgreSQL Configuration

For better vector search performance:

```sql
-- Increase work memory for vector operations (per session)
SET work_mem = '256MB';

-- Help the planner account for available cache (per session; adjust for your hardware)
SET effective_cache_size = '4GB';
```
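
`SET` only affects the current session, and `shared_buffers` cannot be changed at runtime at all. To persist these settings, use `ALTER SYSTEM` (or edit `postgresql.conf`) and reload or restart the server:

```sql
-- Persist the settings (written to postgresql.auto.conf)
ALTER SYSTEM SET work_mem = '256MB';
ALTER SYSTEM SET effective_cache_size = '4GB';
ALTER SYSTEM SET shared_buffers = '1GB';   -- requires a server restart to take effect

-- Reload for settings that do not require a restart
SELECT pg_reload_conf();
```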

### Index Tuning

For large indices, tune the IVFFlat index:

```sql
-- More lists = faster queries, but lower recall unless ivfflat.probes is raised
CREATE INDEX ON embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);  -- Adjust based on row count
```

Rule of thumb (pgvector's guidance): `lists = rows / 1000` up to about a million rows, then `lists = sqrt(rows)`.
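
As a worked example, rebuilding the index for roughly 100,000 chunks; the index name below follows PostgreSQL's default naming and is an assumption, so check the real name with `\di` first:

```sql
-- Example: ~100,000 chunks -> lists = 100000 / 1000 = 100
DROP INDEX IF EXISTS embeddings_embedding_idx;  -- name is an assumption; check with \di
CREATE INDEX embeddings_embedding_idx ON embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Query-time knob: scan more lists for better recall (pgvector's default is 1)
SET ivfflat.probes = 10;
```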

## Concurrent Access

PostgreSQL handles concurrent access automatically:

- Multiple `grepai search` commands work simultaneously
- Run one `grepai watch` daemon per codebase
- Many users can share the same index

## Team Setup

### Shared Database

All team members point to the same database:

```yaml
# Each developer's .grepai/config.yaml
store:
  backend: postgres
  postgres:
    dsn: postgres://team:password@db.internal.example.com:5432/grepai
```
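
On the database side, a dedicated role keeps team access separate from the superuser account. A minimal sketch, with placeholder role name and password:

```sql
-- Shared role for GrepAI clients (placeholder name and password)
CREATE ROLE grepai_team LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE grepai TO grepai_team;
GRANT USAGE, CREATE ON SCHEMA public TO grepai_team;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO grepai_team;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO grepai_team;
```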

### Per-Project Databases

For isolated projects, use separate databases:

```bash
# Create databases
createdb -U postgres grepai_projecta
createdb -U postgres grepai_projectb
```

```yaml
# Project A config
store:
  backend: postgres
  postgres:
    dsn: postgres://user:pass@localhost:5432/grepai_projecta
```

## Backup and Restore

### Backup

```bash
pg_dump -U grepai -d grepai > grepai_backup.sql
```

### Restore

```bash
psql -U grepai -d grepai < grepai_backup.sql
```
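
For larger indexes, pg_dump's custom format compresses the dump and allows selective restore with pg_restore:

```bash
# Compressed custom-format backup
pg_dump -U grepai -d grepai -Fc -f grepai_backup.dump

# Restore, dropping existing objects first
pg_restore -U grepai -d grepai --clean --if-exists grepai_backup.dump
```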

## Migrating from GOB

1. Set up PostgreSQL with pgvector
2. Update configuration:
```yaml
store:
  backend: postgres
  postgres:
    dsn: postgres://user:pass@localhost:5432/grepai
```
3. Delete old index:
```bash
rm .grepai/index.gob
```
4. Re-index:
```bash
grepai watch
```
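
The steps can be combined into a small script, assuming the config references `${DATABASE_URL}` as shown earlier; adjust the DSN before running:

```bash
#!/usr/bin/env bash
set -euo pipefail

# 1. Point GrepAI at PostgreSQL (the config reads ${DATABASE_URL})
export DATABASE_URL="postgres://user:pass@localhost:5432/grepai"

# 2. Remove the old GOB index so nothing stale is left behind
rm -f .grepai/index.gob

# 3. Rebuild the index against the new backend
grepai watch
```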

## Common Issues

❌ **Problem:** `FATAL: password authentication failed`
✅ **Solution:** Check DSN credentials and pg_hba.conf

❌ **Problem:** `ERROR: extension "vector" is not available`
✅ **Solution:** Install pgvector:
```bash
sudo apt install postgresql-16-pgvector
# Then: CREATE EXTENSION vector;
```

❌ **Problem:** `ERROR: type "vector" does not exist`
✅ **Solution:** Enable extension in the database:
```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

❌ **Problem:** Connection refused
✅ **Solution:**
- Check PostgreSQL is running
- Verify host and port
- Check firewall rules

❌ **Problem:** Slow searches
✅ **Solution:**
- Add IVFFlat index
- Increase `work_mem`
- Vacuum and analyze tables
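
For the last point, the maintenance pass is a single statement (the table name comes from the schema above):

```sql
-- Refresh planner statistics and reclaim dead rows on the embeddings table
VACUUM ANALYZE embeddings;
```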

## Best Practices

1. **Use environment variables:** Don't commit credentials
2. **Enable SSL:** For remote databases
3. **Regular backups:** pg_dump before major changes
4. **Monitor performance:** Check query times
5. **Index maintenance:** Regular VACUUM ANALYZE

## Output Format

PostgreSQL storage status:

```
✅ PostgreSQL Storage Configured

   Backend: PostgreSQL + pgvector
   Host: localhost:5432
   Database: grepai
   SSL: disabled

   Contents:
   - Files: 2,450
   - Chunks: 12,340
   - Vector dimension: 768

   Performance:
   - Connection: OK
   - IVFFlat index: Yes
   - Search latency: ~50ms
```

Overview

This skill configures PostgreSQL with the pgvector extension to store GrepAI embeddings and code chunks. It enables a shared, scalable, and persistent vector index suitable for team environments and large codebases. Use it to move from local file-based storage to a production-ready database backend.

How this skill works

The skill prepares PostgreSQL 14+ with pgvector, creates the embeddings table and an IVFFlat vector index, and updates the GrepAI configuration to point at a DSN. GrepAI writes chunked file content and 768-dimension embeddings into the database, and the IVFFlat index accelerates similarity searches. It also includes guidance for Docker, managed services, SSL, tuning, backups, and migration from local indexes.

When to use it

  • Team environments where multiple developers share the same index
  • Large codebases (10k+ files) that need scalable storage
  • When you require concurrent searches and persistent storage
  • To integrate GrepAI with existing PostgreSQL infrastructure
  • When you need managed service support (Supabase, Neon, RDS)

Best practices

  • Run PostgreSQL 14+ with pgvector and enable the vector extension in each database
  • Store DSN in environment variables and avoid committing credentials
  • Enable SSL (require or verify-full) for remote databases
  • Tune IVFFlat lists and PostgreSQL memory settings (work_mem/shared_buffers) for large indexes
  • Schedule regular VACUUM ANALYZE and pg_dump backups

Example use cases

  • Shared company code search: all engineers point GrepAI to a central DB for unified semantic search
  • Monorepo support: index millions of lines across many projects with efficient vector search
  • CI integration: use the DB-backed index in automated code analysis and batch pipelines
  • Migration from local storage: delete .grepai/index.gob, update DSN, and re-run grepai watch to re-index

FAQ

What PostgreSQL version and extension do I need?

Use PostgreSQL 14 or newer and install the pgvector extension. Then run CREATE EXTENSION IF NOT EXISTS vector; in your database.

How do I test the connection from GrepAI?

Set DATABASE_URL or the DSN in .grepai/config.yaml, run grepai status, and confirm the backend shows PostgreSQL details. Also verify the embeddings table with \dt in psql.

How can I improve slow searches?

Create or tune the IVFFlat index (size lists to the row count as described in Index Tuning), increase work_mem, and ensure regular VACUUM ANALYZE. More lists speeds up queries but can reduce recall unless ivfflat.probes is raised.