---
name: grepai-storage-postgres
description: Configure PostgreSQL with pgvector for GrepAI. Use this skill for team environments and large codebases.
---
# GrepAI Storage with PostgreSQL
This skill covers using PostgreSQL with the pgvector extension as the storage backend for GrepAI.
## When to Use This Skill
- Team environments with shared index
- Large codebases (10K+ files)
- Need concurrent access
- Integration with existing PostgreSQL infrastructure
## Prerequisites
1. PostgreSQL 14+ with pgvector extension
2. Database user with create table permissions
3. Network access to PostgreSQL server
## Advantages
| Benefit | Description |
|---------|-------------|
| 👥 **Team sharing** | Multiple users can access same index |
| 📏 **Scalable** | Handles large codebases |
| 🔄 **Concurrent** | Multiple simultaneous searches |
| 💾 **Persistent** | Data survives machine restarts |
| 🔧 **Familiar** | Standard database tooling |
## Setting Up PostgreSQL with pgvector
### Option 1: Docker (Recommended for Development)
```bash
# Run PostgreSQL with pgvector
docker run -d \
  --name grepai-postgres \
  -e POSTGRES_USER=grepai \
  -e POSTGRES_PASSWORD=grepai \
  -e POSTGRES_DB=grepai \
  -p 5432:5432 \
  pgvector/pgvector:pg16
```
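If your team uses Docker Compose, an equivalent setup might look like the following sketch (the service and volume names are illustrative; the named volume persists the index across container restarts):
```yaml
# docker-compose.yml -- equivalent to the docker run command above
services:
  grepai-postgres:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_USER: grepai
      POSTGRES_PASSWORD: grepai
      POSTGRES_DB: grepai
    ports:
      - "5432:5432"
    volumes:
      - grepai-pgdata:/var/lib/postgresql/data  # keep data across container restarts
volumes:
  grepai-pgdata:
```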
### Option 2: Install on Existing PostgreSQL
```bash
# Install pgvector extension (Ubuntu/Debian)
sudo apt install postgresql-16-pgvector
# Or compile from source
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
```
Then enable the extension:
```sql
-- Connect to your database
CREATE EXTENSION IF NOT EXISTS vector;
```
### Option 3: Managed Services
- **Supabase:** pgvector included by default
- **Neon:** pgvector available
- **AWS RDS:** Install pgvector extension
- **Azure Database:** pgvector available
## Configuration
### Basic Configuration
```yaml
# .grepai/config.yaml
store:
  backend: postgres
  postgres:
    dsn: postgres://user:password@localhost:5432/grepai
```
### With Environment Variable
```yaml
store:
  backend: postgres
  postgres:
    dsn: ${DATABASE_URL}
```
Set the environment variable:
```bash
export DATABASE_URL="postgres://user:password@localhost:5432/grepai"
```
### Full DSN Options
```yaml
store:
  backend: postgres
  postgres:
    dsn: postgres://user:password@host:5432/database?sslmode=require
```
DSN components:
- `user`: Database username
- `password`: Database password
- `host`: Server hostname or IP
- `port`: Server port (default: 5432)
- `database`: Database name
- `sslmode`: SSL mode (disable, require, verify-full)
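To sanity-check a DSN before handing it to GrepAI, you can split it into these components with Python's standard library (a sketch; the example DSN is illustrative, and the `prefer` fallback mirrors libpq's default sslmode):

```python
from urllib.parse import urlsplit, parse_qs

def parse_dsn(dsn: str) -> dict:
    """Split a postgres:// DSN into its components."""
    u = urlsplit(dsn)
    return {
        "user": u.username,
        "password": u.password,
        "host": u.hostname,
        "port": u.port or 5432,  # PostgreSQL default port
        "database": u.path.lstrip("/"),
        # libpq defaults sslmode to "prefer" when it is not given
        "sslmode": parse_qs(u.query).get("sslmode", ["prefer"])[0],
    }

print(parse_dsn("postgres://user:pass@db.example.com:5432/grepai?sslmode=require"))
```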
## SSL Modes
| Mode | Description | Use Case |
|------|-------------|----------|
| `disable` | No SSL | Local development |
| `require` | SSL required | Production |
| `verify-full` | SSL + verify certificate | High security |
```yaml
# Production with SSL
store:
  backend: postgres
  postgres:
    dsn: postgres://user:password@db.example.com:5432/grepai?sslmode=require
```
## Database Schema
GrepAI automatically creates these tables:
```sql
-- Vector embeddings table
CREATE TABLE IF NOT EXISTS embeddings (
    id SERIAL PRIMARY KEY,
    file_path TEXT NOT NULL,
    chunk_index INTEGER NOT NULL,
    content TEXT NOT NULL,
    start_line INTEGER,
    end_line INTEGER,
    embedding vector(768), -- dimension matches your embedding model
    created_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(file_path, chunk_index)
);
-- Index for vector similarity search
CREATE INDEX ON embeddings USING ivfflat (embedding vector_cosine_ops);
```
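The `vector_cosine_ops` operator class ranks rows by cosine distance. Conceptually, it computes the following (a pure-Python sketch of the metric, not GrepAI's actual code):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance as used by pgvector's <=> operator: 1 - cos(theta)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Same direction -> distance 0; orthogonal -> distance 1
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

A smaller distance means a more semantically similar chunk, which is why search results are ordered ascending by this value.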
## Verifying Setup
### Check pgvector Extension
```bash
# Connect to the database
psql -U grepai -d grepai
```
```sql
-- Check the extension is installed
SELECT * FROM pg_extension WHERE extname = 'vector';
-- Check the GrepAI tables exist (after the first grepai watch)
\dt
```
### Test Connection from GrepAI
```bash
# Check status
grepai status
# Should show PostgreSQL backend info
```
## Performance Tuning
### PostgreSQL Configuration
For better vector search performance:
```sql
-- Session-level: increase work memory for vector operations
SET work_mem = '256MB';
-- Server-level, adjust for your hardware (ALTER SYSTEM persists the change;
-- shared_buffers cannot be changed with SET and only takes effect after a restart)
ALTER SYSTEM SET effective_cache_size = '4GB';
ALTER SYSTEM SET shared_buffers = '1GB';
```
### Index Tuning
For large indices, tune the IVFFlat index:
```sql
-- More lists = faster queries, but lower recall unless probes is also raised
CREATE INDEX ON embeddings
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100); -- adjust based on row count
```
Rule of thumb: `lists = sqrt(rows)`. At query time, raise `ivfflat.probes` (default: 1) to trade speed for recall.
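Applying the rule of thumb to the chunk count from the example status output below (the companion suggestion `probes = sqrt(lists)` comes from pgvector's own guidance, not from GrepAI):

```python
import math

def ivfflat_params(rows: int) -> tuple[int, int]:
    """lists = sqrt(rows) (this doc's rule of thumb); probes = sqrt(lists) (pgvector's suggestion)."""
    lists = max(1, round(math.sqrt(rows)))
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes

lists, probes = ivfflat_params(12_340)
print(lists, probes)  # 111 11
```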
## Concurrent Access
PostgreSQL handles concurrent access automatically:
- Multiple `grepai search` commands work simultaneously
- One `grepai watch` daemon per codebase
- Many users can share the same index
## Team Setup
### Shared Database
All team members point to the same database:
```yaml
# Each developer's .grepai/config.yaml
store:
  backend: postgres
  postgres:
    dsn: postgres://team:password@db.internal.example.com:5432/grepai
```
### Per-Project Databases
For isolated projects, use separate databases:
```bash
# Create databases
createdb -U postgres grepai_projecta
createdb -U postgres grepai_projectb
```
```yaml
# Project A config
store:
  backend: postgres
  postgres:
    dsn: postgres://user:pass@localhost:5432/grepai_projecta
```
## Backup and Restore
### Backup
```bash
pg_dump -U grepai -d grepai > grepai_backup.sql
```
### Restore
```bash
psql -U grepai -d grepai < grepai_backup.sql
```
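For scheduled backups, a date-stamped filename avoids overwriting the previous dump. A sketch that builds (but does not run) the `pg_dump` invocation; the filename pattern is an assumption, not a GrepAI convention:

```python
from datetime import date

def backup_command(user: str, db: str) -> list[str]:
    """Build a pg_dump invocation with a date-stamped output file."""
    outfile = f"{db}_backup_{date.today():%Y%m%d}.sql"
    return ["pg_dump", "-U", user, "-d", db, "-f", outfile]

print(" ".join(backup_command("grepai", "grepai")))
```

The resulting list can be passed to `subprocess.run` from a cron-driven script.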
## Migrating from GOB
1. Set up PostgreSQL with pgvector
2. Update configuration:
```yaml
store:
  backend: postgres
  postgres:
    dsn: postgres://user:pass@localhost:5432/grepai
```
3. Delete old index:
```bash
rm .grepai/index.gob
```
4. Re-index:
```bash
grepai watch
```
## Common Issues
❌ **Problem:** `FATAL: password authentication failed`
✅ **Solution:** Check DSN credentials and pg_hba.conf
❌ **Problem:** `ERROR: extension "vector" is not available`
✅ **Solution:** Install pgvector:
```bash
sudo apt install postgresql-16-pgvector
# Then: CREATE EXTENSION vector;
```
❌ **Problem:** `ERROR: type "vector" does not exist`
✅ **Solution:** Enable extension in the database:
```sql
CREATE EXTENSION IF NOT EXISTS vector;
```
❌ **Problem:** Connection refused
✅ **Solution:**
- Check PostgreSQL is running
- Verify host and port
- Check firewall rules
❌ **Problem:** Slow searches
✅ **Solution:**
- Add IVFFlat index
- Increase `work_mem`
- Vacuum and analyze tables
## Best Practices
1. **Use environment variables:** Don't commit credentials
2. **Enable SSL:** For remote databases
3. **Regular backups:** pg_dump before major changes
4. **Monitor performance:** Check query times
5. **Index maintenance:** Regular VACUUM ANALYZE
## Output Format
PostgreSQL storage status:
```
✅ PostgreSQL Storage Configured
Backend: PostgreSQL + pgvector
Host: localhost:5432
Database: grepai
SSL: disabled
Contents:
- Files: 2,450
- Chunks: 12,340
- Vector dimension: 768
Performance:
- Connection: OK
- IVFFlat index: Yes
- Search latency: ~50ms
```
## FAQ
**What PostgreSQL version and extension do I need?**
Use PostgreSQL 14 or newer with the pgvector extension installed, then run `CREATE EXTENSION IF NOT EXISTS vector;` in your database.
**How do I test the connection from GrepAI?**
Set `DATABASE_URL` or the DSN in `.grepai/config.yaml`, run `grepai status`, and confirm the backend shows PostgreSQL details. You can also verify the `embeddings` table with `\dt` in psql.
**How can I improve slow searches?**
Create or tune the IVFFlat index (`lists ≈ sqrt(rows)`), increase `work_mem`, and run `VACUUM ANALYZE` regularly. More lists speeds up search at the cost of memory.