
This skill helps you choose and implement the right vector database for RAG and semantic search across projects.

npx playbooks add skill willsigmon/sigstack --skill vector-db-expert

---
name: Vector Database Expert
description: Vector databases - Pinecone, Weaviate, Chroma, Qdrant for RAG and semantic search
allowed-tools: Read, Edit, Bash, WebFetch
model: sonnet
---

# Vector Database Expert

Choose and implement the right vector database for your AI applications.

## Pricing Comparison (2026)

| Database | Free Tier | Paid (1M vectors) |
|----------|-----------|-------------------|
| Pinecone | Yes | ~$41/mo |
| Weaviate | Yes | ~$25-153/mo |
| Chroma | Open source | Self-host cost |
| Qdrant | Open source | Self-host or cloud |
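
For a sense of scale behind these prices, the raw storage footprint of 1M vectors is easy to estimate. The sketch below assumes 1536-dimension float32 embeddings (a common default, e.g. OpenAI's smaller models); real indexes add structure overhead on top of this raw figure:

```python
# Back-of-envelope storage estimate for 1M vectors.
# Assumes 1536-dim float32 embeddings; index structures (e.g. HNSW graphs)
# add additional overhead on top of the raw vector data.
n_vectors = 1_000_000
dims = 1536
bytes_per_float = 4

raw_gb = n_vectors * dims * bytes_per_float / 1e9
print(f"~{raw_gb:.1f} GB raw vector data")  # ~6.1 GB
```

Run the same arithmetic with your own dimension count and growth projection when comparing tiers.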

## When to Use Each

### Pinecone
- **Best for**: Production RAG, minimal ops
- **Pros**: Fully managed, fast, reliable
- **Cons**: 3-5x more expensive than self-hosted options
- **Use when**: Need SLAs, no DevOps capacity

### Weaviate
- **Best for**: Hybrid search, GraphQL fans
- **Pros**: Flexible pricing, compression options
- **Cons**: More complex setup
- **Use when**: Mid-scale with in-house ops

### Chroma
- **Best for**: Prototypes, learning, embedded use
- **Pros**: Free, simple Python API
- **Cons**: Limited production features
- **Use when**: Starting out, tight budget

### Qdrant
- **Best for**: Performance-critical apps
- **Pros**: Fast, Rust-based, filtering
- **Cons**: Newer ecosystem
- **Use when**: High-performance requirements

## Quick Implementations

### Chroma (Local Dev)
```python
import chromadb
from chromadb.utils import embedding_functions

client = chromadb.Client()
ef = embedding_functions.OpenAIEmbeddingFunction(api_key="...")

collection = client.create_collection(
    name="docs",
    embedding_function=ef
)

collection.add(
    documents=["Swift is great for iOS", "React is for web"],
    ids=["doc1", "doc2"]
)

results = collection.query(
    query_texts=["mobile development"],
    n_results=2
)
```

### Pinecone (Production)
```python
from pinecone import Pinecone

pc = Pinecone(api_key="...")
index = pc.Index("my-index")

# Upsert
index.upsert(vectors=[
    {"id": "doc1", "values": [...], "metadata": {"source": "docs"}}
])

# Query
results = index.query(
    vector=[...],
    top_k=5,
    filter={"source": "docs"}
)
```

### Weaviate
```python
import weaviate
import weaviate.classes as wvc

client = weaviate.connect_to_weaviate_cloud(
    cluster_url="your-url",
    auth_credentials=wvc.init.Auth.api_key("key")
)

collection = client.collections.create(
    name="Document",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai()
)

client.close()
```

## RAG Pattern
```
User Query → Embed → Vector Search → Top K Docs → LLM Context → Response
```
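
The vector-search step in this pipeline can be sketched in plain Python as a brute-force nearest-neighbor scan. This is a toy stand-in for what a vector database does at scale, with hand-written vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, doc_vecs, k=2):
    # Brute-force search: score every doc against the query, keep the best k.
    scored = sorted(doc_vecs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy 3-dim vectors standing in for real embeddings.
docs = {"doc1": [0.9, 0.1, 0.0], "doc2": [0.1, 0.9, 0.0]}
print(top_k([0.8, 0.2, 0.0], docs, k=1))  # ['doc1']
```

A vector database replaces this linear scan with an approximate index (e.g. HNSW) so the same query stays fast at millions of vectors.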

Use when: Building RAG, semantic search, similarity matching, AI memory

Overview

This skill helps you choose and implement the right vector database for retrieval-augmented generation (RAG) and semantic search. It compares Pinecone, Weaviate, Chroma, and Qdrant by trade-offs like cost, ops burden, performance, and production readiness. You get concise guidance and quick code patterns to get started in development and production.

How this skill works

The skill summarizes when each vector store is appropriate and outlines simple implementation snippets for local dev and production. It inspects trade-offs—managed vs self-hosted, pricing, performance, and feature gaps like filtering or graph support—and maps them to concrete use cases. It also shows the standard RAG pipeline so you can integrate embedding, search, and LLM context steps quickly.

When to use it

  • Choose Pinecone when you need a fully managed, SLA-backed vector store and want minimal DevOps overhead.
  • Pick Weaviate for hybrid search scenarios, GraphQL or vector+metadata models, and teams that can handle more complex setup.
  • Use Chroma for local development, prototypes, demos, or embedded applications with minimal cost and friction.
  • Select Qdrant when performance, low-latency filtering, and a Rust-based engine matter for high-throughput needs.
  • Prefer self-hosted (Chroma/Qdrant) when budget or data residency requirements prevent using managed services.

Best practices

  • Design your RAG pipeline: embed inputs, perform vector search, fetch top-K docs, and pass focused context to the LLM to avoid token bloat.
  • Store useful metadata with vectors to enable precise filtering and boosting without re-querying the original documents.
  • Benchmark latency and recall with representative vectors and queries before committing to a provider for production.
  • Start with a lightweight local store (Chroma) for iteration, then migrate vectors and metadata to a managed store (Pinecone/Weaviate) for scale.
  • Monitor costs and vector size: estimate vector count growth and test compression/quantization options to control pricing.
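
The benchmarking step above can be sketched with Python's stdlib; `search` here is a placeholder for whatever query call your store exposes, and the percentile math is a simple sorted-index approximation:

```python
import statistics
import time

def benchmark(search, queries, warmup=2):
    # Time each query with a monotonic clock; report p50/p95 latency in ms.
    for q in queries[:warmup]:
        search(q)  # warm caches and connections before measuring
    latencies = []
    for q in queries:
        start = time.perf_counter()
        search(q)
        latencies.append((time.perf_counter() - start) * 1000)
    latencies.sort()
    return {
        "p50_ms": statistics.median(latencies),
        "p95_ms": latencies[int(0.95 * (len(latencies) - 1))],
    }

# Example with a dummy search function; use representative queries in practice.
stats = benchmark(lambda q: sum(q), [[1, 2, 3]] * 50)
```

Measure recall separately by comparing each store's top-K results against exact brute-force results on a sample of queries.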

Example use cases

  • Customer support assistant: semantic retrieval of knowledge base articles with metadata filters for product and region.
  • Internal knowledge search: unify docs, tickets, and wikis using embeddings and fast nearest-neighbor search.
  • Product recommendations: similarity matching on item embeddings with attribute filtering for real-time relevance.
  • AI memory store: persist user interactions as vectors and fetch relevant past exchanges for personalized prompts.
  • Prototype conversational agents locally with Chroma, then migrate to Pinecone or Qdrant in production.

FAQ

Which vendor is cheapest for 1M vectors?

It depends on features and ops. Open-source Chroma and Qdrant minimize software cost but add hosting and maintenance overhead; among managed options, Pinecone is typically more expensive than Weaviate at equivalent service levels.

Can I migrate vectors between stores?

Yes. Export vectors and metadata from one store and upsert into another; preserve IDs and embedding model to maintain nearest-neighbor behavior.
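
A minimal sketch of that export/upsert step, assuming a Chroma `collection.get(...)` dump as input and Pinecone's record shape as output (field names follow the snippets above; adapt them to your actual stores):

```python
def chroma_to_pinecone_records(dump):
    # dump: result of chroma_collection.get(include=["embeddings", "metadatas"])
    # Returns records shaped for pinecone_index.upsert(vectors=...).
    # IDs are preserved so nearest-neighbor behavior carries over.
    return [
        {"id": doc_id, "values": vec, "metadata": meta or {}}
        for doc_id, vec, meta in zip(dump["ids"], dump["embeddings"], dump["metadatas"])
    ]

fake_dump = {
    "ids": ["doc1"],
    "embeddings": [[0.1, 0.2]],
    "metadatas": [None],  # Chroma returns None when no metadata was stored
}
print(chroma_to_pinecone_records(fake_dump))
# [{'id': 'doc1', 'values': [0.1, 0.2], 'metadata': {}}]
```

Keep using the same embedding model after migration; vectors from different models are not comparable.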