
This skill helps you choose and compare vector databases for embeddings, RAG retrieval, and semantic search across Chroma, FAISS, Qdrant, and Pinecone.

npx playbooks add skill eyadsibai/ltk --skill vector-databases

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
2.8 KB
---
name: vector-databases
description: Use when "vector database", "embedding storage", "similarity search", "semantic search", "Chroma", "ChromaDB", "FAISS", "Qdrant", "RAG retrieval", "k-NN search", "vector index", "HNSW", "IVF"
version: 1.0.0
---

# Vector Databases

Store and search embeddings for RAG, semantic search, and similarity applications.

## Comparison

| Database | Best For | Filtering | Scale | Managed Option |
|----------|----------|-----------|-------|----------------|
| **Chroma** | Local dev, prototyping | Yes | < 1M | No |
| **FAISS** | Max speed, GPU, batch | No | Billions | No |
| **Qdrant** | Production, hybrid search | Yes | Millions | Yes |
| **Pinecone** | Fully managed | Yes | Billions | Yes (only) |
| **Weaviate** | Hybrid search, GraphQL | Yes | Millions | Yes |

---

## Chroma

Embedded vector database for prototyping. No server needed.

**Strengths**: Zero-config, auto-embedding, metadata filtering, persistent storage
**Limitations**: Not for production scale, single-node only

**Key concept**: Collections hold documents + embeddings + metadata. Auto-embeds text if no vectors provided.
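
A minimal sketch of that workflow, assuming the `chromadb` package (and its default embedding model) is installed; the collection name and metadata fields are placeholders:

```python
import chromadb

# Persistent local client; data is stored on disk under ./chroma
client = chromadb.PersistentClient(path="./chroma")

# Collections hold documents + embeddings + metadata
collection = client.get_or_create_collection("docs")

# No vectors supplied, so Chroma auto-embeds the documents
collection.add(
    ids=["1", "2"],
    documents=["Chroma is an embedded vector database.",
               "FAISS is a vector similarity library."],
    metadatas=[{"source": "notes"}, {"source": "notes"}],
)

# Query text is embedded the same way; the metadata filter
# is applied during search, not after
results = collection.query(
    query_texts=["embedded database"],
    n_results=1,
    where={"source": "notes"},
)
print(results["documents"])
```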

---

## FAISS (Facebook AI)

Pure vector similarity search: no metadata, no filtering, maximum speed.

**Index types:**

- **Flat**: Exact search, small datasets (< 10K)
- **IVF**: Inverted file, medium datasets (10K - 1M)
- **HNSW**: Graph-based, good recall/speed tradeoff
- **PQ**: Product quantization, memory efficient for billions

**Strengths**: Fastest, GPU support, scales to billions
**Limitations**: No filtering, no metadata, vectors only

**Key concept**: Choose index based on dataset size. Trade accuracy for speed with approximate search.
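
A minimal sketch contrasting exact and approximate search, assuming `faiss` (faiss-cpu or faiss-gpu) and NumPy are installed; dataset sizes and parameters are illustrative:

```python
import faiss
import numpy as np

d = 128                                      # embedding dimension
xb = np.random.random((10_000, d)).astype("float32")  # database vectors
xq = np.random.random((5, d)).astype("float32")       # query vectors

# Flat: exact brute-force search, fine for small datasets
flat = faiss.IndexFlatL2(d)
flat.add(xb)
D, I = flat.search(xq, 5)                    # distances and neighbor ids

# IVF: cluster vectors first, then search only the nearest clusters
quantizer = faiss.IndexFlatL2(d)
ivf = faiss.IndexIVFFlat(quantizer, d, 100)  # 100 clusters
ivf.train(xb)                                # IVF indexes must be trained
ivf.add(xb)
ivf.nprobe = 10                              # clusters probed per query: recall vs speed
D, I = ivf.search(xq, 5)
```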

---

## Qdrant

Production-ready with rich filtering and hybrid search.

**Strengths**: Payload filtering, horizontal scaling, cloud option, gRPC API
**Limitations**: More complex setup than Chroma

**Key concept**: "Payloads" are metadata attached to vectors. Filter during search, not after.

---

## Index Algorithm Concepts

| Algorithm | How It Works | Trade-off |
|-----------|--------------|-----------|
| **Flat** | Compare to every vector | Perfect recall, slow |
| **IVF** | Cluster vectors, search nearby clusters | Good recall, fast |
| **HNSW** | Graph of neighbors | Best recall/speed ratio |
| **PQ** | Compress vectors | Memory efficient, lower recall |
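
As a rough illustration in FAISS (one library that implements all four), each algorithm maps to an index-factory string; the parameters below are illustrative defaults, not tuned values:

```python
import faiss

d = 128  # embedding dimension

flat  = faiss.index_factory(d, "Flat")          # exact: compare to every vector
ivf   = faiss.index_factory(d, "IVF100,Flat")   # 100 clusters, probe a few per query
hnsw  = faiss.index_factory(d, "HNSW32")        # graph with 32 neighbors per node
ivfpq = faiss.index_factory(d, "IVF100,PQ16")   # compress each vector to 16 bytes

# IVF and PQ variants must be trained on representative vectors
# (index.train(xb)) before vectors can be added
```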

---

## Decision Guide

| Requirement | Recommendation |
|-------------|----------------|
| Quick prototype | Chroma |
| Metadata filtering | Chroma, Qdrant, Pinecone |
| Billions of vectors | FAISS |
| GPU acceleration | FAISS |
| Production deployment | Qdrant or Pinecone |
| Fully managed | Pinecone |
| On-premise control | Qdrant, Chroma |

## Resources

- Chroma: <https://docs.trychroma.com>
- FAISS: <https://github.com/facebookresearch/faiss>
- Qdrant: <https://qdrant.tech/documentation/>
- Pinecone: <https://docs.pinecone.io>

Overview

This skill helps select, design, and operate vector databases for embeddings, semantic search, and retrieval-augmented generation (RAG). It summarizes trade-offs between popular engines (Chroma, FAISS, Qdrant, Pinecone, Weaviate) and gives concrete guidance for index choice, filtering, scaling, and deployment.

How this skill works

The skill inspects use-case requirements (dataset size, need for metadata filtering, latency, GPU availability, and managed vs self-hosted preference) and maps them to appropriate engines and index algorithms (Flat, IVF, HNSW, PQ). It explains when to keep embedding storage local for prototyping versus choosing managed/cloud options for production, and how payloads/metadata interact with search filters and hybrid ranking.

When to use it

  • Prototyping local semantic search or RAG workflows with small to medium datasets
  • High-throughput, GPU-accelerated similarity search over billions of vectors
  • Production deployments requiring metadata filtering, horizontal scaling, or hybrid search
  • When you need a fully managed vector service for reduced ops burden
  • Selecting an index algorithm based on dataset size and recall-versus-latency trade-offs

Best practices

  • Start with Chroma for zero-config local development; migrate to a production engine as scale and feature needs grow
  • Choose index type by size: Flat for tiny sets, IVF for medium, HNSW for balanced recall/latency, PQ for extreme scale
  • Model embeddings and metadata together: use payloads/filters when you need attribute-aware retrieval rather than post-filtering
  • Use FAISS with GPU for maximum throughput; accept the lack of filtering, or pair it with an external metadata store if needed
  • Prefer managed services (Pinecone, Qdrant Cloud) to reduce operational burden in production, but self-host Qdrant or another on-prem option when control or hybrid search is required

Example use cases

  • Local development: Chroma for quick semantic search demos and persistent prototyping
  • High-scale vector search: FAISS with IVF/PQ and GPU for billion-vector nearest neighbor tasks
  • Production RAG: Qdrant for payload filtering, horizontal scaling, and hybrid search with metadata
  • Managed deployment: Pinecone for teams that want a hosted, scalable vector index with built-in filtering
  • Hybrid search or GraphQL-driven apps: Weaviate when semantic vectors must integrate with structured graph queries

FAQ

Which engine should I pick for a small prototype?

Use Chroma for zero-config local work; it auto-embeds documents when you don't supply vectors. It's lightweight and persistent, but not suited to production scale.

How do I handle metadata filtering with FAISS?

FAISS is vectors-only. Combine FAISS for nearest-neighbor retrieval with a secondary metadata store or use an engine like Qdrant or Pinecone that supports payload filtering natively.
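
For instance, a minimal sketch of the second-store pattern, using a plain Python dict as a stand-in for any external metadata store:

```python
import faiss
import numpy as np

d = 64
vectors = np.random.random((1_000, d)).astype("float32")

# Secondary metadata store: FAISS row id -> attributes
metadata = {i: {"lang": "en" if i % 2 == 0 else "de"} for i in range(len(vectors))}

index = faiss.IndexFlatL2(d)
index.add(vectors)

query = np.random.random((1, d)).astype("float32")
_, ids = index.search(query, 10)  # over-fetch, since filtering happens after search

english_hits = [int(i) for i in ids[0] if metadata[int(i)]["lang"] == "en"][:5]
```

Because the filter runs after retrieval, request more neighbors than you need so enough results survive the filter; engines with native payload filtering avoid this over-fetching.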