
This skill helps you implement and optimize FAISS-based vector similarity search for large-scale, high-performance applications.

This is most likely a fork of the `faiss` skill from orchestra-research.

```bash
npx playbooks add skill davila7/claude-code-templates --skill rag-faiss
```

Review the files below or copy the command above to add this skill to your agents.

Files (2): SKILL.md (4.9 KB)
---
name: faiss
description: Facebook's library for efficient similarity search and clustering of dense vectors. Supports billions of vectors, GPU acceleration, and various index types (Flat, IVF, HNSW). Use for fast k-NN search, large-scale vector retrieval, or when you need pure similarity search without metadata. Best for high-performance applications.
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [RAG, FAISS, Similarity Search, Vector Search, Facebook AI, GPU Acceleration, Billion-Scale, K-NN, HNSW, High Performance, Large Scale]
dependencies: [faiss-cpu, faiss-gpu, numpy]
---

# FAISS - Efficient Similarity Search

Facebook AI's library for billion-scale vector similarity search.

## When to use FAISS

**Use FAISS when:**
- Need fast similarity search on large vector datasets (millions/billions)
- GPU acceleration required
- Pure vector similarity (no metadata filtering needed)
- High throughput, low latency critical
- Offline/batch processing of embeddings

**Metrics**:
- **31,700+ GitHub stars**
- Meta/Facebook AI Research
- **Handles billions of vectors**
- **C++** with Python bindings

**Use alternatives instead**:
- **Chroma/Pinecone**: Need metadata filtering
- **Weaviate**: Need full database features
- **Annoy**: Simpler, fewer features

## Quick start

### Installation

```bash
# CPU only
pip install faiss-cpu

# GPU support (the pip wheel can lag behind releases; the FAISS wiki
# recommends installing via conda instead)
pip install faiss-gpu
# or: conda install -c pytorch faiss-gpu
```

### Basic usage

```python
import faiss
import numpy as np

# Create sample data (1000 vectors, 128 dimensions)
d = 128
nb = 1000
vectors = np.random.random((nb, d)).astype('float32')

# Create index
index = faiss.IndexFlatL2(d)  # L2 distance
index.add(vectors)             # Add vectors

# Search
k = 5  # Find 5 nearest neighbors
query = np.random.random((1, d)).astype('float32')
distances, indices = index.search(query, k)

print(f"Nearest neighbors: {indices}")
print(f"Distances: {distances}")
```

## Index types

### 1. Flat (exact search)

```python
# L2 (Euclidean) distance
index = faiss.IndexFlatL2(d)

# Inner product (cosine similarity if normalized)
index = faiss.IndexFlatIP(d)

# Exact brute-force search: 100% accurate, but slowest at scale
```

### 2. IVF (inverted file) - Fast approximate

```python
# Create quantizer
quantizer = faiss.IndexFlatL2(d)

# IVF index with 100 clusters
nlist = 100
index = faiss.IndexIVFFlat(quantizer, d, nlist)

# Train on data
index.train(vectors)

# Add vectors
index.add(vectors)

# Search (nprobe = clusters to search)
index.nprobe = 10
distances, indices = index.search(query, k)
```

### 3. HNSW (Hierarchical Navigable Small World) - Best quality/speed

```python
# HNSW index
M = 32  # Number of connections per layer
index = faiss.IndexHNSWFlat(d, M)

# No training needed
index.add(vectors)

# Search
distances, indices = index.search(query, k)
```

### 4. Product Quantization - Memory efficient

```python
# PQ reduces memory by 16-32×
m = 8   # Number of subquantizers
nbits = 8
index = faiss.IndexPQ(d, m, nbits)

# Train and add
index.train(vectors)
index.add(vectors)
```

## Save and load

```python
# Save index
faiss.write_index(index, "large.index")

# Load index
index = faiss.read_index("large.index")

# Continue using
distances, indices = index.search(query, k)
```

## GPU acceleration

```python
# Single GPU
res = faiss.StandardGpuResources()
index_cpu = faiss.IndexFlatL2(d)
index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu)  # GPU 0

# Multi-GPU
index_gpu = faiss.index_cpu_to_all_gpus(index_cpu)

# 10-100× faster than CPU
```

## LangChain integration

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Create FAISS vector store
vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())

# Save
vectorstore.save_local("faiss_index")

# Load
vectorstore = FAISS.load_local(
    "faiss_index",
    OpenAIEmbeddings(),
    allow_dangerous_deserialization=True
)

# Search
results = vectorstore.similarity_search("query", k=5)
```

## LlamaIndex integration

```python
from llama_index.vector_stores.faiss import FaissVectorStore
import faiss

# Create FAISS index
d = 1536
faiss_index = faiss.IndexFlatL2(d)

vector_store = FaissVectorStore(faiss_index=faiss_index)
```

## Best practices

1. **Choose right index type** - Flat for <10K, IVF for 10K-1M, HNSW for quality
2. **Normalize for cosine** - Use IndexFlatIP with normalized vectors
3. **Use GPU for large datasets** - 10-100× faster
4. **Save trained indices** - Training is expensive
5. **Tune nprobe/efSearch** - Balance speed/accuracy
6. **Monitor memory** - PQ for large datasets
7. **Batch queries** - Better GPU utilization

## Performance

| Index Type | Build Time | Search Time | Memory | Accuracy |
|------------|------------|-------------|--------|----------|
| Flat | Fast | Slow | High | 100% |
| IVF | Medium | Fast | Medium | 95-99% |
| HNSW | Slow | Fastest | High | 99% |
| PQ | Medium | Fast | Low | 90-95% |

## Resources

- **GitHub**: https://github.com/facebookresearch/faiss ⭐ 31,700+
- **Wiki**: https://github.com/facebookresearch/faiss/wiki
- **License**: MIT


Overview

This skill exposes FAISS, Facebook’s high-performance library for similarity search and clustering of dense vectors. It focuses on fast k-NN retrieval at scale, offering exact and approximate indexes (Flat, IVF, HNSW, PQ) and GPU acceleration for low-latency, high-throughput applications. Use it when you need pure vector search without metadata management.

How this skill works

FAISS builds and queries vector indexes optimized for different trade-offs between speed, memory, and accuracy. It supports exact search (Flat), cluster-based approximate search (IVF), graph-based search (HNSW), and quantization (PQ) for memory reduction. Indexes can be trained, saved, loaded, and moved to GPUs for large-scale acceleration.

When to use it

  • You have millions to billions of dense vectors and need sub-second nearest-neighbor searches.
  • You require GPU acceleration to reduce search latency or increase throughput.
  • You only need pure vector similarity without filtering by metadata.
  • You need a production-ready library with multiple index types and tunable speed/accuracy trade-offs.
  • You want offline batch indexing and long-lived precomputed indexes.

Best practices

  • Pick the right index: Flat for small datasets, IVF for mid-size, HNSW for high quality/speed, PQ for memory-constrained sets.
  • Normalize vectors when using inner product indexes so inner product equals cosine similarity.
  • Train indexes (IVF, PQ) on representative samples and persist trained indexes to avoid repeated training costs.
  • Tune search parameters (nprobe for IVF, efSearch/efConstruction for HNSW) to balance recall and latency.
  • Use GPU indexes for large datasets and batch queries to maximize GPU utilization and throughput.

Example use cases

  • Semantic search over large embedding collections for document retrieval or QA systems.
  • Recommendation systems that match users and items via dense embeddings at scale.
  • Similarity-based deduplication or clustering of billions of feature vectors.
  • Real-time k-NN retrieval in low-latency services backed by GPU-accelerated indexes.
  • Embedding-backed analytics workflows where indexes are trained offline and queried in production.

FAQ

Do I need to train every FAISS index?

Not always. Flat and HNSW indexes do not require training; IVF and PQ indexes require training on representative vectors before adding data.
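Every index exposes an `is_trained` flag, which makes the distinction easy to check programmatically:

```python
import faiss

d = 64
flat = faiss.IndexFlatL2(d)
ivf = faiss.IndexIVFFlat(faiss.IndexFlatL2(d), d, 10)

print(flat.is_trained)  # True: Flat needs no training
print(ivf.is_trained)   # False: IVF must be trained before add()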

When should I use GPU acceleration?

Use GPUs for large datasets or when search latency and throughput are critical. GPUs can provide 10–100× speedups for many workloads, especially with batch queries.

Can FAISS store metadata or filter by fields?

FAISS focuses on raw vector similarity and does not natively handle metadata filtering. Combine FAISS with an external store or use hybrid systems if you need attribute-based filtering.