This skill helps you manage production-grade vector search with Pinecone, delivering low-latency, serverless indexing and hybrid search capabilities.
```bash
npx playbooks add skill orchestra-research/ai-research-skills --skill pinecone
```
---
name: pinecone
description: Managed vector database for production AI applications. Fully managed, auto-scaling, with hybrid search (dense + sparse), metadata filtering, and namespaces. Low latency (<100ms p95). Use for production RAG, recommendation systems, or semantic search at scale. Best for serverless, managed infrastructure.
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [RAG, Pinecone, Vector Database, Managed Service, Serverless, Hybrid Search, Production, Auto-Scaling, Low Latency, Recommendations]
dependencies: [pinecone-client]
---
# Pinecone - Managed Vector Database
The vector database for production AI applications.
## When to use Pinecone
**Use when:**
- Need managed, serverless vector database
- Production RAG applications
- Auto-scaling required
- Low latency critical (<100ms)
- Don't want to manage infrastructure
- Need hybrid search (dense + sparse vectors)
**Metrics**:
- Fully managed SaaS
- Auto-scales to billions of vectors
- **p95 latency <100ms**
- 99.9% uptime SLA
**Use alternatives instead**:
- **Chroma**: Self-hosted, open-source
- **FAISS**: Offline, pure similarity search
- **Weaviate**: Self-hosted with more features
## Quick start
### Installation
```bash
pip install pinecone-client  # newer SDK releases are published as "pinecone"
```
### Basic usage
```python
from pinecone import Pinecone, ServerlessSpec
# Initialize
pc = Pinecone(api_key="your-api-key")
# Create index
pc.create_index(
    name="my-index",
    dimension=1536,  # Must match embedding dimension
    metric="cosine",  # or "euclidean", "dotproduct"
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

# Connect to index
index = pc.Index("my-index")

# Upsert vectors
index.upsert(vectors=[
    {"id": "vec1", "values": [0.1, 0.2, ...], "metadata": {"category": "A"}},
    {"id": "vec2", "values": [0.3, 0.4, ...], "metadata": {"category": "B"}}
])

# Query
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=5,
    include_metadata=True
)
print(results["matches"])
```
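The quick start upserts raw vectors; in practice those come from an embedding model. A minimal sketch of that step, assuming the OpenAI Python SDK with `OPENAI_API_KEY` set (`text-embedding-3-small` produces the 1536-dimensional vectors the index above expects):
```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts):
    """Embed a list of strings into 1536-dim vectors (matches the index dimension)."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]

texts = ["Pinecone stores vectors.", "Hybrid search blends dense and sparse signals."]
index.upsert(vectors=[
    {"id": f"doc-{i}", "values": vec, "metadata": {"text": t}}
    for i, (t, vec) in enumerate(zip(texts, embed(texts)))
])
```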
## Core operations
### Create index
```python
# Serverless (recommended)
pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",  # or "gcp", "azure"
        region="us-east-1"
    )
)

# Pod-based (for consistent performance)
from pinecone import PodSpec

pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=PodSpec(
        environment="us-east1-gcp",
        pod_type="p1.x1"
    )
)
```
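Index creation is asynchronous, so a freshly created index may briefly reject operations. A small polling sketch using `describe_index` (the `status["ready"]` field follows the v3+ Python SDK):
```python
import time

# Poll until the index reports ready, then connect
while not pc.describe_index("my-index").status["ready"]:
    time.sleep(1)

index = pc.Index("my-index")
```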
### Upsert vectors
```python
# Single upsert
index.upsert(vectors=[
{
"id": "doc1",
"values": [0.1, 0.2, ...], # 1536 dimensions
"metadata": {
"text": "Document content",
"category": "tutorial",
"timestamp": "2025-01-01"
}
}
])
# Batch upsert (recommended)
vectors = [
{"id": f"vec{i}", "values": embedding, "metadata": metadata}
for i, (embedding, metadata) in enumerate(zip(embeddings, metadatas))
]
index.upsert(vectors=vectors, batch_size=100)
```
### Query vectors
```python
# Basic query
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=10,
    include_metadata=True,
    include_values=False
)

# With metadata filtering
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=5,
    filter={"category": {"$eq": "tutorial"}}
)

# Namespace query
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=5,
    namespace="production"
)

# Access results
for match in results["matches"]:
    print(f"ID: {match['id']}")
    print(f"Score: {match['score']}")
    print(f"Metadata: {match['metadata']}")
```
### Metadata filtering
```python
# Exact match
filter = {"category": "tutorial"}

# Comparison ($gt, $gte, $lt, $lte, $ne)
filter = {"price": {"$gte": 100}}

# Logical operators ($and, $or)
filter = {
    "$and": [
        {"category": "tutorial"},
        {"difficulty": {"$lte": 3}}
    ]
}

# In operator
filter = {"tags": {"$in": ["python", "ml"]}}
```
## Namespaces
```python
# Partition data by namespace
index.upsert(
    vectors=[{"id": "vec1", "values": [...]}],
    namespace="user-123"
)

# Query a specific namespace
results = index.query(
    vector=[...],
    namespace="user-123",
    top_k=5
)

# List namespaces
stats = index.describe_index_stats()
print(stats["namespaces"])
```
## Hybrid search (dense + sparse)
```python
# Upsert with sparse vectors (requires an index created with metric="dotproduct")
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": [0.1, 0.2, ...],  # Dense vector
        "sparse_values": {
            "indices": [10, 45, 123],  # Token IDs
            "values": [0.5, 0.3, 0.8]  # e.g., TF-IDF or BM25 weights
        },
        "metadata": {"text": "..."}
    }
])

# Hybrid query: the query API has no server-side alpha parameter, so blend
# the two signals client-side by scaling each vector before querying
def hybrid_score_norm(dense, sparse, alpha):
    """alpha=1 -> pure dense, alpha=0 -> pure sparse, 0.5 -> even blend."""
    scaled_sparse = {
        "indices": sparse["indices"],
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    return [v * alpha for v in dense], scaled_sparse

dense_vec, sparse_vec = hybrid_score_norm(
    dense=[0.1, 0.2, ...],
    sparse={"indices": [10, 45], "values": [0.5, 0.3]},
    alpha=0.5,
)
results = index.query(
    vector=dense_vec,
    sparse_vector=sparse_vec,
    top_k=5,
    include_metadata=True
)
```
## LangChain integration
```python
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings
# Create vector store
vectorstore = PineconeVectorStore.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    index_name="my-index"
)

# Query
results = vectorstore.similarity_search("query", k=5)

# With metadata filter
results = vectorstore.similarity_search(
    "query",
    k=5,
    filter={"category": "tutorial"}
)

# As retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})
```
## LlamaIndex integration
```python
from pinecone import Pinecone
from llama_index.vector_stores.pinecone import PineconeVectorStore

# Connect to Pinecone
pc = Pinecone(api_key="your-key")
pinecone_index = pc.Index("my-index")
# Create vector store
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
# Use in LlamaIndex
from llama_index.core import StorageContext, VectorStoreIndex
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```
## Index management
```python
# List indices
indexes = pc.list_indexes()
# Describe index
index_info = pc.describe_index("my-index")
print(index_info)
# Get index stats
stats = index.describe_index_stats()
print(f"Total vectors: {stats['total_vector_count']}")
print(f"Namespaces: {stats['namespaces']}")
# Delete index
pc.delete_index("my-index")
```
## Delete vectors
```python
# Delete by ID
index.delete(ids=["vec1", "vec2"])

# Delete by metadata filter (pod-based indexes only; serverless deletes by ID)
index.delete(filter={"category": "old"})

# Delete all vectors in a namespace
index.delete(delete_all=True, namespace="test")

# Delete all vectors in the default namespace (use pc.delete_index to drop the index itself)
index.delete(delete_all=True)
```
## Best practices
1. **Use serverless** - Auto-scaling, cost-effective
2. **Batch upserts** - More efficient (100-200 vectors per batch; see the sketch after this list)
3. **Add metadata** - Enable filtering
4. **Use namespaces** - Isolate data by user/tenant
5. **Monitor usage** - Check Pinecone dashboard
6. **Optimize filters** - Index frequently filtered fields
7. **Test with free tier** - 1 index, 100K vectors free
8. **Use hybrid search** - Better recall on keyword-heavy queries
9. **Set appropriate dimensions** - Match embedding model
10. **Regular backups** - Export important data
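A minimal batching sketch for practice 2, assuming an existing `index` handle and an iterable `records` of `(id, embedding, metadata)` tuples (both names are illustrative):
```python
from itertools import islice

def chunked(iterable, size=100):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

# records: iterable of (id, embedding, metadata) tuples
for batch in chunked(records, size=100):
    index.upsert(vectors=[
        {"id": _id, "values": values, "metadata": meta}
        for _id, values, meta in batch
    ])
```
The SDK's `upsert(..., batch_size=100)` shown earlier performs the same chunking; an explicit loop is useful when you want per-batch retries or progress logging.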
## Performance
| Operation | Latency | Notes |
|-----------|---------|-------|
| Upsert | ~50-100ms | Per batch |
| Query (p50) | ~50ms | Depends on index size |
| Query (p95) | ~100ms | SLA target |
| Metadata filter | ~+10-20ms | Additional overhead |
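To sanity-check these numbers against your own workload, a client-side timing sketch (`query_vec` is an assumed query embedding of the right dimension; the measurement includes network round-trip, so it will exceed server-side latency):
```python
import time

start = time.perf_counter()
index.query(vector=query_vec, top_k=10, include_metadata=True)
print(f"Round-trip query time: {(time.perf_counter() - start) * 1000:.1f} ms")
```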
## Pricing (as of 2025)
**Serverless**:
- $0.096 per million read units
- $0.06 per million write units
- $0.06 per GB storage/month
**Free tier**:
- 1 serverless index
- 100K vectors (1536 dimensions)
- Great for prototyping
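As a rough sizing sketch using the rates above (assuming float32 storage and ignoring metadata and read/write units):
```python
# Back-of-envelope storage cost for 1M vectors at 1536 dimensions
num_vectors = 1_000_000
dimension = 1536
storage_gb = num_vectors * dimension * 4 / 1e9  # 4 bytes per float32 -> ~6.1 GB
print(f"~{storage_gb:.1f} GB -> ~${storage_gb * 0.06:.2f}/month storage")
```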
## Resources
- **Website**: https://www.pinecone.io
- **Docs**: https://docs.pinecone.io
- **Console**: https://app.pinecone.io
- **Pricing**: https://www.pinecone.io/pricing
This skill integrates Pinecone, a fully managed vector database, into AI applications for production-grade retrieval and similarity search. It emphasizes serverless, auto-scaling infrastructure with low-latency queries and support for hybrid dense+sparse search. Use it to power RAG, recommendations, and semantic search without managing servers.
The skill connects your agent to Pinecone indexes, enabling create, upsert, query, and index-management operations via the Pinecone client. It supports namespaces, metadata filtering, hybrid search (dense vectors plus sparse token vectors), and batch operations to optimize throughput. Latency and scaling are handled by Pinecone, so the agent can focus on orchestration and retrieval.
## FAQ
**Does Pinecone handle scaling and replication?**
Yes. Pinecone is fully managed and auto-scales; use serverless for automatic scaling or pod-based specs for consistent performance.
**How do I combine dense and sparse signals?**
Upsert records with both dense `values` and `sparse_values`, then query with a `sparse_vector`, scaling the dense and sparse vectors client-side to control their relative weight (see the hybrid search section above).
**Can I filter results by metadata?**
Yes. Pinecone query filters support exact matches, comparisons ($gt, $lt, etc.), logical operators ($and/$or), and $in for lists.