---
name: embeddings
description: Text embeddings for semantic search and similarity. Use when converting text to vectors, choosing embedding models, implementing chunking strategies, or building document similarity features.
tags: [ai, embeddings, vectors, semantic-search, similarity]
context: fork
agent: data-pipeline-engineer
version: 1.0.0
author: OrchestKit
user-invocable: false
---
# Embeddings
Convert text to dense vector representations for semantic search and similarity.
## Quick Reference
```python
from openai import OpenAI
client = OpenAI()
# Single text embedding
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Your text here",
)
vector = response.data[0].embedding  # 1536 dimensions
```
```python
# Batch embedding (efficient)
texts = ["text1", "text2", "text3"]
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
)
vectors = [item.embedding for item in response.data]
```
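For corpora larger than a single request allows, here is a minimal sketch that reuses the `client` above and splits the input into sub-batches; production code should add rate limiting and retry (see Advanced Patterns below):
```python
def embed_all(texts: list[str], batch_size: int = 100) -> list[list[float]]:
    """Embed a large list of texts in API-sized sub-batches."""
    vectors: list[list[float]] = []
    for i in range(0, len(texts), batch_size):
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=texts[i:i + batch_size],
        )
        # The response preserves input order, so results extend directly.
        vectors.extend(item.embedding for item in response.data)
    return vectors
```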
## Model Selection
| Model | Dims | Cost (per 1M tokens) | Use Case |
|-------|------|----------------------|----------|
| `text-embedding-3-small` | 1536 | $0.02 | General purpose |
| `text-embedding-3-large` | 3072 | $0.13 | High accuracy |
| `nomic-embed-text` (Ollama) | 768 | Free | Local/CI |
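For the local row above, a minimal sketch against Ollama's embeddings endpoint, assuming a local server on the default port with the model already pulled (`ollama pull nomic-embed-text`):
```python
import requests

def embed_local(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Embed text via a local Ollama server on the default port."""
    response = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["embedding"]  # 768 dims for nomic-embed-text
```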
## Chunking Strategy
```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks for embedding.

    Sizes are measured in words, a rough proxy for tokens.
    """
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = " ".join(words[i:i + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks
```
**Guidelines:**
- Chunk size: 256-1024 tokens (512 is a typical default)
- Overlap: 10-20% of chunk size for context continuity
- Include metadata (title, source) with each chunk; see the sketch below
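To act on the metadata guideline, a small wrapper over `chunk_text` is sketched here; the `title` and `source` keys are assumptions about your document records:
```python
def chunk_document(doc: dict, chunk_size: int = 512, overlap: int = 50) -> list[dict]:
    """Pair each chunk with metadata so search hits stay traceable to a source."""
    return [
        {
            "text": chunk,
            "title": doc["title"],    # assumed document fields
            "source": doc["source"],
            "chunk_index": i,
        }
        for i, chunk in enumerate(chunk_text(doc["text"], chunk_size, overlap))
    ]
```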
## Similarity Calculation
```python
import numpy as np

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Calculate cosine similarity between two vectors."""
    a, b = np.array(a), np.array(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Usage
similarity = cosine_similarity(vector1, vector2)
# 1.0 = identical, 0.0 = orthogonal, -1.0 = opposite
```
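Building on `cosine_similarity`, here is one way to rank a document collection against a query; pre-normalizing both sides lets a single matrix-vector product score every document:
```python
import numpy as np

def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 5) -> list[tuple[int, float]]:
    """Return (index, score) for the k documents most similar to the query."""
    docs = np.asarray(doc_vecs, dtype=float)
    query = np.asarray(query_vec, dtype=float)
    # Normalize rows and query so dot products equal cosine similarities.
    docs /= np.linalg.norm(docs, axis=1, keepdims=True)
    query /= np.linalg.norm(query)
    scores = docs @ query
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]
```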
## Key Decisions
- **Dimension reduction**: `text-embedding-3-large` can be truncated to 1536 dims (Matryoshka-style); see the sketch below
- **Normalization**: most embedding APIs return unit-length vectors, so dot product equals cosine similarity
- **Batch size**: 100-500 texts per API call for throughput and cost efficiency
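A sketch of both reduction routes follows; the `dimensions` request parameter applies to the text-embedding-3 models, and a manually truncated vector must be re-normalized before use:
```python
import numpy as np
from openai import OpenAI

client = OpenAI()

# Route 1: ask the API for a shortened (already re-normalized) vector.
resp = client.embeddings.create(
    model="text-embedding-3-large",
    input="Your text here",
    dimensions=1536,
)
vec = resp.data[0].embedding  # 1536 dims

# Route 2: truncate a stored full-size vector yourself, then re-normalize.
full = np.array(
    client.embeddings.create(
        model="text-embedding-3-large",
        input="Your text here",
    ).data[0].embedding
)
truncated = full[:1536]
truncated /= np.linalg.norm(truncated)
```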
## Common Mistakes
- Embedding queries differently than documents (use the same model and preprocessing for both)
- Not chunking long documents, so context is lost or truncated
- Using the wrong similarity metric for the model (cosine vs. Euclidean)
- Re-embedding unchanged content instead of caching; see the sketch below
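As a minimal version of the caching fix, a disk-based cache keyed by content hash (the Redis variant is covered in the advanced patterns reference; the cache directory here is an arbitrary choice):
```python
import hashlib
import json
from pathlib import Path
from typing import Callable

CACHE_DIR = Path(".embedding_cache")  # arbitrary local cache location
CACHE_DIR.mkdir(exist_ok=True)

def cached_embedding(text: str, embed_fn: Callable[[str], list[float]]) -> list[float]:
    """Return the cached vector for unchanged text; embed and store otherwise."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    path = CACHE_DIR / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    vector = embed_fn(text)
    path.write_text(json.dumps(vector))
    return vector
```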
## Advanced Patterns
See `references/advanced-patterns.md` for:
- **Late Chunking**: Embed full document, extract chunk vectors from contextualized tokens
- **Batch API**: Production batching with rate limiting and retry
- **Embedding Cache**: Redis-based caching to avoid re-embedding
- **Matryoshka Embeddings**: Dimension reduction with text-embedding-3
## Related Skills
- `rag-retrieval` - Using embeddings for RAG pipelines
- `hyde-retrieval` - Hypothetical document embeddings for vocabulary mismatch
- `contextual-retrieval` - Anthropic's context-prepending technique
- `reranking-patterns` - Cross-encoder reranking for precision
- `ollama-local` - Local embeddings with nomic-embed-text
## Capability Details
### text-to-vector
**Keywords:** embedding, text to vector, vectorize, embed text
**Solves:**
- Convert text to vector embeddings
- Choose appropriate embedding models
- Handle embedding API integration
### semantic-search
**Keywords:** semantic search, vector search, similarity search, find similar
**Solves:**
- Implement semantic search over documents
- Configure similarity thresholds
- Rank results by relevance
### chunking-strategies
**Keywords:** chunk, chunking, split, text splitting, overlap
**Solves:**
- Split documents into optimal chunks
- Configure chunk size and overlap
- Preserve semantic boundaries
### batch-embedding
**Keywords:** batch, bulk embed, parallel embedding, batch processing
**Solves:**
- Embed large document collections efficiently
- Handle rate limits and retries
- Optimize embedding costs
### local-embeddings
**Keywords:** local, ollama, self-hosted, on-premise, offline
**Solves:**
- Run embeddings locally with Ollama
- Deploy self-hosted embedding models
- Reduce API costs with local models
## Summary
This skill provides practical patterns for converting text into dense vector embeddings for semantic search and similarity tasks: model selection by dimensions and cost, chunking with overlap to preserve context, batching for throughput and cost efficiency, cosine similarity computation over normalized vectors, and production concerns such as caching, retry/backoff, and local self-hosted alternatives.
## FAQ
**Which embedding model should I pick for a new project?**
Start with a general-purpose small model for prototyping to keep costs low. Move to a larger model if you need higher accuracy or finer semantic nuance and the budget allows.

**How do I choose chunk size and overlap?**
Aim for 256-1024 tokens per chunk; 512 is a good default. Use 10-20% overlap to preserve sentence context across chunk boundaries.

**How can I avoid excessive API cost?**
Batch requests, cache embeddings for unchanged content, truncate or dimension-reduce large vectors where acceptable, and evaluate local self-hosted models for heavy workloads.