---
name: embeddings
description: Text embeddings for semantic search and similarity. Use when converting text to vectors, choosing embedding models, implementing chunking strategies, or building document similarity features.
tags: [ai, embeddings, vectors, semantic-search, similarity]
context: fork
agent: data-pipeline-engineer
version: 1.0.0
author: OrchestKit
user-invocable: false
---
# Embeddings
Convert text to dense vector representations for semantic search and similarity.
## Quick Reference
```python
from openai import OpenAI
client = OpenAI()
# Single text embedding
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Your text here",
)
vector = response.data[0].embedding  # 1536 dimensions
```
```python
# Batch embedding (efficient)
texts = ["text1", "text2", "text3"]
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
)
vectors = [item.embedding for item in response.data]
```
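For corpora larger than a single request allows, here is a minimal sketch that reuses the `client` above and splits the input into sub-batches; production code should add rate limiting and retry (see Advanced Patterns below):
```python
def embed_all(texts: list[str], batch_size: int = 100) -> list[list[float]]:
    """Embed a large list of texts in API-sized sub-batches."""
    vectors: list[list[float]] = []
    for i in range(0, len(texts), batch_size):
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=texts[i:i + batch_size],
        )
        # The response preserves input order, so results extend directly.
        vectors.extend(item.embedding for item in response.data)
    return vectors
```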
## Model Selection
| Model | Dims | Cost (per 1M tokens) | Use Case |
|-------|------|----------------------|----------|
| `text-embedding-3-small` | 1536 | $0.02 | General purpose |
| `text-embedding-3-large` | 3072 | $0.13 | High accuracy |
| `nomic-embed-text` (Ollama) | 768 | Free | Local/CI |
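For the local row above, a minimal sketch against Ollama's embeddings endpoint, assuming a local server on the default port with the model already pulled (`ollama pull nomic-embed-text`):
```python
import requests

def embed_local(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Embed text via a local Ollama server on the default port."""
    response = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["embedding"]  # 768 dims for nomic-embed-text
```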
## Chunking Strategy
```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks for embedding.

    Sizes are measured in words, a rough proxy for tokens.
    """
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = " ".join(words[i:i + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks
```
**Guidelines:**
- Chunk size: 256-1024 tokens (512 is a typical default)
- Overlap: 10-20% of chunk size for context continuity
- Include metadata (title, source) with each chunk; see the sketch below
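To act on the metadata guideline, a small wrapper over `chunk_text` is sketched here; the `title` and `source` keys are assumptions about your document records:
```python
def chunk_document(doc: dict, chunk_size: int = 512, overlap: int = 50) -> list[dict]:
    """Pair each chunk with metadata so search hits stay traceable to a source."""
    return [
        {
            "text": chunk,
            "title": doc["title"],    # assumed document fields
            "source": doc["source"],
            "chunk_index": i,
        }
        for i, chunk in enumerate(chunk_text(doc["text"], chunk_size, overlap))
    ]
```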
## Similarity Calculation
```python
import numpy as np

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Calculate cosine similarity between two vectors."""
    a, b = np.array(a), np.array(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Usage
similarity = cosine_similarity(vector1, vector2)
# 1.0 = identical, 0.0 = orthogonal, -1.0 = opposite
```
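Building on `cosine_similarity`, here is one way to rank a document collection against a query; pre-normalizing both sides lets a single matrix-vector product score every document:
```python
import numpy as np

def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 5) -> list[tuple[int, float]]:
    """Return (index, score) for the k documents most similar to the query."""
    docs = np.asarray(doc_vecs, dtype=float)
    query = np.asarray(query_vec, dtype=float)
    # Normalize rows and query so dot products equal cosine similarities.
    docs /= np.linalg.norm(docs, axis=1, keepdims=True)
    query /= np.linalg.norm(query)
    scores = docs @ query
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]
```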
## Key Decisions
- **Dimension reduction**: `text-embedding-3-large` can be truncated to 1536 dims (Matryoshka-style); see the sketch below
- **Normalization**: most embedding APIs return unit-length vectors, so dot product equals cosine similarity
- **Batch size**: 100-500 texts per API call for throughput and cost efficiency
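A sketch of both reduction routes follows; the `dimensions` request parameter applies to the text-embedding-3 models, and a manually truncated vector must be re-normalized before use:
```python
import numpy as np
from openai import OpenAI

client = OpenAI()

# Route 1: ask the API for a shortened (already re-normalized) vector.
resp = client.embeddings.create(
    model="text-embedding-3-large",
    input="Your text here",
    dimensions=1536,
)
vec = resp.data[0].embedding  # 1536 dims

# Route 2: truncate a stored full-size vector yourself, then re-normalize.
full = np.array(
    client.embeddings.create(
        model="text-embedding-3-large",
        input="Your text here",
    ).data[0].embedding
)
truncated = full[:1536]
truncated /= np.linalg.norm(truncated)
```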
## Common Mistakes
- Embedding queries differently than documents (use the same model and preprocessing for both)
- Not chunking long documents, so context is lost or truncated
- Using the wrong similarity metric for the model (cosine vs. Euclidean)
- Re-embedding unchanged content instead of caching; see the sketch below
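As a minimal version of the caching fix, a disk-based cache keyed by content hash (the Redis variant is covered in the advanced patterns reference; the cache directory here is an arbitrary choice):
```python
import hashlib
import json
from pathlib import Path
from typing import Callable

CACHE_DIR = Path(".embedding_cache")  # arbitrary local cache location
CACHE_DIR.mkdir(exist_ok=True)

def cached_embedding(text: str, embed_fn: Callable[[str], list[float]]) -> list[float]:
    """Return the cached vector for unchanged text; embed and store otherwise."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    path = CACHE_DIR / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    vector = embed_fn(text)
    path.write_text(json.dumps(vector))
    return vector
```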
## Advanced Patterns
See `references/advanced-patterns.md` for:
- **Late Chunking**: Embed full document, extract chunk vectors from contextualized tokens
- **Batch API**: Production batching with rate limiting and retry
- **Embedding Cache**: Redis-based caching to avoid re-embedding
- **Matryoshka Embeddings**: Dimension reduction with text-embedding-3
## Related Skills
- `rag-retrieval` - Using embeddings for RAG pipelines
- `hyde-retrieval` - Hypothetical document embeddings for vocabulary mismatch
- `contextual-retrieval` - Anthropic's context-prepending technique
- `reranking-patterns` - Cross-encoder reranking for precision
- `ollama-local` - Local embeddings with nomic-embed-text
## Capability Details
### text-to-vector
**Keywords:** embedding, text to vector, vectorize, embed text
**Solves:**
- Convert text to vector embeddings
- Choose appropriate embedding models
- Handle embedding API integration
### semantic-search
**Keywords:** semantic search, vector search, similarity search, find similar
**Solves:**
- Implement semantic search over documents
- Configure similarity thresholds
- Rank results by relevance
### chunking-strategies
**Keywords:** chunk, chunking, split, text splitting, overlap
**Solves:**
- Split documents into optimal chunks
- Configure chunk size and overlap
- Preserve semantic boundaries
### batch-embedding
**Keywords:** batch, bulk embed, parallel embedding, batch processing
**Solves:**
- Embed large document collections efficiently
- Handle rate limits and retries
- Optimize embedding costs
### local-embeddings
**Keywords:** local, ollama, self-hosted, on-premise, offline
**Solves:**
- Run embeddings locally with Ollama
- Deploy self-hosted embedding models
- Reduce API costs with local models
## Summary
This skill provides practical patterns for converting text into dense vector embeddings for semantic search and similarity tasks: model selection by dimensions and cost, chunking with overlap to preserve context, batching for throughput and cost efficiency, cosine similarity computation over normalized vectors, and production concerns such as caching, retry/backoff, and local self-hosted alternatives.
## FAQ
**Which embedding model should I pick for a new project?**
Start with a general-purpose small model for prototyping to keep costs low. Move to a larger model if you need higher accuracy or finer semantic nuance and the budget allows.

**How do I choose chunk size and overlap?**
Aim for 256-1024 tokens per chunk; 512 is a good default. Use 10-20% overlap to preserve sentence context across chunk boundaries.

**How can I avoid excessive API cost?**
Batch requests, cache embeddings for unchanged content, truncate or dimension-reduce large vectors where acceptable, and evaluate local self-hosted models for heavy workloads.