
sentence-transformers skill

/15-rag/sentence-transformers

This skill helps generate high-quality embeddings for semantic search and retrieval using sentence-transformers, enabling efficient RAG, clustering, and semantic similarity workflows.

npx playbooks add skill orchestra-research/ai-research-skills --skill sentence-transformers

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: sentence-transformers
description: Framework for state-of-the-art sentence, text, and image embeddings. Provides 5000+ pre-trained models for semantic similarity, clustering, and retrieval. Supports multilingual, domain-specific, and multimodal models. Use for generating embeddings for RAG, semantic search, or similarity tasks. Best for production embedding generation.
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [Sentence Transformers, Embeddings, Semantic Similarity, RAG, Multilingual, Multimodal, Pre-Trained Models, Clustering, Semantic Search, Production]
dependencies: [sentence-transformers, transformers, torch]
---

# Sentence Transformers - State-of-the-Art Embeddings

Python framework for sentence and text embeddings using transformers.

## When to use Sentence Transformers

**Use when:**
- Need high-quality embeddings for RAG
- Semantic similarity and search
- Text clustering and classification
- Multilingual embeddings (100+ languages)
- Running embeddings locally (no API)
- Cost-effective alternative to OpenAI embeddings

**Metrics**:
- **15,700+ GitHub stars**
- **5000+ pre-trained models**
- **100+ languages** supported
- Based on PyTorch/Transformers

**Use alternatives instead**:
- **OpenAI Embeddings**: API-based, consistently high quality, no local setup
- **Instructor**: Task-specific instructions
- **Cohere Embed**: Managed service

## Quick start

### Installation

```bash
pip install sentence-transformers
```

### Basic usage

```python
from sentence_transformers import SentenceTransformer

# Load model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate embeddings
sentences = [
    "This is an example sentence",
    "Each sentence is converted to a vector"
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 384)

# Cosine similarity
from sentence_transformers.util import cos_sim
similarity = cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {similarity.item():.4f}")
```

## Popular models

### General purpose

```python
# Fast, good quality (384 dim)
model = SentenceTransformer('all-MiniLM-L6-v2')

# Better quality (768 dim)
model = SentenceTransformer('all-mpnet-base-v2')

# Best quality (1024 dim, slower)
model = SentenceTransformer('all-roberta-large-v1')
```

### Multilingual

```python
# 50+ languages, compact and fast
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

# 50+ languages, higher quality
model = SentenceTransformer('paraphrase-multilingual-mpnet-base-v2')
```
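
A quick cross-lingual sanity check (sentences are illustrative): translations of the same sentence should map to nearby vectors:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

en = model.encode("The weather is nice today", convert_to_tensor=True)
de = model.encode("Das Wetter ist heute schön", convert_to_tensor=True)
print(util.cos_sim(en, de).item())  # high for well-aligned multilingual models
```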

### Domain-specific

```python
# These are plain Hugging Face checkpoints, not native sentence-transformers
# models; SentenceTransformer wraps them with a mean-pooling layer automatically.

# Legal domain
model = SentenceTransformer('nlpaueb/legal-bert-base-uncased')

# Scientific papers
model = SentenceTransformer('allenai/specter')

# Code
model = SentenceTransformer('microsoft/codebert-base')
```

## Semantic search

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

# Corpus
corpus = [
    "Python is a programming language",
    "Machine learning uses algorithms",
    "Neural networks are powerful"
]

# Encode corpus
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# Query
query = "What is Python?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Find most similar
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)
print(hits)
```
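
`util.semantic_search` returns one list of hits per query; each hit is a dict with `corpus_id` and `score`, so results map straight back to the corpus:

```python
# hits[0] holds the results for the first (and only) query
for hit in hits[0]:
    print(f"{corpus[hit['corpus_id']]} (score: {hit['score']:.4f})")
```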

## Similarity computation

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')
embedding1 = model.encode("The cat sits outside", convert_to_tensor=True)
embedding2 = model.encode("A feline rests outdoors", convert_to_tensor=True)

# Cosine similarity
similarity = util.cos_sim(embedding1, embedding2)

# Dot product (equal to cosine similarity for normalized embeddings)
similarity = util.dot_score(embedding1, embedding2)

# Pairwise cosine similarity matrix for a batch
embeddings = model.encode(["one", "two", "three"], convert_to_tensor=True)
similarities = util.cos_sim(embeddings, embeddings)  # shape (3, 3)
```

## Batch encoding

```python
# Efficient batch processing of a large corpus
sentences = ["sentence 1", "sentence 2"] * 1000

embeddings = model.encode(
    sentences,
    batch_size=32,
    show_progress_bar=True,
    convert_to_tensor=False  # or True for PyTorch tensors
)
```
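
Because `encode` returns a NumPy array by default, caching is a one-liner; the file name below is arbitrary:

```python
import numpy as np

np.save("corpus_embeddings.npy", embeddings)   # persist once
embeddings = np.load("corpus_embeddings.npy")  # reload instead of re-encoding
```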

## Fine-tuning

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Start from a pre-trained model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Training data: sentence pairs with similarity labels in [0, 1]
train_examples = [
    InputExample(texts=['sentence 1', 'sentence 2'], label=0.8),
    InputExample(texts=['sentence 3', 'sentence 4'], label=0.3),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Loss function
train_loss = losses.CosineSimilarityLoss(model)

# Train
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=10,
    warmup_steps=100
)

# Save
model.save('my-finetuned-model')
```
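
The saved directory loads like any other model; a quick sanity check (sentence is illustrative):

```python
finetuned = SentenceTransformer('my-finetuned-model')
print(finetuned.encode(["validation sentence"]).shape)
```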

## LangChain integration

```python
# In recent LangChain releases this class also lives in the langchain-huggingface
# package: from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)

# Use with vector stores
from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=embeddings
)
```
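
The store then retrieves with the same embedding model via LangChain's standard `similarity_search` (query text is illustrative):

```python
results = vectorstore.similarity_search("What is Python?", k=3)
for doc in results:
    print(doc.page_content)
```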

## LlamaIndex integration

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2"
)

from llama_index.core import Settings, VectorStoreIndex

Settings.embed_model = embed_model

# Use in index
index = VectorStoreIndex.from_documents(documents)
```
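
For retrieval without an LLM, the index exposes a retriever; a minimal sketch with an illustrative query:

```python
retriever = index.as_retriever(similarity_top_k=3)
for node in retriever.retrieve("What is Python?"):
    print(node.score, node.node.get_content())
```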

## Model selection guide

| Model | Dimensions | Speed | Quality | Use Case |
|-------|------------|-------|---------|----------|
| all-MiniLM-L6-v2 | 384 | Fast | Good | General, prototyping |
| all-mpnet-base-v2 | 768 | Medium | Better | Production RAG |
| all-roberta-large-v1 | 1024 | Slow | Best | High accuracy needed |
| paraphrase-multilingual-mpnet-base-v2 | 768 | Medium | Good | Multilingual |
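
Quality and speed vary by task, so it is worth timing candidates on your own data; a small harness sketch (model list and sentences are placeholders):

```python
import time
from sentence_transformers import SentenceTransformer

sentences = ["example sentence"] * 512  # substitute your own corpus

for name in ["all-MiniLM-L6-v2", "all-mpnet-base-v2"]:
    model = SentenceTransformer(name)
    start = time.perf_counter()
    emb = model.encode(sentences, batch_size=32)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(sentences) / elapsed:.0f} sentences/sec, dim={emb.shape[1]}")
```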

## Best practices

1. **Start with all-MiniLM-L6-v2** - Good baseline
2. **Normalize embeddings** - Better for cosine similarity (see the sketch after this list)
3. **Use GPU if available** - 10× faster encoding
4. **Batch encoding** - More efficient
5. **Cache embeddings** - Expensive to recompute
6. **Fine-tune for domain** - Improves quality
7. **Test different models** - Quality varies by task
8. **Monitor memory** - Large models need more RAM
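
To make the normalization and GPU tips concrete: `encode` accepts `normalize_embeddings`, and the model takes a `device` argument (the CUDA check below assumes PyTorch is installed):

```python
import torch
from sentence_transformers import SentenceTransformer

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SentenceTransformer('all-MiniLM-L6-v2', device=device)

# Unit-length vectors: dot product now equals cosine similarity
embeddings = model.encode(
    ["some text"],
    normalize_embeddings=True,
    batch_size=64,
)
```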

## Performance

| Model | Speed (sentences/sec) | Memory | Dimension |
|-------|----------------------|---------|-----------|
| MiniLM | ~2000 | 120MB | 384 |
| MPNet | ~600 | 420MB | 768 |
| RoBERTa | ~300 | 1.3GB | 1024 |

## Resources

- **GitHub**: https://github.com/UKPLab/sentence-transformers ⭐ 15,700+
- **Models**: https://huggingface.co/sentence-transformers
- **Docs**: https://www.sbert.net
- **License**: Apache 2.0


Overview

This skill provides the sentence-transformers framework for generating state-of-the-art sentence, text, and image embeddings. It exposes 5,000+ pre-trained models covering multilingual, domain-specific, and multimodal use cases, optimized for semantic similarity, clustering, and retrieval. Ideal for production embedding pipelines where local, cost-effective inference and fine-tuning are required.

How this skill works

The library wraps transformer models to produce fixed-size vector embeddings for sentences or documents, using PyTorch and Hugging Face transformers under the hood. It offers utilities for efficient batch encoding, cosine/dot similarity, semantic search, and integration adapters for LangChain and LlamaIndex. Models range from compact, fast MiniLM variants to high-accuracy large RoBERTa/MPNet models, and you can fine-tune models using contrastive or similarity losses.

When to use it

  • Building retrieval-augmented generation (RAG) systems with local embeddings
  • Implementing semantic search and nearest-neighbor retrieval over text or multimodal corpora
  • Clustering, semantic similarity, or duplicate detection across documents
  • Producing multilingual embeddings (100+ languages) for global applications
  • Fine-tuning embeddings for domain-specific tasks (legal, scientific, code)

Best practices

  • Start with all-MiniLM-L6-v2 as a fast, strong baseline and iterate to larger models if needed
  • L2-normalize embeddings when using cosine similarity (encode supports normalize_embeddings=True)
  • Encode in batches and use GPU for large-scale workloads to maximize throughput
  • Cache and persist embeddings to avoid expensive recomputation
  • Fine-tune on in-domain sentence pairs or relevance labels to improve downstream accuracy
  • Monitor memory and choose model dimension based on latency and storage tradeoffs

Example use cases

  • RAG pipeline: encode documents to build a vector store for retrieval before generation
  • Semantic search: return nearest passages to user queries using cosine similarity
  • Document clustering: group documents or support tickets by semantic content
  • Duplicate detection: find near-duplicate or paraphrased content at scale (see the paraphrase-mining sketch after this list)
  • Domain adaptation: fine-tune a base model on legal or scientific corpora for higher precision
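
For the duplicate-detection case, the library ships `util.paraphrase_mining`, which scores all sentence pairs efficiently; the sentences below are illustrative:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')
sentences = [
    "The car is red",
    "The automobile is red",
    "The sky is blue",
]

# Returns [score, i, j] triples sorted by decreasing similarity
for score, i, j in util.paraphrase_mining(model, sentences):
    print(f"{score:.3f}  {sentences[i]}  <->  {sentences[j]}")
```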

FAQ

Do I need an external API to use sentence-transformers?

No. Models run locally with PyTorch and Hugging Face; you can generate embeddings without a paid API.

Which model should I pick for production?

Start with all-mpnet-base-v2 for a good production balance of quality and speed; use MiniLM for low-latency needs and roberta-large variants for highest accuracy.