This skill helps you build production-ready elizaOS knowledge bases with RAG, embeddings, and semantic search.
`npx playbooks add skill dexploarer/hyper-forge --skill knowledge-base-builder`
---
name: knowledge-base-builder
description: Create and optimize elizaOS knowledge bases with RAG, embeddings, and semantic search. Triggers on "create knowledge base", "build RAG system", or "setup agent knowledge"
allowed-tools: [Write, Read, Edit, Grep, Glob, Bash]
---
# Knowledge Base Builder Skill
Build production-ready knowledge bases for elizaOS agents with document ingestion, embeddings, and semantic retrieval.
## When to Use
- "Create a knowledge base for [domain]"
- "Build RAG system with [documents]"
- "Setup agent knowledge from [sources]"
- "Implement semantic search for agent"
## Capabilities
1. Document ingestion (Markdown, PDF, text)
2. Smart chunking strategies
3. Embedding generation
4. Vector storage configuration
5. Semantic search optimization
6. Knowledge updates and versioning
7. Knowledge quality metrics
## Workflow
### Phase 1: Knowledge Requirements
**Questions to ask:**
1. What domain expertise is needed?
2. What document sources exist?
3. How often does knowledge change?
4. What query patterns are expected?
### Phase 2: Knowledge Structure
```
knowledge/
├── {domain}/
│   ├── README.md           # Overview
│   ├── core-concepts.md    # Fundamental knowledge
│   ├── procedures.md       # Step-by-step guides
│   ├── faq.md              # Common questions
│   ├── examples.md         # Use case examples
│   └── glossary.md         # Terminology
└── embeddings/
    └── {domain}.json       # Pre-computed embeddings
```
### Phase 3: Document Format
````markdown
# {Topic Title}
## Summary
{Brief overview for quick reference}
## Key Concepts
- {Concept 1}: {Definition}
- {Concept 2}: {Definition}
## Detailed Explanation
{Comprehensive information}
## Examples
```{language}
{Code or usage examples}
```
## Related Topics
- [{Topic}](./related-topic.md)
## Last Updated
{Date}
````
### Phase 4: Character Integration
```typescript
import type { Character } from '@elizaos/core';

export const character: Character = {
  // ... other config
  knowledge: [
    // Simple facts
    "Core fact about {domain}",
    "Important principle in {domain}",
    // File references
    {
      path: "./knowledge/{domain}/core-concepts.md",
      shared: true // Available to all agents
    },
    // Directory loading
    {
      directory: "./knowledge/{domain}",
      shared: false // Agent-specific
    }
  ],
  // Configure knowledge plugin
  plugins: [
    '@elizaos/plugin-knowledge',
    // ... other plugins
  ],
  settings: {
    // Embedding configuration
    embeddingModel: 'text-embedding-3-small',
    embeddingDimensions: 1536,
    // Retrieval settings
    knowledgeTopK: 5, // Top results to return
    knowledgeMinScore: 0.7, // Minimum similarity
    knowledgeDecay: 0.95, // Time decay factor
    // Chunking strategy
    chunkSize: 1000, // Characters per chunk
    chunkOverlap: 200, // Overlap between chunks
  }
};
```
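To make the retrieval settings concrete, here is a minimal sketch of how a top-K limit and a minimum similarity score typically interact. The names `cosineSimilarity`, `selectTopK`, and `ScoredChunk` are hypothetical and written for illustration; this is not the plugin's actual retrieval code, and it leaves out the decay factor.

```typescript
// Hypothetical types and helpers for illustration only; the plugin's real
// retrieval code may differ.
interface ScoredChunk {
  text: string;
  score: number;
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep chunks scoring at least minScore, best first, at most topK of them.
function selectTopK(
  query: number[],
  chunks: { text: string; embedding: number[] }[],
  topK = 5,
  minScore = 0.7
): ScoredChunk[] {
  return chunks
    .map(c => ({ text: c.text, score: cosineSimilarity(query, c.embedding) }))
    .filter(c => c.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Raising `minScore` trades recall for precision, while `topK` only caps how many of the surviving chunks reach the prompt.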
### Phase 5: Chunking Strategies
**Strategy 1: Fixed Size** (simple, balanced)
```typescript
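function chunkFixedSize(text: string, size: number, overlap: number): string[] {
  // Guard against a non-positive stride, which would loop forever
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError('Require size > 0 and 0 <= overlap < size');
  }
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + size, text.length);
    chunks.push(text.slice(start, end));
    // Stop at the end of the text so the tail is not emitted twice
    if (end === text.length) break;
    start += size - overlap;
  }
  return chunks;
}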
```
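As a rough sanity check on chunk size and overlap, the number of fixed-size chunks can be estimated from the document length alone. The helper name `estimateChunkCount` is hypothetical, and the formula assumes the chunker stops once a chunk reaches the end of the text.

```typescript
// Hypothetical helper: estimates how many fixed-size chunks a text of
// `length` characters yields, assuming chunking stops at the end of the text.
function estimateChunkCount(length: number, size: number, overlap: number): number {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError('Require size > 0 and 0 <= overlap < size');
  }
  if (length <= 0) return 0;
  if (length <= size) return 1;
  const stride = size - overlap; // characters advanced per chunk
  return 1 + Math.ceil((length - size) / stride);
}

// A 10,000-character document with size 1000 and overlap 200:
console.log(estimateChunkCount(10000, 1000, 200)); // 13
```

Multiplying the estimate by the number of documents gives a quick upper bound on how many embeddings a knowledge base will need.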
**Strategy 2: Semantic** (intelligent, context-aware)
```typescript
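// Split a section on blank lines so each paragraph becomes its own chunk
function chunkByParagraph(section: string): string[] {
  return section
    .split(/\n\s*\n/)
    .map(p => p.trim())
    .filter(p => p.length > 0);
}

function chunkSemantic(text: string, maxLength = 1000): string[] {
  // Split before each Markdown header; the lookahead keeps the header with its section
  const sections = text.split(/\n(?=#{1,6}\s)/);
  // Further split sections longer than maxLength
  return sections.flatMap(section =>
    section.length > maxLength ? chunkByParagraph(section) : [section]
  );
}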
```
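Splitting on headers and paragraphs can leave many very short chunks, which tend to embed poorly. One possible follow-up step, sketched here under the hypothetical name `packChunks`, greedily merges consecutive pieces until a maximum length is reached. It is an optional addition, not part of the strategy above.

```typescript
// Hypothetical helper: greedily packs consecutive small pieces into chunks
// of at most maxLength characters. A piece longer than maxLength is kept
// as its own chunk rather than being cut.
function packChunks(pieces: string[], maxLength = 1000, separator = '\n\n'): string[] {
  const packed: string[] = [];
  let current = '';
  for (const piece of pieces) {
    if (piece.length === 0) continue; // ignore empty pieces
    if (current === '') {
      current = piece;
    } else if (current.length + separator.length + piece.length <= maxLength) {
      current += separator + piece;
    } else {
      packed.push(current);
      current = piece;
    }
  }
  if (current !== '') packed.push(current);
  return packed;
}
```

Because pieces are only ever merged with their neighbors, the original document order is preserved.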
**Strategy 3: Sliding Window** (comprehensive, overlapping)
```typescript
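function chunkSlidingWindow(text: string, windowSize: number, step: number): string[] {
  // A non-positive step would never advance the window
  if (windowSize <= 0 || step <= 0) {
    throw new RangeError('windowSize and step must be positive');
  }
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += step) {
    const chunk = text.slice(i, i + windowSize);
    // Skip whitespace-only windows
    if (chunk.trim().length > 0) {
      chunks.push(chunk);
    }
  }
  return chunks;
}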
```
### Phase 6: Embedding Optimization
```typescript
import OpenAI from 'openai';

// Reads OPENAI_API_KEY from the environment by default
const openai = new OpenAI();

// Batch embedding generation
async function generateEmbeddings(
  chunks: string[],
  model: string = 'text-embedding-3-small'
): Promise<number[][]> {
  const batchSize = 100; // Inputs per API request
  const embeddings: number[][] = [];
  for (let i = 0; i < chunks.length; i += batchSize) {
    const batch = chunks.slice(i, i + batchSize);
    const response = await openai.embeddings.create({
      model,
      input: batch,
    });
    // Collect one embedding per input chunk
    embeddings.push(...response.data.map(d => d.embedding));
  }
  return embeddings;
}
```
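When knowledge files change only occasionally, re-embedding every chunk on each update wastes API calls. A minimal sketch of a content-hash cache follows. The names `EmbedFn`, `hashChunk`, and `embedWithCache` are hypothetical, and the in-memory `Map` stands in for whatever persisted store (such as a pre-computed JSON file) a project actually uses.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical signature for any batch embedder, e.g. a wrapper around an embeddings API.
type EmbedFn = (texts: string[]) => Promise<number[][]>;

// Stable cache key derived from the chunk text.
function hashChunk(text: string): string {
  return createHash('sha256').update(text, 'utf8').digest('hex');
}

// Embeds only chunks whose hash is missing from the cache, then returns
// embeddings for all chunks in their original order.
async function embedWithCache(
  chunks: string[],
  cache: Map<string, number[]>,
  embed: EmbedFn
): Promise<number[][]> {
  const keys = chunks.map(hashChunk);
  // De-duplicate uncached chunks so identical text is embedded once
  const missing = new Map<string, string>();
  keys.forEach((key, i) => {
    if (!cache.has(key)) missing.set(key, chunks[i]);
  });
  if (missing.size > 0) {
    const missingKeys = Array.from(missing.keys());
    const vectors = await embed(missingKeys.map(k => missing.get(k)!));
    missingKeys.forEach((k, i) => cache.set(k, vectors[i]));
  }
  return keys.map(k => cache.get(k)!);
}
```

Keying on a hash of the text means an edited chunk is re-embedded automatically, while unchanged chunks are served from the cache.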
### Phase 7: Search Implementation
```typescript
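import type { IAgentRuntime, Memory } from '@elizaos/core';

// Semantic search with hybrid ranking.
// generateEmbedding and mergeAndRank are helpers left to the implementer.
// The searchMemories options below follow this sketch; check them against
// the runtime API of your elizaOS version.
async function searchKnowledge(
  query: string,
  runtime: IAgentRuntime,
  topK: number = 5
): Promise<Memory[]> {
  // Generate query embedding
  const queryEmbedding = await generateEmbedding(query);
  // Semantic search: over-fetch so merging has candidates to choose from
  const semanticResults = await runtime.searchMemories({
    embedding: queryEmbedding,
    limit: topK * 2,
    minScore: 0.7
  });
  // Keyword search
  const keywordResults = await runtime.searchMemories({
    query,
    limit: topK * 2
  });
  // Merge and rank results
  return mergeAndRank(semanticResults, keywordResults, topK);
}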
```
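The search function above leaves `mergeAndRank` undefined. One common way to combine a semantic and a keyword result list is reciprocal rank fusion, sketched below. The `Ranked` shape and the constant `k = 60` are assumptions made for illustration, not something the skill or elizaOS prescribes.

```typescript
// Hypothetical minimal shape; elizaOS Memory objects carry more fields.
interface Ranked {
  id: string;
}

// Reciprocal rank fusion: each list contributes 1 / (k + rank) per item,
// so items ranked highly in either list rise to the top.
function mergeAndRank<T extends Ranked>(
  semantic: T[],
  keyword: T[],
  topK: number,
  k = 60
): T[] {
  const scores = new Map<string, { item: T; score: number }>();
  for (const list of [semantic, keyword]) {
    list.forEach((item, index) => {
      const entry = scores.get(item.id) ?? { item, score: 0 };
      entry.score += 1 / (k + index + 1); // rank is 1-based
      scores.set(item.id, entry);
    });
  }
  return Array.from(scores.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(entry => entry.item);
}
```

Rank fusion uses only positions, so it avoids having to normalize embedding similarities and keyword scores onto a common scale.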
### Phase 8: Quality Metrics
```typescript
import type { IAgentRuntime } from '@elizaos/core';

interface KnowledgeMetrics {
  totalDocuments: number;
  totalChunks: number;
  avgChunkSize: number;
  embeddingCoverage: number;
  queryPerformance: {
    avgLatency: number;
    avgRelevance: number;
  };
}

async function assessKnowledgeQuality(
  runtime: IAgentRuntime
): Promise<KnowledgeMetrics> {
  // Placeholder values for illustration; replace with measurements
  // taken from your own knowledge store
  return {
    totalDocuments: 50,
    totalChunks: 500,
    avgChunkSize: 800,
    embeddingCoverage: 0.98,
    queryPerformance: {
      avgLatency: 150, // ms
      avgRelevance: 0.85
    }
  };
}
```
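The structural metrics can be derived directly from stored chunks. The sketch below assumes a hypothetical `StoredChunk` record with a document ID, text, and optional embedding; how chunks are actually fetched from the runtime is left out.

```typescript
// Hypothetical record shape for a stored chunk.
interface StoredChunk {
  documentId: string;
  text: string;
  embedding?: number[];
}

interface ChunkStats {
  totalDocuments: number;
  totalChunks: number;
  avgChunkSize: number; // mean characters per chunk
  embeddingCoverage: number; // fraction of chunks with an embedding
}

function computeChunkStats(chunks: StoredChunk[]): ChunkStats {
  const totalChunks = chunks.length;
  if (totalChunks === 0) {
    return { totalDocuments: 0, totalChunks: 0, avgChunkSize: 0, embeddingCoverage: 0 };
  }
  const totalDocuments = new Set(chunks.map(c => c.documentId)).size;
  const totalChars = chunks.reduce((sum, c) => sum + c.text.length, 0);
  const embedded = chunks.filter(
    c => c.embedding !== undefined && c.embedding.length > 0
  ).length;
  return {
    totalDocuments,
    totalChunks,
    avgChunkSize: totalChars / totalChunks,
    embeddingCoverage: embedded / totalChunks,
  };
}
```

An `embeddingCoverage` below 1 points at chunks that were stored but never embedded, which semantic search cannot find.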
## Best Practices
1. **Document Structure**: Use clear headers and sections
2. **Chunk Size**: Balance between context and precision (500-1500 chars)
3. **Overlap**: Include 10-20% overlap for context preservation
4. **Updates**: Version knowledge files with dates
5. **Quality**: Regular review and refinement
6. **Performance**: Pre-compute embeddings when possible
7. **Privacy**: Never include sensitive data in the knowledge base
8. **Organization**: Group related documents in directories
9. **Testing**: Validate retrieval quality with test queries
10. **Monitoring**: Track usage patterns and relevance scores
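Validating retrieval quality with test queries (practice 9) can be as simple as measuring recall over a small hand-labeled set. The sketch below is one way to do that. The `RetrievalCase` and `Retriever` shapes are hypothetical and would need adapting to however your agent exposes search.

```typescript
// Hypothetical test-case shape: a query plus the IDs of chunks a good
// retriever should return for it.
interface RetrievalCase {
  query: string;
  expectedIds: string[];
}

// Hypothetical retriever signature: returns chunk IDs, best match first.
type Retriever = (query: string, topK: number) => Promise<string[]>;

// Mean recall@k: the average, over test cases, of the fraction of expected
// chunks found in the top k results.
async function meanRecallAtK(
  cases: RetrievalCase[],
  retrieve: Retriever,
  k = 5
): Promise<number> {
  if (cases.length === 0) return 0;
  let total = 0;
  for (const c of cases) {
    const returned = new Set((await retrieve(c.query, k)).slice(0, k));
    const hits = c.expectedIds.filter(id => returned.has(id)).length;
    // A case with no expected chunks counts as fully satisfied
    total += c.expectedIds.length > 0 ? hits / c.expectedIds.length : 1;
  }
  return total / cases.length;
}
```

Re-running the same test set after changing chunk size, overlap, or score thresholds shows whether a tuning change actually helped.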
This skill builds and optimizes elizaOS knowledge bases using document ingestion, embeddings, and semantic retrieval for RAG-enabled agents. It guides structure, chunking, embedding generation, vector storage, and search tuning to make agent knowledge production-ready. The goal is reliable, fast, and maintainable semantic search for agent conversations and tools.
It ingests documents (Markdown, PDF, text), applies chunking strategies (fixed, semantic, sliding window), and generates embeddings in batches. Chunks and embeddings are stored in a vector store and served via semantic search with hybrid ranking (embeddings + keyword). It includes tools for ongoing updates, versioning, and quality metrics to monitor relevance and latency.
## FAQ
### What chunking strategy should I choose?
Start with fixed-size chunks, which are simple and balanced, and evaluate retrieval relevance. Use semantic chunking for documents with clear headers and sections, and sliding windows when heavily overlapping context improves retrieval.
### How do I balance performance and accuracy?
Pre-compute embeddings to cut query-time latency, tune the top-K and min-score thresholds to trade recall against precision, and use hybrid ranking (embedding + keyword) to improve precision.