rag_service skill

/services/rag_service

This skill provides fast, multi-path semantic retrieval backed by Milvus, precise result reordering with a rerank model, and scalable document storage for large-scale knowledge retrieval.

npx playbooks add skill lin-a1/skills-agent --skill rag_service

Review the files below or copy the command above to add this skill to your agents.

Files (4)
SKILL.md
1.4 KB
---
name: rag-service
description: High-performance RAG multi-path retrieval service. Integrates the Milvus vector database for semantic retrieval and applies a rerank model for precise reordering; supports efficient storage of massive document collections and recall of historical content.
---

## Features
Multi-path RAG retrieval service, providing:
1. Vector semantic retrieval - vector similarity search backed by Milvus
2. Rerank reordering - fine-grained reranking of retrieved results
3. Document storage - persist documents to the vector database

## Usage
```python
from services.rag_service.client import RAGServiceClient

client = RAGServiceClient()

# Health check
status = client.health()

# Semantic retrieval
result = client.retrieve(
    query="Python async programming best practices",
    top_k=5,
    min_score=0.85,
    rerank=True
)
print(result["results"])

# Convenience method: return only a list of texts
texts = client.retrieve_texts(query="Python async programming", top_k=5)

# Save documents
client.save(documents=[
    {"text": "Document content...", "metadata": {"title": "Title", "url": "..."}}
])
```

## Return Format

### retrieve
```json
{
  "query": "Python async programming",
  "results": [
    {
      "id": "abc123",
      "text": "Python async programming is based on the asyncio library...",
      "score": 0.92,
      "metadata": {"title": "Python official documentation", "url": "..."}
    }
  ],
  "total": 3,
  "elapsed_ms": 45.2,
  "from_cache": false
}
```

### save
```json
{
  "saved_count": 5,
  "collection_name": "websearch_results"
}
```

Overview

This skill provides a high-performance RAG (retrieval-augmented generation) multi-path retrieval service. It integrates Milvus for vector semantic search, applies a rerank model for precise result ordering, and offers scalable document storage and history recall for large collections. The service is designed for production-grade retrieval tasks with low latency and high accuracy.

How this skill works

The service embeds documents into vector space and stores them in Milvus for fast approximate nearest neighbor search. A configurable retrieval step returns top-K candidates, which can optionally be passed to a rerank model that refines their ordering by relevance. The service also exposes document ingestion and health-check endpoints and returns structured results including scores, metadata, and timing.
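The two-stage flow described above can be sketched in miniature. This is an illustrative stand-in, not the service's implementation: the embedding similarity is a plain cosine over toy vectors, Milvus's approximate nearest neighbor search is simulated with an exhaustive scan, and the rerank model is replaced by a crude term-overlap score.

```python
# Toy sketch of the retrieve-then-rerank pipeline. All scoring here is a
# stand-in for the real embedding and rerank models.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, corpus, top_k=5, min_score=0.0):
    """Stage 1: score every document, keep the top-K above min_score."""
    scored = [
        {"id": d["id"], "text": d["text"], "score": cosine(query_vec, d["vec"])}
        for d in corpus
    ]
    scored = [d for d in scored if d["score"] >= min_score]
    scored.sort(key=lambda d: d["score"], reverse=True)
    return scored[:top_k]

def rerank(query_terms, candidates):
    """Stage 2: reorder candidates; term overlap stands in for a rerank model."""
    for c in candidates:
        overlap = sum(1 for t in query_terms if t in c["text"].lower())
        c["score"] = overlap / max(len(query_terms), 1)
    candidates.sort(key=lambda c: c["score"], reverse=True)
    return candidates

corpus = [
    {"id": "a", "vec": [1.0, 0.0], "text": "asyncio event loop basics"},
    {"id": "b", "vec": [0.9, 0.1], "text": "python async programming patterns"},
    {"id": "c", "vec": [0.0, 1.0], "text": "relational database indexing"},
]
candidates = retrieve([1.0, 0.0], corpus, top_k=2, min_score=0.5)
results = rerank(["python", "async"], candidates)
print([r["id"] for r in results])
```

Note how the rerank stage can invert the first-stage ordering: the document that scored highest on vector similarity alone is demoted once query-term relevance is taken into account.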

When to use it

  • Enable semantic search across large unstructured document sets
  • Combine retrieval with generation to ground LLM outputs in source texts
  • Require result reordering for improved precision in top results
  • Store and recall document history alongside embeddings
  • Optimize latency and throughput for production retrieval pipelines

Best practices

  • Tune embedding model and Milvus index parameters for your dataset size
  • Set sensible top_k and min_score thresholds to balance recall and precision
  • Enable rerank when top result quality matters, especially for short queries
  • Include metadata (title, url, tags) to aid downstream filtering and presentation
  • Monitor elapsed_ms and from_cache flags to identify performance bottlenecks
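The thresholding and monitoring points above can be applied client-side as well. The sketch below post-processes a response shaped like the documented `retrieve` payload; the threshold values are arbitrary examples to tune per dataset, not recommendations from the service.

```python
# Post-process a response shaped like the documented `retrieve` payload:
# drop low-scoring hits and flag slow, uncached calls for monitoring.
response = {
    "query": "Python async programming",
    "results": [
        {"id": "abc123", "text": "asyncio basics...", "score": 0.92, "metadata": {}},
        {"id": "def456", "text": "loosely related...", "score": 0.61, "metadata": {}},
    ],
    "total": 2,
    "elapsed_ms": 45.2,
    "from_cache": False,
}

MIN_SCORE = 0.85  # precision threshold; tune for your corpus
SLOW_MS = 200.0   # latency budget before a call is flagged

kept = [r for r in response["results"] if r["score"] >= MIN_SCORE]
slow_uncached = response["elapsed_ms"] > SLOW_MS and not response["from_cache"]

print(len(kept), slow_uncached)
```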

Example use cases

  • Provide grounded answers in a customer support assistant by retrieving relevant KB articles
  • Augment a summarization pipeline with precise source excerpts for provenance
  • Search and surface legal, medical, or research documents by semantic similarity
  • Maintain an evolving document corpus for product documentation and retrieve historical changes
  • Drive a tool that feeds high-quality retrieved passages into an LLM for response generation

FAQ

What return data does retrieval include?

Retrieval returns the original query; a list of results, each with id, text, score, and metadata; the total hit count; elapsed_ms; and a from_cache flag.

How do I control precision vs recall?

Adjust top_k and min_score to tune recall and precision. Enabling rerank improves top-result precision at the cost of an extra compute step.
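The trade-off can be made concrete with a toy evaluation. The candidate scores and relevance labels below are made-up numbers for illustration: a loose configuration maximizes recall, while a strict one trades recall for precision.

```python
# Toy illustration of the top_k / min_score trade-off: given candidates
# already ranked by retriever score and ground-truth relevance labels,
# compare precision and recall under a loose and a strict configuration.
candidates = [  # (retriever score, is_relevant) - made-up numbers
    (0.95, True), (0.90, True), (0.80, False),
    (0.70, True), (0.60, False), (0.50, False),
]
total_relevant = sum(1 for _, rel in candidates if rel)

def evaluate(top_k, min_score):
    picked = [rel for score, rel in candidates[:top_k] if score >= min_score]
    hits = sum(picked)
    precision = hits / len(picked) if picked else 0.0
    recall = hits / total_relevant
    return precision, recall

print(evaluate(top_k=6, min_score=0.0))   # loose: full recall, lower precision
print(evaluate(top_k=3, min_score=0.85))  # strict: full precision, lower recall
```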