This skill provides fast multi-route semantic search backed by Milvus, reranking for more precise results, and scalable document storage for large-scale knowledge retrieval.
Install with: `npx playbooks add skill lin-a1/skills-agent --skill rag_service`
---
name: rag-service
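description: High-performance RAG multi-route retrieval service. Integrates the Milvus vector database for semantic retrieval and a Rerank model for precise re-ranking, and supports efficient storage of large document volumes and recall of historical content.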
---
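## Features
RAG multi-route retrieval service, providing:
1. Vector semantic retrieval - vector similarity search based on Milvus
2. Rerank - fine-grained re-ranking of the retrieved results
3. Document storage - saving documents to the vector database
## Usage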
```python
from services.rag_service.client import RAGServiceClient
client = RAGServiceClient()

# Health check
status = client.health()

# Semantic retrieval
result = client.retrieve(
    query="Python async programming best practices",
    top_k=5,
    min_score=0.85,
    rerank=True,
)
print(result["results"])

# Convenience method: return only the list of texts
texts = client.retrieve_texts(query="Python async programming", top_k=5)

# Save documents
client.save(documents=[
    {"text": "Document content...", "metadata": {"title": "Title", "url": "..."}}
])
```
## Response format
### retrieve
```json
{
  "query": "Python async programming",
  "results": [
    {
      "id": "abc123",
      "text": "Python async programming is built on the asyncio library...",
      "score": 0.92,
      "metadata": {"title": "Official Python documentation", "url": "..."}
    }
  ],
  "total": 3,
  "elapsed_ms": 45.2,
  "from_cache": false
}
```
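A minimal sketch of consuming a `retrieve` response with the shape shown above. The `ResultItem` and `RetrieveResponse` types and the `texts_above` helper are hypothetical names introduced for illustration; they are not part of the service's client, and the sample data is made up.

```python
from typing import Any, TypedDict


class ResultItem(TypedDict):
    id: str
    text: str
    score: float
    metadata: dict[str, Any]


class RetrieveResponse(TypedDict):
    query: str
    results: list[ResultItem]
    total: int
    elapsed_ms: float
    from_cache: bool


def texts_above(response: RetrieveResponse, threshold: float) -> list[str]:
    """Return result texts whose score is at least `threshold`, best first."""
    ranked = sorted(response["results"], key=lambda r: r["score"], reverse=True)
    return [r["text"] for r in ranked if r["score"] >= threshold]


# Made-up sample in the documented response shape.
sample: RetrieveResponse = {
    "query": "Python async programming",
    "results": [
        {"id": "r2", "text": "second hit", "score": 0.80, "metadata": {}},
        {"id": "r1", "text": "first hit", "score": 0.92, "metadata": {}},
    ],
    "total": 2,
    "elapsed_ms": 45.2,
    "from_cache": False,
}

print(texts_above(sample, 0.85))
```

Typing the response this way lets a type checker catch misspelled field names before the code runs.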
### save
```json
{
  "saved_count": 5,
  "collection_name": "websearch_results"
}
```
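The sketch below shows one way to prepare the `documents` payload for `save` and to check its response. `build_documents` and `all_saved` are hypothetical helpers, the example URLs are placeholders, and the check assumes that `saved_count` counts the documents stored by that single call, which the source does not state explicitly.

```python
def build_documents(pages):
    """Turn (text, title, url) tuples into the payload shape used by `save`."""
    docs = []
    for text, title, url in pages:
        text = text.strip()
        if not text:
            continue  # skip pages with no content
        docs.append({"text": text, "metadata": {"title": title, "url": url}})
    return docs


def all_saved(response, documents):
    """Assumption: saved_count equals the number of documents sent in this call."""
    return response["saved_count"] == len(documents)


pages = [
    ("First page body", "First", "https://example.com/1"),
    ("   ", "Empty", "https://example.com/2"),
]
documents = build_documents(pages)
print(documents)
```

The resulting list could then be passed as `client.save(documents=documents)`.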
This skill provides a high-performance RAG (retrieval-augmented generation) multi-route retrieval service. It integrates Milvus for vector semantic search, applies a Rerank model for precise result ordering, and offers scalable document storage and history recall for large collections. The service is designed for production-grade retrieval tasks with low latency and high accuracy.
The service embeds documents into vector space and stores them in Milvus for fast approximate nearest neighbor search. A configurable retrieval step returns top-K candidates which are optionally passed to a Rerank model that refines ordering based on relevance. It also exposes document ingestion and health-check endpoints and returns structured results including scores, metadata, and timing.
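The retrieve-then-rerank flow described above can be sketched in plain Python. This is a toy illustration, not the service's implementation: a brute-force cosine search stands in for Milvus's approximate nearest neighbor search, a word-overlap score stands in for the Rerank model, and the vectors and texts are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, top_k):
    """Brute-force stand-in for the vector search: top_k ids by cosine similarity."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


def rerank(query, candidates, texts):
    """Toy reranker: order candidates by how many query words their text contains."""
    words = set(query.lower().split())

    def overlap(doc_id):
        return len(words & set(texts[doc_id].lower().split()))

    return sorted(candidates, key=overlap, reverse=True)


# Invented 2-dimensional "embeddings" and the texts they belong to.
index = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
texts = {
    "a": "threads and processes",
    "b": "asyncio event loop for async python",
    "c": "cooking recipes",
}

candidates = retrieve([1.0, 0.0], index, top_k=2)   # stage 1: vector search
final = rerank("async python", candidates, texts)  # stage 2: rerank
print(candidates, final)
```

Here the vector stage ranks `a` ahead of `b`, and the rerank stage swaps them because `b`'s text matches the query words. A second stage that looks at the text directly can correct the ordering produced by vector similarity alone.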
What return data does retrieval include?
Retrieval returns the original `query`, a `results` list whose items carry `id`, `text`, `score` and `metadata`, the `total` hit count, `elapsed_ms`, and a `from_cache` flag.
How do I control precision vs recall?
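Adjust `top_k` and `min_score` to trade recall against precision: a larger `top_k` or a lower `min_score` returns more candidates (higher recall), while a smaller `top_k` or a higher `min_score` keeps only the strongest matches (higher precision). Enabling `rerank` improves top-result precision at the cost of an extra compute step.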
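A small sketch of how the two knobs interact on a list of scored hits. The `select` function and the scores are hypothetical; how the service applies `top_k` and `min_score` internally (for example, before or after reranking) is not stated here.

```python
def select(scored, top_k, min_score):
    """Keep at most top_k hits whose score is >= min_score, best first."""
    ranked = sorted(scored, key=lambda hit: hit[1], reverse=True)
    return [hit for hit in ranked[:top_k] if hit[1] >= min_score]


# Invented (id, score) pairs.
hits = [("d1", 0.92), ("d2", 0.88), ("d3", 0.71), ("d4", 0.40)]

strict = select(hits, top_k=5, min_score=0.85)  # favors precision
loose = select(hits, top_k=3, min_score=0.5)    # favors recall
print(strict)
print(loose)
```

With the strict settings only `d1` and `d2` survive. The looser settings also admit `d3`, which may or may not be relevant.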