RAG MCP Server

Attach an external knowledge base via MCP.

Installation
Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "kalicyh-mcp-rag": {
      "url": "http://127.0.0.1:8060/mcp"
    }
  }
}

MCP-RAG is a low-latency Retrieval-Augmented Generation (RAG) service built on MCP. It provides fast local knowledge retrieval, supports raw and summarized retrieval modes, and integrates with local or remote LLM providers for summarization and context handling. The server is designed for modular expansion and asynchronous operation, and it manages data sources and queries through a unified MCP interface.

How to use

Start the service, then point an MCP client at the MCP endpoint to run queries and receive results from MCP-RAG. The web-based configuration and document management pages let you upload data, adjust settings, and run queries.
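Before wiring up a client, it can help to confirm that something is listening on the endpoint. The following is a minimal sketch, assuming the server runs on the URL from the client configuration; the helper names `endpoint_address` and `is_reachable` are hypothetical and not part of MCP-RAG. It only tests the TCP connection and does not perform an MCP handshake.

```python
import socket
from urllib.parse import urlparse


def endpoint_address(url: str) -> tuple:
    """Split an endpoint URL into a (host, port) pair."""
    parsed = urlparse(url)
    if parsed.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    # Fall back to the scheme's default port when the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    try:
        with socket.create_connection(endpoint_address(url), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    url = "http://127.0.0.1:8060/mcp"  # endpoint from the client configuration
    print(url, "reachable" if is_reachable(url) else "not reachable")
```

A successful connection only shows that the port is open; whether the process behind it is MCP-RAG still has to be confirmed from the client.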

How to install

Install the prerequisites first, then sync the project dependencies with uv.

# Prerequisites
- Python >= 3.13
- uv package manager

# Basic installation (cloud API only)
uv sync

# Optional: enable local embeddings (e.g., m3e-small, e5-small)
uv sync --extra local-embeddings

Configuration and usage notes

MCP-RAG uses a JSON file for persistent configuration. The file stores host, port, vector database settings, embedding/provider configurations, LLM provider details, and feature toggles. You can modify this file through the web configuration page after the server starts.

The sample configuration below uses a local Chroma vector store, selects Zhipu as the embedding provider (a Doubao embedding entry is also preconfigured), uses Doubao as the LLM provider, and leaves LLM summaries, reranking, and caching disabled. The server also exposes a management interface for uploading documents and running queries.

{
  "host": "0.0.0.0",
  "port": 8060,
  "http_port": 8060,
  "debug": false,
  "vector_db_type": "chroma",
  "chroma_persist_directory": "./data/chroma",
  "qdrant_url": "http://localhost:6333",
  "embedding_provider": "zhipu",
  "embedding_device": "cpu",
  "embedding_cache_dir": null,
  "provider_configs": {
    "doubao": {
      "base_url": "https://ark.cn-beijing.volces.com/api/v3",
      "model": "doubao-embedding-text-240715",
      "api_key": null
    },
    "zhipu": {
      "base_url": "https://open.bigmodel.cn/api/paas/v4",
      "model": "embedding-3",
      "api_key": null
    }
  },
  "llm_provider": "doubao",
  "llm_model": "doubao-seed-1.6-250615",
  "llm_base_url": "https://ark.cn-beijing.volces.com/api/v3",
  "llm_api_key": null,
  "enable_llm_summary": false,
  "enable_thinking": true,
  "max_retrieval_results": 5,
  "similarity_threshold": 0.7,
  "enable_reranker": false,
  "enable_cache": false
}
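To illustrate how these fields relate, here is a rough sketch that reads such a configuration and reports missing API keys. It assumes that `embedding_provider` names an entry in `provider_configs` and that cloud providers need a non-null `api_key`; both are inferences from the sample, not confirmed behavior of the server. The function names are hypothetical.

```python
import json


def active_embedding_config(config: dict) -> dict:
    """Return the provider_configs entry selected by embedding_provider."""
    name = config["embedding_provider"]
    providers = config.get("provider_configs", {})
    if name not in providers:
        raise KeyError(f"no provider_configs entry for {name!r}")
    return {"name": name, **providers[name]}


def config_warnings(config: dict) -> list:
    """Collect warnings about settings that look incomplete (assumed rules)."""
    warnings = []
    embedding = active_embedding_config(config)
    if embedding.get("api_key") is None:
        warnings.append(f"embedding provider {embedding['name']!r} has no api_key")
    # The LLM key is only checked when summaries are switched on.
    if config.get("enable_llm_summary") and config.get("llm_api_key") is None:
        warnings.append("enable_llm_summary is true but llm_api_key is null")
    return warnings


if __name__ == "__main__":
    # Trimmed copy of the sample configuration above.
    sample = json.loads("""
    {
      "embedding_provider": "zhipu",
      "provider_configs": {
        "zhipu": {"base_url": "https://open.bigmodel.cn/api/paas/v4",
                  "model": "embedding-3", "api_key": null}
      },
      "llm_api_key": null,
      "enable_llm_summary": false
    }
    """)
    print(active_embedding_config(sample)["model"])
    for warning in config_warnings(sample):
        print("warning:", warning)
```

With the sample values, only the embedding key is flagged, because `enable_llm_summary` is false.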

MCP server configuration

Point your MCP client at the server's MCP endpoint. The entry below registers the server under the name "rag"; the Installation snippet uses "kalicyh-mcp-rag" for the same URL.

{
  "mcpServers": {
    "rag": {
      "url": "http://127.0.0.1:8060/mcp"
    }
  }
}
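If the client already has a configuration file with other servers, the entry can be merged in rather than pasted by hand. The sketch below assumes the client stores its servers under a top-level `mcpServers` object, as in the example above; the file name `mcp_config.json` and the helper names are hypothetical, so substitute the path your client actually uses.

```python
import json
from pathlib import Path


def add_server(config: dict, name: str, url: str) -> dict:
    """Return a copy of a client config with the server added under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"url": url}  # replaces any existing entry with this name
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, name: str, url: str) -> None:
    """Read the client config (if present), add the server, and write it back."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    updated = add_server(existing, name, url)
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Print the merged result instead of touching a real file.
    current = {"mcpServers": {"other": {"url": "http://127.0.0.1:9000/mcp"}}}
    print(json.dumps(add_server(current, "rag", "http://127.0.0.1:8060/mcp"), indent=2))
    # To update a file in place (hypothetical path):
    # update_config_file(Path("mcp_config.json"), "rag", "http://127.0.0.1:8060/mcp")
```

Existing server entries are preserved; only an entry with the same name is overwritten.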