RAG MCP Server

RAG documentation MCP server

Installation
Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "jaimeferj-mcp-rag-docs": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/mcp-rag-docs",
        "run",
        "python",
        "-m",
        "mcp_server.server"
      ],
      "env": {
        "LLM_MODEL": "gemini-1.5-flash",
        "CHUNK_SIZE": "1000",
        "QDRANT_PATH": "./qdrant_storage",
        "FASTAPI_HOST": "0.0.0.0",
        "FASTAPI_PORT": "8000",
        "CHUNK_OVERLAP": "200",
        "TOP_K_RESULTS": "5",
        "GOOGLE_API_KEY": "YOUR_API_KEY_HERE",
        "EMBEDDING_MODEL": "text-embedding-004",
        "QDRANT_COLLECTION_NAME": "documents"
      }
    }
  }
}
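If your MCP client already has a configuration file with other servers in it, you may prefer to merge this entry in rather than paste it by hand. The following is a minimal sketch of that idea, not part of this project: the helper names (server_entry, merge_server, update_config_file) are hypothetical, the location of the config file depends on your client, and only GOOGLE_API_KEY is set here for brevity (add the other variables from the JSON above as needed).

```python
import json
from pathlib import Path


def server_entry(project_dir: str, api_key: str) -> dict:
    """Build the launch entry for this server, mirroring the JSON above."""
    return {
        "command": "uv",
        "args": ["--directory", project_dir, "run", "python", "-m", "mcp_server.server"],
        "env": {"GOOGLE_API_KEY": api_key},
    }


def merge_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of config with the entry added under "mcpServers"."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # copy so the input is not mutated
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, name: str, entry: dict) -> None:
    """Read the client config if it exists, merge the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(merge_server(config, name, entry), indent=2))
```

Calling update_config_file(Path("/path/to/client_config.json"), "jaimeferj-mcp-rag-docs", server_entry("/path/to/mcp-rag-docs", "YOUR_API_KEY_HERE")) would leave any other configured servers untouched; the config path shown is a placeholder.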

This server runs a Retrieval-Augmented Generation (RAG) system for querying, routing, and retrieving documents. It exposes REST and OpenAI-compatible endpoints, and it also provides a Model Context Protocol (MCP) interface so that Claude and other MCP clients can retrieve context from your documents and generate answers from it.

How to use

Start the MCP server, then configure your MCP client to use it. Through MCP you can query your documents, filter results by tags or section paths, retrieve document structures, and get answers whose sources are cited with full section paths. Smart routing picks the most suitable retrieval method for each request and can combine multiple sources (documents, code, references) into a single answer. OpenAI-compatible chat is also available.

In practice, you use the MCP client to send four kinds of requests: queries with optional tag and section filters, document listings, document structure lookups, and system statistics. You can also add new documents with tags so that later queries can target them. The MCP interface exposes a small set of tools that map directly to these server features, which makes it easy to automate and integrate into your workflows.
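On the wire, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method tools/call. The sketch below builds such requests for adding a tagged document and then querying with filters. The tool names come from this page, but the argument names (file_path, question, tags, section_path) are assumptions for illustration; check the server's tool schemas for the real ones. In normal use your MCP client or SDK builds these messages for you.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids only need to be unique per session


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names below are hypothetical; only the tool names come from this page.
add = tool_call("add_document", {"file_path": "docs/guide.md", "tags": ["guide"]})
query = tool_call(
    "query_rag",
    {
        "question": "How do I configure chunking?",
        "tags": ["guide"],
        "section_path": "Configuration",
    },
)

print(json.dumps(query, indent=2))
```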

How to install

Prerequisites: Python 3.13 or higher, an MCP client, and a shell where you can run the command that starts the MCP server.

Install dependencies in editable mode so you can develop and run locally:

pip install -e .

If you prefer uv, install with it instead:

uv pip install -e .

Configure environment variables in an environment file or export them in your shell. You will typically set your Google API key for embeddings and LLM access, along with any chunking and storage options you plan to use.

# Example environment setup
export GOOGLE_API_KEY=your_api_key_here
export CHUNK_SIZE=1000
export CHUNK_OVERLAP=200
export TOP_K_RESULTS=5
export QDRANT_PATH=./qdrant_storage
export QDRANT_COLLECTION_NAME=documents
export FASTAPI_HOST=0.0.0.0
export FASTAPI_PORT=8000
export EMBEDDING_MODEL=text-embedding-004
export LLM_MODEL=gemini-1.5-flash
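To make the role of each variable concrete, here is a minimal sketch of how a Python process could read and validate them. It is not the server's actual loader: the Settings class and load_settings function are hypothetical, and the fallback values simply reuse the example values above, which may differ from the server's real defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    chunk_size: int
    chunk_overlap: int
    top_k_results: int
    qdrant_path: str
    qdrant_collection_name: str
    fastapi_host: str
    fastapi_port: int
    embedding_model: str
    llm_model: str


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default), converting numeric values."""
    env = os.environ if env is None else env
    settings = Settings(
        google_api_key=env["GOOGLE_API_KEY"],  # required: raises KeyError if unset
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        top_k_results=int(env.get("TOP_K_RESULTS", "5")),
        qdrant_path=env.get("QDRANT_PATH", "./qdrant_storage"),
        qdrant_collection_name=env.get("QDRANT_COLLECTION_NAME", "documents"),
        fastapi_host=env.get("FASTAPI_HOST", "0.0.0.0"),
        fastapi_port=int(env.get("FASTAPI_PORT", "8000")),
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-004"),
        llm_model=env.get("LLM_MODEL", "gemini-1.5-flash"),
    )
    # An overlap as large as the chunk itself would never advance through the text.
    if not 0 <= settings.chunk_overlap < settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be non-negative and smaller than CHUNK_SIZE")
    return settings
```

Passing a plain dict instead of os.environ, for example load_settings({"GOOGLE_API_KEY": "your_api_key_here"}), makes the function easy to exercise without touching your shell.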

Additional setup and running the MCP server

After installing dependencies and configuring environment variables, you can run the MCP server locally. It runs as a local process that communicates over stdio, so Claude Desktop and other MCP-compatible tools can launch it and talk to it directly.

Start the MCP server with the same command that the client configuration uses:

uv --directory /path/to/mcp-rag-docs run python -m mcp_server.server
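Because the server speaks MCP over stdio, a client launches it as a child process and exchanges one JSON message per line on its stdin and stdout. The following sketch shows that mechanism with the standard library only. The function names are hypothetical, and a real client must also perform the MCP initialize handshake before calling tools, which an MCP SDK or desktop client handles for you.

```python
import json
import subprocess


def build_command(project_dir: str) -> list[str]:
    """Argument list equivalent to the uv command shown above."""
    return ["uv", "--directory", project_dir, "run", "python", "-m", "mcp_server.server"]


def frame(message: dict) -> bytes:
    """Encode one JSON-RPC message for the stdio transport: one JSON object per line."""
    # json.dumps escapes newlines inside strings, so the only raw newline is the delimiter.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def start_server(project_dir: str) -> subprocess.Popen:
    """Spawn the server with pipes attached so the caller can write requests and read replies."""
    return subprocess.Popen(
        build_command(project_dir),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
```

A caller would write frame(request) to the process's stdin, flush it, and read one line of the reply from stdout.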

Available tools

query_rag

Query the RAG system with a question, optionally filtering by tags and section paths.

query_rag_enhanced

Query with automatic reference following to locate and cite sources across multiple documents.

smart_query

Smart query with automatic routing and classification to select the best retrieval strategy.

add_document

Add a document to the RAG system with optional tags for organization.

list_documents

List all stored documents in the system.

delete_document

Delete a document by its ID from the system.

get_rag_stats

Retrieve system statistics and status information.

get_tags

List all available document tags.

get_document_structure

Get the table of contents or section structure for a document.
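This page lists the tools but not their parameters. MCP servers describe each tool, including a JSON Schema for its arguments, in the reply to a tools/list request, so a client can discover the parameters at runtime. The sketch below builds that request and inspects a reply. The helper names are hypothetical, and any reply you feed it in testing is hand-written sample data rather than output captured from this server.

```python
EXPECTED_TOOLS = {
    "query_rag",
    "query_rag_enhanced",
    "smart_query",
    "add_document",
    "list_documents",
    "delete_document",
    "get_rag_stats",
    "get_tags",
    "get_document_structure",
}


def list_tools_request(request_id: int) -> dict:
    """Build an MCP tools/list request as a JSON-RPC 2.0 message."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def missing_tools(response: dict, expected=EXPECTED_TOOLS) -> set:
    """Return the expected tool names that the server did not advertise."""
    available = {tool["name"] for tool in response["result"]["tools"]}
    return set(expected) - available


def input_schema(response: dict, name: str) -> dict:
    """Return the JSON Schema describing the arguments of the named tool."""
    for tool in response["result"]["tools"]:
        if tool["name"] == name:
            return tool["inputSchema"]
    raise KeyError(name)
```

Checking missing_tools against the reply at startup catches a renamed or removed tool early, and input_schema gives the authoritative argument names for calls such as query_rag.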