Local-first RAG server for developers using MCP. Semantic + keyword search for code and technical docs. Fully private, zero setup.
Configuration
```json
{
  "mcpServers": {
    "shinpr-mcp-local-rag": {
      "url": "https://mcp.example.com/mcp",
      "headers": {
        "BASE_DIR": "/path/to/your/documents"
      }
    }
  }
}
```
You can run a fully private, local retrieval-augmented generation (RAG) server that indexes your documents and answers questions without sending data to external services. MCP Local RAG operates entirely on your machine after an initial model download, giving you fast, code-focused semantic search with exact-term keyword boosts.
You will integrate a local MCP server into your preferred AI coding tool. The server ingests documents (PDF, DOCX, TXT, Markdown, or fetched HTML content), creates local embeddings, and serves search results that combine semantic similarity with keyword boosting. Use it to ingest your technical specs, API docs, or research papers, then query for precise terms like useEffect, ERR_CONNECTION_REFUSED, or specific class names. You can ingest individual documents or HTML content you fetch for indexing, then run searches to retrieve the most relevant chunks with reliable context.
Prerequisites: Node.js installed on your machine. You will need npm or npx to run the local MCP server. After preparing your environment, start the server using the included MCP command.
1) Ensure Node.js is installed. You can verify with node -v and npm -v.
2) Start the MCP Local RAG using the provided MCP command. The server runs as a local process and uses a BASE_DIR to locate and index your documents.
3) Point your MCP client to the local server configuration so your AI assistant can ingest, search, and retrieve your documents.
BASE_DIR is the environment variable you must provide; it specifies where your documents live and where the index is stored.
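To illustrate how such a variable is typically consumed, here is a sketch (not the server's actual implementation; the fallback to the current working directory is an assumption) of resolving and validating BASE_DIR in a Node process:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Resolve the document root from the environment. Falling back to the
// current working directory is an assumption for this sketch.
function resolveBaseDir(env: NodeJS.ProcessEnv = process.env): string {
  const raw = env.BASE_DIR ?? process.cwd();
  const baseDir = path.resolve(raw);
  if (!fs.existsSync(baseDir) || !fs.statSync(baseDir).isDirectory()) {
    throw new Error(`BASE_DIR does not point to a directory: ${baseDir}`);
  }
  return baseDir;
}
```

In practice you would set the variable when launching the process, e.g. `BASE_DIR=/path/to/your/documents` before the start command.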
The server begins by downloading the embedding model on first run and then operates offline. You can adjust search behavior using tuning options described in the tuning section.
All processing happens locally after the initial model download. No data leaves your machine during normal operation.
The document root is restricted to the BASE_DIR you specify, preventing access to arbitrary filesystem paths.
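A common way to enforce this kind of restriction (shown here as a hedged sketch, not the server's actual code) is to resolve every requested path against the document root and reject anything that escapes it:

```typescript
import * as path from "node:path";

// Return true only if `requested`, once resolved, stays inside `baseDir`.
// Rejects traversal attempts such as "../../etc/passwd".
function isInsideBase(baseDir: string, requested: string): boolean {
  const root = path.resolve(baseDir);
  const target = path.resolve(root, requested);
  const rel = path.relative(root, target);
  return !rel.startsWith("..") && !path.isAbsolute(rel);
}
```

Checking the relative path rather than a raw string prefix avoids false positives like `/docs-private` matching a root of `/docs`.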
Ingest documents to index content and then run queries that return relevant chunks with their source, document title, and a relevance score.
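To make the relevance score concrete, here is a toy sketch of hybrid scoring (assumptions: cosine similarity over embeddings plus a fixed additive boost when the query term appears verbatim; the server's actual weighting is not documented here):

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hybrid score: semantic similarity plus a keyword boost for exact matches.
function hybridScore(
  queryVec: number[],
  chunkVec: number[],
  queryTerm: string,
  chunkText: string,
  boost = 0.2, // hypothetical weight, chosen for illustration
): number {
  const semantic = cosine(queryVec, chunkVec);
  const keyword = chunkText.includes(queryTerm) ? boost : 0;
  return semantic + keyword;
}
```

With this shape, two chunks that are equally similar semantically are separated by whether they contain the exact term, which is why queries like ERR_CONNECTION_REFUSED surface the right chunk.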
The embedding model downloads on the first run and typically completes in a couple of minutes. After that, you can ingest more content and perform searches entirely offline.
If you see no results, confirm you have ingested documents into BASE_DIR. If the model download fails, verify your internet connection or try again later.
If you encounter slow queries, check the number of chunks or document size and consider splitting large files into smaller parts before ingesting.
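If you pre-split large files yourself, a simple fixed-size split with overlap is enough; this sketch uses hypothetical chunk and overlap sizes, not the server's defaults:

```typescript
// Split text into fixed-size chunks with overlap, so content that spans a
// chunk boundary still appears intact in at least one chunk.
function splitIntoChunks(text: string, size = 1000, overlap = 100): string[] {
  if (size <= overlap) throw new Error("size must exceed overlap");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // final chunk reached the end
  }
  return chunks;
}
```

Smaller chunks mean more index entries but faster, more focused retrieval per query.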
Is this really private? Yes. After the model download, nothing leaves your machine.
Can I use this offline? Yes, after the first model download.
For contributors, the project includes a modular structure with components for parsing, chunking, embedding, vector storage, and MCP tool integration.
Ingest a document from the filesystem to be indexed and searched later. Supports PDF, DOCX, TXT, and Markdown.
Ingest HTML content retrieved by your assistant or via web fetch to index web-based documentation and HTML content.
Search the indexed content using semantic similarity with optional keyword boosts to prioritize exact terms.
List all files in BASE_DIR and their ingested status to verify what has been indexed.
Remove a previously ingested file from the local index.
Show the current status of the RAG server and its indexing state.