Local-first RAG server for developers using MCP. Semantic + keyword search for code and technical docs. Fully private, zero setup.
Configuration
```json
{
  "mcpServers": {
    "shinpr-mcp-local-rag": {
      "url": "https://mcp.example.com/mcp",
      "headers": {
        "BASE_DIR": "/path/to/your/documents"
      }
    }
  }
}
```
You can run a fully private, local retrieval-augmented generation (RAG) server that indexes your documents and answers questions without sending data to external services. MCP Local RAG operates entirely on your machine after an initial model download, giving you fast, code-focused semantic search with exact-term keyword boosts.
You will integrate a local MCP server into your preferred AI coding tool. The server ingests documents (PDF, DOCX, TXT, Markdown, or fetched HTML content), creates local embeddings, and serves search results that combine semantic similarity with keyword boosting. Use it to ingest your technical specs, API docs, or research papers, then query for precise terms like useEffect, ERR_CONNECTION_REFUSED, or specific class names. You can ingest individual documents or HTML content you fetch for indexing, then run searches to retrieve the most relevant chunks with reliable context.
Prerequisites: Node.js installed on your machine. You will need npm or npx to run the local MCP server. After preparing your environment, start the server using the included MCP command.
1) Ensure Node.js is installed. You can verify with node -v and npm -v.
2) Start the MCP Local RAG using the provided MCP command. The server runs as a local process and uses a BASE_DIR to locate and index your documents.
3) Point your MCP client to the local server configuration so your AI assistant can ingest, search, and retrieve your documents.
BASE_DIR is the environment variable you must provide; it specifies where your documents live and where the index is stored.
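To illustrate how such a variable is typically consumed, here is a sketch (not the server's actual implementation; the fallback to the current working directory is an assumption) of resolving and validating BASE_DIR in a Node process:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Resolve the document root from the environment. Falling back to the
// current working directory is an assumption for this sketch.
function resolveBaseDir(env: NodeJS.ProcessEnv = process.env): string {
  const raw = env.BASE_DIR ?? process.cwd();
  const baseDir = path.resolve(raw);
  if (!fs.existsSync(baseDir) || !fs.statSync(baseDir).isDirectory()) {
    throw new Error(`BASE_DIR does not point to a directory: ${baseDir}`);
  }
  return baseDir;
}
```

In practice you would set the variable when launching the process, e.g. `BASE_DIR=/path/to/your/documents` before the start command.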
The server begins by downloading the embedding model on first run and then operates offline. You can adjust search behavior using tuning options described in the tuning section.
All processing happens locally after the initial model download. No data leaves your machine during normal operation.
The document root is restricted to the BASE_DIR you specify, preventing access to arbitrary filesystem paths.
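A common way to enforce this kind of restriction (shown here as a hedged sketch, not the server's actual code) is to resolve every requested path against the document root and reject anything that escapes it:

```typescript
import * as path from "node:path";

// Return true only if `requested`, once resolved, stays inside `baseDir`.
// Rejects traversal attempts such as "../../etc/passwd".
function isInsideBase(baseDir: string, requested: string): boolean {
  const root = path.resolve(baseDir);
  const target = path.resolve(root, requested);
  const rel = path.relative(root, target);
  return !rel.startsWith("..") && !path.isAbsolute(rel);
}
```

Checking the relative path rather than a raw string prefix avoids false positives like `/docs-private` matching a root of `/docs`.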
Ingest documents to index content and then run queries that return relevant chunks with their source, document title, and a relevance score.
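To make the relevance score concrete, here is a toy sketch of hybrid scoring (assumptions: cosine similarity over embeddings plus a fixed additive boost when the query term appears verbatim; the server's actual weighting is not documented here):

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hybrid score: semantic similarity plus a keyword boost for exact matches.
function hybridScore(
  queryVec: number[],
  chunkVec: number[],
  queryTerm: string,
  chunkText: string,
  boost = 0.2, // hypothetical weight, chosen for illustration
): number {
  const semantic = cosine(queryVec, chunkVec);
  const keyword = chunkText.includes(queryTerm) ? boost : 0;
  return semantic + keyword;
}
```

With this shape, two chunks that are equally similar semantically are separated by whether they contain the exact term, which is why queries like ERR_CONNECTION_REFUSED surface the right chunk.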
The embedding model downloads on the first run and typically completes in a couple of minutes. After that, you can ingest more content and perform searches entirely offline.
If you see no results, confirm you have ingested documents into BASE_DIR. If the model download fails, verify your internet connection or try again later.
If you encounter slow queries, check the number of chunks or document size and consider splitting large files into smaller parts before ingesting.
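If you pre-split large files yourself, a simple fixed-size split with overlap is enough; this sketch uses hypothetical chunk and overlap sizes, not the server's defaults:

```typescript
// Split text into fixed-size chunks with overlap, so content that spans a
// chunk boundary still appears intact in at least one chunk.
function splitIntoChunks(text: string, size = 1000, overlap = 100): string[] {
  if (size <= overlap) throw new Error("size must exceed overlap");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // final chunk reached the end
  }
  return chunks;
}
```

Smaller chunks mean more index entries but faster, more focused retrieval per query.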
Is this really private? Yes. After the model download, nothing leaves your machine.
Can I use this offline? Yes, after the first model download.
For contributors, the project includes a modular structure with components for parsing, chunking, embedding, vector storage, and MCP tool integration.
Ingest a document from the filesystem to be indexed and searched later. Supports PDF, DOCX, TXT, and Markdown.
Ingest HTML content retrieved by your assistant or via web fetch to index web-based documentation and HTML content.
Search the indexed content using semantic similarity with optional keyword boosts to prioritize exact terms.
List all files in BASE_DIR and their ingested status to verify what has been indexed.
Remove a previously ingested file from the local index.
Show the current status of the RAG server and its indexing state.