Haiku RAG MCP Server
Provides document management, search, QA with citations, and research tools via an MCP server for AI assistants.
Haiku RAG MCP Server enables AI assistants to interact with document collections through hybrid search, question answering with citations, and multi-agent workflows. It exposes tools for document management, search, QA, and research directly to compatible AI clients, making it easier to index, query, and analyze large sets of documents within your assistant workflows.
You will run the MCP server locally and connect from your AI assistant or client application. The server exposes a set of document-focused capabilities you can invoke via the MCP protocol. Start the server in stdio mode, then configure your client to communicate over the provided command and arguments.
Prerequisites: ensure you have Python 3.12 or newer installed on your system.
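The prerequisite checks can be scripted before you wire up a client; a minimal sketch using only the standard library (the `haiku-rag` command name is taken from the examples in this document; adjust if yours differs):

```python
import shutil
import sys

def check_prereqs(min_version=(3, 12), command="haiku-rag"):
    """Return a list of human-readable problems; an empty list means ready to go."""
    problems = []
    # Verify the interpreter meets the minimum version requirement.
    if sys.version_info < min_version:
        problems.append(
            f"Python {min_version[0]}.{min_version[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    # Verify the server command is resolvable on PATH.
    if shutil.which(command) is None:
        problems.append(f"'{command}' not found on PATH -- is the package installed?")
    return problems

if __name__ == "__main__":
    for problem in check_prereqs():
        print("WARNING:", problem)
```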
Install the full haiku.rag package to access all features, including document processing, all embedding providers, and rerankers:
```
pip install haiku.rag
```
Optionally, install the slim package if you prefer minimal dependencies:
```
pip install haiku.rag-slim
```
Run the server in MCP mode over standard input/output, making it accessible to compatible AI clients:
```
haiku-rag serve --mcp --stdio
```
---
Configuration example for an MCP client:
```
{
  "mcpServers": {
    "haiku_rag": {
      "command": "haiku-rag",
      "args": ["serve", "--mcp", "--stdio"]
    }
  }
}
```
Common tasks you can perform after starting the MCP server include indexing sources, performing searches, asking questions with citations, conducting deep QA, and running multi-agent research workflows. If you encounter issues, verify your Python version, ensure the command and args are correct, and confirm the MCP client is configured to launch the server over stdio.
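Under the hood, an MCP client launches the configured command and exchanges JSON-RPC 2.0 messages with it over stdin/stdout, beginning with an initialize request. A sketch of that first message (the field names follow the MCP specification; the `protocolVersion` string and `clientInfo` values here are illustrative):

```python
import json

def make_initialize_request(request_id=1):
    """Build the JSON-RPC 2.0 initialize request an MCP client sends first."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # illustrative; use your client's supported version
            "capabilities": {},               # client capabilities; empty for a minimal client
            "clientInfo": {"name": "example-client", "version": "0.1"},
        },
    }

# In the stdio transport, each message is sent as a single line of JSON.
wire_message = json.dumps(make_initialize_request()) + "\n"
print(wire_message, end="")
```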
The server provides a set of capabilities for document-oriented workflows, from indexing and search to provenance-aware question answering and multi-agent analysis:
- Perform question answering with citations, including page numbers and section headings, to trace provenance.
- Coordinate multi-agent workflows for planning, searching, evaluating, and synthesizing results.
- Run complex analytical tasks via sandboxed Python code execution for aggregation and multi-document analysis.
- Provide a multi-turn chat interface with memory for ongoing conversations.
- Store full DoclingDocument representations to enable structure-aware context expansion.
- Expose document management, search, QA, and research tools to AI assistants.
- Highlight and view text chunks on original page images for visual grounding.
- Watch directories and auto-index changes to keep content up to date.
- Query the database at historical points using temporal parameters like `--before`.
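Once connected, a client invokes capabilities like these through the standard MCP `tools/call` method. A sketch of such a request (the tool name `search` and its arguments are hypothetical; discover the server's actual tools first with `tools/list`):

```python
import json

def make_tool_call(tool_name, arguments, request_id=2):
    """Build a JSON-RPC 2.0 tools/call request per the MCP specification."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Hypothetical tool name and arguments, for illustration only.
request = make_tool_call("search", {"query": "vector databases", "limit": 5})
print(json.dumps(request, indent=2))
```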