
Haiku RAG MCP Server

Provides document management, search, QA with citations, and research tools via an MCP server for AI assistants.

Installation
Add the following to your MCP client configuration file.

Configuration

```
{
  "mcpServers": {
    "ggozad-haiku.rag": {
      "command": "haiku-rag",
      "args": [
        "serve",
        "--mcp",
        "--stdio"
      ]
    }
  }
}
```

Haiku RAG MCP Server enables AI assistants to interact with document collections through hybrid search, question answering with citations, and multi-agent workflows. It exposes tools for document management, search, QA, and research directly to compatible AI clients, making it easier to index, query, and analyze large sets of documents within your assistant workflows.

How to use

You run the MCP server locally and connect to it from your AI assistant or client application. The server exposes a set of document-focused capabilities that you invoke via the MCP protocol. Start the server in stdio mode, then configure your client to launch it with the command and arguments shown in the configuration above.

How to install

Prerequisites: ensure you have Python 3.12 or newer installed on your system.
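A quick programmatic check of this prerequisite (a minimal sketch; the `(3, 12)` minimum comes from the requirement above, and the function name is just illustrative):

```python
import sys

# Minimum Python version required by haiku.rag
MINIMUM = (3, 12)

def meets_requirement(version_info=sys.version_info, minimum=MINIMUM):
    # Compare only the major and minor components
    return tuple(version_info[:2]) >= minimum

if __name__ == "__main__":
    print("Python OK" if meets_requirement() else "Python too old: need 3.12+")
```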

Install the full Haiku RAG package to access all features including document processing, all embedding providers, and rerankers.

pip install haiku.rag

Optionally, install the slim package if you prefer minimal dependencies.

pip install haiku.rag-slim

Additional configuration and usage notes

You can run the server in MCP mode over standard input/output, which makes it accessible to compatible AI clients:

```
haiku-rag serve --mcp --stdio
```

Configuration example for an MCP client:

```
{
  "mcpServers": {
    "haiku_rag": {
      "command": "haiku-rag",
      "args": ["serve", "--mcp", "--stdio"]
    }
  }
}
```

Examples and troubleshooting

Common tasks after starting the MCP server include indexing sources, searching content, asking questions with citations, conducting deep QA, and running multi-agent research workflows. If you encounter issues, verify your Python version (3.12 or newer), check that the command and args in your client configuration are correct, and confirm the client is configured to use the stdio transport.
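As part of troubleshooting, a small script can sanity-check the client configuration before digging deeper (a sketch under assumptions: the entry name `haiku_rag` is just the label used in the example above, and real clients may validate additional fields):

```python
import json

# Expected launch values from the configuration example above
EXPECTED_COMMAND = "haiku-rag"
EXPECTED_ARGS = ["serve", "--mcp", "--stdio"]

def check_mcp_config(raw: str, server_name: str) -> list[str]:
    """Return a list of problems found in an MCP client config; empty means OK."""
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    entry = config.get("mcpServers", {}).get(server_name)
    if entry is None:
        return [f"no '{server_name}' entry under 'mcpServers'"]
    problems = []
    if entry.get("command") != EXPECTED_COMMAND:
        problems.append(f"'command' should be '{EXPECTED_COMMAND}'")
    if entry.get("args") != EXPECTED_ARGS:
        problems.append(f"'args' should be {EXPECTED_ARGS}")
    return problems

if __name__ == "__main__":
    good = ('{"mcpServers": {"haiku_rag": {"command": "haiku-rag", '
            '"args": ["serve", "--mcp", "--stdio"]}}}')
    print(check_mcp_config(good, "haiku_rag"))  # [] means the config looks right
```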

Tools exposed by the MCP server

The server provides a set of capabilities for document-oriented workflows, enabling you to index sources, search content, retrieve passages with provenance, answer questions with citations, and run complex multi-agent or analytical tasks.

Available tools

qa_with_citations

Perform question answering with citations including page numbers and section headings to trace provenance.

multi_agent_research

Coordinate multi-agent workflows for planning, searching, evaluating, and synthesizing results.

rlm_agent

Run complex analytical tasks via sandboxed Python code execution for aggregation and multi-document analysis.

conversational_rag_chat

Provide a multi-turn chat interface with memory for ongoing conversations.

document_structure

Store the full DoclingDocument to enable structure-aware context expansion.

mcp_server_tools

Expose document management, search, QA, and research tools to AI assistants.

visual_grounding

Highlight and view text chunks on original page images for visual grounding.

file_monitoring

Watch directories and auto-index changes to keep content up to date.

time_travel

Query the database at historical points using temporal parameters like --before.