Provides an MCP server that evaluates LLM responses using the Atla evaluation model via a standardized interface.
Configuration
```
{
  "mcpServers": {
    "atla_mcp": {
      "command": "uvx",
      "args": [
        "atla-mcp-server"
      ],
      "env": {
        "ATLA_API_KEY": "YOUR_API_KEY"
      }
    }
  }
}
```

This MCP server exposes Atla's evaluation tools through a standardized interface so you can evaluate LLM outputs using the Atla evaluation models. It runs locally via a simple command-line workflow and connects to any client that supports MCP servers, enabling practical evaluation workflows for model responses and critiques.
To use the Atla MCP server, start the server in a local environment and connect your MCP client to it. The server exposes evaluation tools you can invoke from your client to score and critique an LLM’s response.
Prerequisites: a working Python environment and a way to run MCP servers locally. The runtime is managed with uv, a Python package and tool manager whose `uvx` command runs MCP servers directly from your terminal; if you do not have it yet, install it with `pip install uv` or the standalone installer from astral.sh.
1) Install the runtime tool for MCP servers if you do not have it yet.
2) Obtain your Atla API key from your Atla account.
3) Start the Atla MCP server using the runtime with your API key available in the environment.
```
# With uvx installed and your Atla API key in hand, this runs the server locally.
ATLA_API_KEY=<your-api-key> uvx atla-mcp-server
```

This MCP server is designed to interact with Atla's evaluation models. Note that the Atla API is no longer active as of July 21, 2025, so live evaluation features may be unavailable or require alternative access arrangements.
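For a programmatic client, here is a minimal sketch using the official MCP Python SDK (the `mcp` package, installable with `pip install mcp`); it spawns the server over stdio, performs the MCP handshake, and lists the available tools. The placeholder API key is an assumption for illustration.

```
import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Spawn the server the same way the configuration above does.
# Merging os.environ keeps PATH intact so uvx can be found.
server_params = StdioServerParameters(
    command="uvx",
    args=["atla-mcp-server"],
    env={**os.environ, "ATLA_API_KEY": "<your-api-key>"},
)

async def main() -> None:
    # Open a stdio channel to the spawned server and complete the handshake.
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())
```

Clients such as Claude Desktop perform the equivalent spawn-and-handshake automatically from the JSON configuration shown above.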
The server provides two core tools you can call from your MCP client to evaluate LLM responses: `evaluate_llm_response` and `evaluate_llm_response_on_multiple_criteria`.
The server runs locally and communicates over stdio. You start it with the runtime command below, and your MCP client connects to it.
Command to start the server:
```
ATLA_API_KEY=<your-api-key> uvx atla-mcp-server
```
Connection is established by your MCP client opening a stdio channel to the running server.

`evaluate_llm_response`: Evaluate an LLM's response to a prompt using a given evaluation criterion. Returns a dictionary with a score and a textual critique.
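A hedged sketch of calling this tool from an initialized `ClientSession` (as set up in the connection sketch above); the argument names `prompt`, `response`, and `evaluation_criteria` are assumptions, so verify them against the input schema reported by `list_tools`.

```
from mcp import ClientSession

async def evaluate_single(session: ClientSession) -> None:
    # Argument names below are illustrative assumptions; confirm them
    # against the tool's input schema from list_tools().
    result = await session.call_tool(
        "evaluate_llm_response",
        arguments={
            "prompt": "What is the capital of France?",
            "response": "The capital of France is Paris.",
            "evaluation_criteria": "Is the response factually correct?",
        },
    )
    print(result.content)  # expected: a score plus a textual critique
```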
`evaluate_llm_response_on_multiple_criteria`: Evaluate an LLM's response across multiple evaluation criteria. Returns a list of dictionaries with a score and critique per criterion.
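A matching sketch for the multi-criteria tool; the list-valued parameter name `evaluation_criteria_list` is likewise an assumption to check against the tool schema.

```
from mcp import ClientSession

async def evaluate_multi(session: ClientSession) -> None:
    # The list-valued parameter name is an assumption; confirm it via
    # the schema returned by list_tools().
    result = await session.call_tool(
        "evaluate_llm_response_on_multiple_criteria",
        arguments={
            "prompt": "What is the capital of France?",
            "response": "The capital of France is Paris.",
            "evaluation_criteria_list": [
                "Is the response factually correct?",
                "Is the response concise?",
            ],
        },
    )
    print(result.content)  # expected: one score and critique per criterion
```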