Prompt Tester MCP server for AI agents

The MCP Prompt Tester is a server that enables agents to evaluate LLM prompts across different providers like OpenAI and Anthropic. It allows side-by-side comparison of models, customization of parameters, and management of multi-turn conversations.

Installation

Install the MCP Prompt Tester using pip or uv:

# Install with pip
pip install -e .

# Or with uv
uv pip install -e .

API Key Setup

You need to configure API keys for the providers you want to use. There are two setup options:

Environment Variables

Set these environment variables:

OPENAI_API_KEY - Your OpenAI API key
ANTHROPIC_API_KEY - Your Anthropic API key

.env File (Recommended)

1. Create a .env file in your project or home directory.
2. Add your API keys:

OPENAI_API_KEY=your-openai-api-key-here
ANTHROPIC_API_KEY=your-anthropic-api-key-here

The server detects and loads these keys automatically.

A template is provided as .env.example.
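Before starting the server, it can help to confirm which provider keys are actually visible to the process. The following is a minimal sketch, not part of the server: the helper name `configured_providers` is hypothetical, and it only inspects environment variables (it does not read a .env file).

```python
import os

# Variable names taken from the API key setup instructions above.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

def configured_providers(env=None):
    """Return the providers whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]

if __name__ == "__main__":
    print("Providers with keys:", configured_providers())
```

Passing a plain dictionary instead of the real environment makes the check easy to try in isolation.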

Starting the Server

Launch the server using stdio (default) or SSE transport:

# Using stdio transport (default)
prompt-tester

# Using SSE transport on a custom port
prompt-tester --transport sse --port 8000

Available Tools

The server provides the following tools to MCP-enabled agents:

list_providers

Retrieves available LLM providers and their default models.

Parameters: None required

Example Response:

{
  "providers": {
    "openai": [
      {
        "type": "gpt-4",
        "name": "gpt-4",
        "input_cost": 0.03,
        "output_cost": 0.06,
        "description": "Most capable GPT-4 model"
      }
    ],
    "anthropic": [
      // ... models ...
    ]
  }
}
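A client might use this response to pick a model before running a test. The sketch below assumes the response text has exactly the shape of the example above; the helper name `cheapest_model` is hypothetical, and the README does not state the unit of `input_cost`, so the values are only compared with each other.

```python
import json

def cheapest_model(response_text, provider):
    """Return the name of the provider's model with the lowest input_cost, or None."""
    data = json.loads(response_text)
    models = data.get("providers", {}).get(provider, [])
    if not models:
        return None
    return min(models, key=lambda m: m["input_cost"])["name"]

# Sample built from the example response above (the anthropic list is left empty here).
sample = json.dumps({
    "providers": {
        "openai": [
            {
                "type": "gpt-4",
                "name": "gpt-4",
                "input_cost": 0.03,
                "output_cost": 0.06,
                "description": "Most capable GPT-4 model",
            }
        ],
        "anthropic": [],
    }
})

print(cheapest_model(sample, "openai"))  # prints gpt-4
```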

test_comparison

Compares multiple prompts side-by-side across different providers, models, and parameters.

Parameters:

comparisons (array): A list of 1-4 comparison configurations, each containing:
- provider (string): The LLM provider ("openai" or "anthropic")
- model (string): The model name
- system_prompt (string): Instructions for the model
- user_prompt (string): The user's message
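- temperature (number, optional): Sampling temperature; higher values produce more random output
- max_tokens (integer, optional): Maximum number of tokens to generate
- top_p (number, optional): Nucleus sampling threshold; lower values restrict output to more likely tokens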

Example Usage:

{
  "comparisons": [
    {
      "provider": "openai",
      "model": "gpt-4",
      "system_prompt": "You are a helpful assistant.",
      "user_prompt": "Explain quantum computing in simple terms.",
      "temperature": 0.7
    },
    {
      "provider": "anthropic",
      "model": "claude-3-opus-20240229",
      "system_prompt": "You are a helpful assistant.",
      "user_prompt": "Explain quantum computing in simple terms.",
      "temperature": 0.7
    }
  ]
}
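When the same prompt goes to several models, the argument object can be generated rather than written by hand. This is a sketch under the constraints listed above (1 to 4 entries, provider "openai" or "anthropic"); `build_comparisons` is a hypothetical helper, not something the server ships.

```python
def build_comparisons(system_prompt, user_prompt, targets,
                      temperature=None, max_tokens=None, top_p=None):
    """Build test_comparison arguments for one prompt and several (provider, model) pairs."""
    targets = list(targets)
    if not 1 <= len(targets) <= 4:
        raise ValueError("expected 1 to 4 (provider, model) pairs")
    optional = {"temperature": temperature, "max_tokens": max_tokens, "top_p": top_p}
    comparisons = []
    for provider, model in targets:
        if provider not in ("openai", "anthropic"):
            raise ValueError(f"unsupported provider: {provider!r}")
        entry = {
            "provider": provider,
            "model": model,
            "system_prompt": system_prompt,
            "user_prompt": user_prompt,
        }
        # Leave out optional parameters that were not set.
        entry.update({k: v for k, v in optional.items() if v is not None})
        comparisons.append(entry)
    return {"comparisons": comparisons}

payload = build_comparisons(
    "You are a helpful assistant.",
    "Explain quantum computing in simple terms.",
    [("openai", "gpt-4"), ("anthropic", "claude-3-opus-20240229")],
    temperature=0.7,
)
```

The resulting `payload` matches the example usage above and can be passed as the arguments of a `test_comparison` call.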

test_multiturn_conversation

Manages stateful conversations with LLM providers.

Modes:

start: Begins a new conversation
continue: Continues an existing conversation
get: Retrieves conversation history
list: Lists all active conversations
close: Closes a conversation

Parameters:

mode (string): Operation mode
conversation_id (string): ID of an existing conversation, as returned by start mode (required for continue, get, and close modes)
provider (string): LLM provider (required for start mode)
model (string): Model name (required for start mode)
system_prompt (string): System prompt (required for start mode)
user_prompt (string): User message (for start and continue modes)
temperature (number, optional): Temperature parameter
max_tokens (integer, optional): Maximum tokens to generate
top_p (number, optional): Top-p sampling parameter

Starting a Conversation:

{
  "mode": "start",
  "provider": "openai",
  "model": "gpt-4",
  "system_prompt": "You are a helpful assistant specializing in physics.",
  "user_prompt": "Can you explain what dark matter is?"
}

Continuing a Conversation:

{
  "mode": "continue",
  "conversation_id": "conv_12345",
  "user_prompt": "How does that relate to dark energy?"
}
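Because the required parameters depend on the mode, a client may want to check its arguments before calling the tool. The sketch below encodes only the requirements listed in the parameter list above; `missing_arguments` is a hypothetical helper, and the server's own validation may differ.

```python
# Required parameters per mode, as listed in the parameter descriptions above.
REQUIRED_BY_MODE = {
    "start": ("provider", "model", "system_prompt"),
    "continue": ("conversation_id",),
    "get": ("conversation_id",),
    "list": (),
    "close": ("conversation_id",),
}

def missing_arguments(args):
    """Return the required parameter names that are absent or empty for args['mode']."""
    mode = args.get("mode")
    if mode not in REQUIRED_BY_MODE:
        raise ValueError(f"unknown mode: {mode!r}")
    return [name for name in REQUIRED_BY_MODE[mode] if not args.get(name)]

print(missing_arguments({"mode": "start", "provider": "openai"}))
# prints ['model', 'system_prompt']
```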

Client Code Example

Here's how to use the MCP client with the server:

import asyncio
import json
from mcp.client.session import ClientSession
from mcp.client.stdio import StdioServerParameters, stdio_client

async def main():
    async with stdio_client(
        StdioServerParameters(command="prompt-tester")
    ) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            
            # List available providers and models
            providers_result = await session.call_tool("list_providers", {})
            print("Available providers and models:", providers_result)
            
            # Run a basic test with a single model
            comparison_result = await session.call_tool("test_comparison", {
                "comparisons": [
                    {
                        "provider": "openai",
                        "model": "gpt-4",
                        "system_prompt": "You are a helpful assistant.",
                        "user_prompt": "Explain quantum computing in simple terms.",
                        "temperature": 0.7,
                        "max_tokens": 500
                    }
                ]
            })
            print("Single model test result:", comparison_result)
            
            # Start a multi-turn conversation
            conversation_start = await session.call_tool("test_multiturn_conversation", {
                "mode": "start",
                "provider": "openai",
                "model": "gpt-4",
                "system_prompt": "You are a helpful assistant specializing in physics.",
                "user_prompt": "Can you explain what dark matter is?"
            })
            
            # Get the conversation ID from the first content item of the result
            # (assumes the server returns its response as JSON text)
            response_data = json.loads(conversation_start.content[0].text)
            conversation_id = response_data.get("conversation_id")
            
            # Continue the conversation
            if conversation_id:
                conversation_continue = await session.call_tool("test_multiturn_conversation", {
                    "mode": "continue",
                    "conversation_id": conversation_id,
                    "user_prompt": "How does that relate to dark energy?"
                })

asyncio.run(main())
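Tool results in the MCP Python SDK carry their payload as a list of content items, so extracting JSON from them is a recurring step. The helper below is a sketch that assumes this server returns its response as JSON in a text content item, as the example above also assumes; `tool_result_json` is a hypothetical name, and the stand-in object only mimics the `content` and `isError` attributes of a real result.

```python
import json
from types import SimpleNamespace

def tool_result_json(result):
    """Parse the first text content item of a tool result as JSON."""
    if getattr(result, "isError", False):
        raise RuntimeError("tool call reported an error")
    for item in result.content:
        text = getattr(item, "text", None)
        if text is not None:
            return json.loads(text)
    raise ValueError("tool result contains no text content")

# Stand-in for a real tool result, so the helper can be tried without a server.
fake_result = SimpleNamespace(
    isError=False,
    content=[SimpleNamespace(type="text", text='{"conversation_id": "conv_12345"}')],
)

print(tool_result_json(fake_result)["conversation_id"])  # prints conv_12345
```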

Optional Configuration

Additional configuration is available through environment variables:

Langfuse Tracing (Optional)

LANGFUSE_SECRET_KEY - Your Langfuse secret key
LANGFUSE_PUBLIC_KEY - Your Langfuse public key
LANGFUSE_HOST - URL of your Langfuse instance

How to install this MCP server

For Claude Code

To add this MCP server to Claude Code, run this command in your terminal:

claude mcp add-json "prompt-tester" '{"command":"prompt-tester","args":[]}'

See the official Claude Code MCP documentation for more details.

For Cursor

There are two ways to add an MCP server to Cursor. The most common way is to add the server globally in the ~/.cursor/mcp.json file so that it is available in all of your projects.

If you only need the server in a single project, you can instead add it to that project's .cursor/mcp.json file, creating the file if it does not exist.

Adding an MCP server to Cursor globally

To add a global MCP server, go to Cursor Settings > Tools & Integrations and click "New MCP Server".

Clicking that button opens the ~/.cursor/mcp.json file, where you can add your server like this:

{
    "mcpServers": {
        "prompt-tester": {
            "command": "prompt-tester",
            "args": []
        }
    }
}

Adding an MCP server to a project

To add an MCP server to a project, create a new .cursor/mcp.json file or add the server to the existing one. The entry looks exactly the same as in the global MCP server example above.
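If a configuration file already lists other servers, the new entry has to be merged in rather than overwriting the file. The following is a minimal sketch of that merge, assuming the `mcpServers` layout shown in this README; `add_server` is a hypothetical helper, and `other-server` is a made-up existing entry used only for the demonstration, which runs in a temporary directory so no real configuration is touched.

```python
import json
import tempfile
from pathlib import Path

def add_server(config_path, name="prompt-tester", command="prompt-tester", args=None):
    """Add or update one entry under "mcpServers", keeping any other servers."""
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    config.setdefault("mcpServers", {})[name] = {
        "command": command,
        "args": list(args or []),
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4) + "\n")
    return config

# Demonstration against a throwaway project directory.
with tempfile.TemporaryDirectory() as project:
    config_file = Path(project) / ".cursor" / "mcp.json"
    config_file.parent.mkdir()
    config_file.write_text(
        '{"mcpServers": {"other-server": {"command": "other", "args": []}}}'
    )
    merged = add_server(config_file)
    print(sorted(merged["mcpServers"]))  # prints ['other-server', 'prompt-tester']
```

The same function could be pointed at ~/.cursor/mcp.json for a global entry; back up the file first, since the sketch rewrites it in full.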

How to use the MCP server

Once the server is installed, you might need to head back to Settings > MCP and click the refresh button.

The Cursor agent can then see the tools that the MCP server provides and will call them when it needs to.

You can also explicitly ask the agent to use the tool by mentioning the tool name and describing what the function does.

For Claude Desktop

To add this MCP server to Claude Desktop:

1. Find your configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

2. Add this to your configuration file:

{
    "mcpServers": {
        "prompt-tester": {
            "command": "prompt-tester",
            "args": []
        }
    }
}

3. Restart Claude Desktop for the changes to take effect.