MCP vLLM Benchmarking Tool MCP Server
A proof-of-concept MCP server to interactively benchmark vLLM endpoints and compare results.
Configuration
{
  "mcpServers": {
    "mcp_vllm": {
      "command": "uv",
      "args": [
        "run",
        "/Path/TO/mcp-vllm-benchmarking-tool/server.py"
      ]
    }
  }
}
You can run an MCP server that interactively benchmarks vLLM endpoints and compares performance across configurations. This makes it easy to run repeatable benchmarks, capture results, and analyze how different models or endpoints perform under your workload.
Install and run the MCP benchmarking server, then use an MCP client to initiate benchmarks. You can prompt the tool to run a vLLM benchmark against a specific endpoint, specify the model, and run multiple iterations to gather stable results. The server handles running the benchmark and presenting a comparison of results, while automatically treating the first iteration as a warmup and excluding it from the final comparison.
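For illustration, here is a minimal sketch of initiating a benchmark programmatically with the official MCP Python SDK instead of a chat client. The tool name (benchmark_vllm) and its argument names (endpoint, model, iterations) are assumptions made for this example; the actual names are defined by server.py, so list the tools first and adjust the call accordingly.

# Minimal sketch: driving the benchmarking server from a Python MCP client.
# The tool name and argument names below are hypothetical; check list_tools()
# for what server.py actually exposes.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    server = StdioServerParameters(
        command="uv",
        args=["run", "/Path/TO/mcp-vllm-benchmarking-tool/server.py"],
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover the tools the server actually exposes.
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])
            # Hypothetical call: benchmark one endpoint/model for several iterations.
            result = await session.call_tool(
                "benchmark_vllm",
                arguments={
                    "endpoint": "http://localhost:8000/v1",
                    "model": "my-model",
                    "iterations": 5,
                },
            )
            print(result.content)

asyncio.run(main())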
Prerequisites: you need a runtime capable of executing MCP stdio servers (such as uv) and access to a Python script that implements the benchmarking logic.
1. Clone the benchmarking tool repository.
2. Add the MCP server configuration to your MCP setup using the following snippet.
{
  "mcpServers": {
    "mcp_vllm": {
      "command": "uv",
      "args": [
        "run",
        "/Path/TO/mcp-vllm-benchmarking-tool/server.py"
      ]
    }
  }
}
Be aware that some outputs from vLLM can vary and may occasionally produce JSON that looks invalid. If this happens, re-run the benchmark to confirm consistency and track any fluctuations. The flow described here is intended to provide an interactive, repeatable benchmarking mechanism within MCP.
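If you script the benchmark runs yourself, one way to cope with the occasional invalid-looking output is a bounded parse-and-retry guard. The sketch below assumes the result arrives as a JSON text payload, which may not match the server's actual output format; adapt it to whatever server.py returns.

# Hypothetical guard: parse the benchmark output as JSON and re-run a bounded
# number of times if parsing fails. The output format is an assumption.
import json

def parse_with_retry(run_benchmark, max_attempts: int = 3) -> dict:
    """run_benchmark is a callable that returns the benchmark output as text."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        raw = run_benchmark()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            print(f"Attempt {attempt}: output was not valid JSON, re-running...")
    raise RuntimeError(f"Benchmark output never parsed cleanly: {last_error}")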
The server runs a vLLM endpoint benchmark, collecting timing and accuracy-like metrics across multiple iterations to produce a comparative report. The first iteration is excluded from the final results to remove warmup bias when comparing benchmarks.
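As a rough sketch of the warmup-exclusion idea (not the tool's actual implementation), dropping the first iteration and summarizing the remaining timings could look like this; the metric names and latency values are purely illustrative.

# Sketch: treat iteration 1 as warmup, then summarize the measured runs.
from statistics import mean, median

def summarize(iteration_latencies_s: list[float]) -> dict:
    """Summarize per-iteration latencies in seconds, excluding the warmup run."""
    if len(iteration_latencies_s) < 2:
        raise ValueError("Need at least two iterations: one warmup plus one measured run.")
    measured = iteration_latencies_s[1:]  # exclude the first (warmup) iteration
    return {
        "iterations_measured": len(measured),
        "mean_latency_s": mean(measured),
        "median_latency_s": median(measured),
        "min_latency_s": min(measured),
        "max_latency_s": max(measured),
    }

# Example: the first value is the (excluded) warmup run.
print(summarize([2.41, 1.12, 1.08, 1.15, 1.10]))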