Vertex AI MCP Server
Provides an MCP server to access Vertex AI Gemini models for coding help and general queries.
Configuration
{
"mcpServers": {
"vertex_stdio_mcp": {
"command": "node",
"args": [
"/full/path/to/your/vertex-ai-mcp-server/build/index.js"
],
"env": {
"AI_PROVIDER": "vertex",
"GOOGLE_CLOUD_PROJECT": "YOUR_GCP_PROJECT_ID",
"GEMINI_API_KEY": "YOUR_GEMINI_API_KEY",
"VERTEX_MODEL_ID": "gemini-2.5-pro-exp-03-25",
"GEMINI_MODEL_ID": "gemini-2.5-pro-exp-03-25",
"GOOGLE_CLOUD_LOCATION": "us-central1",
"AI_TEMPERATURE": "0.0",
"AI_USE_STREAMING": "true",
"AI_MAX_OUTPUT_TOKENS": "65536",
"AI_MAX_RETRIES": "3",
"AI_RETRY_DELAY_MS": "1000",
"GOOGLE_APPLICATION_CREDENTIALS": "/path/to/key.json"
}
}
}
}
This MCP server connects to Vertex AI Gemini models for coding assistance and general queries. It exposes a suite of tools you can invoke from an MCP client to query, generate, analyze, and document code and related topics, with streaming responses and built-in retry logic.
You interact with this MCP server through an MCP client or integration that talks to the server via a predefined set of tools. You can perform natural language queries that leverage Vertex AI Gemini models, enrich results with web search grounding, or rely on the model’s internal knowledge. Use the available tools to ask for code explanations, generate guidelines, analyze code, and retrieve precise snippets from official documentation. Enable streaming if you want faster incremental responses, and adjust retry behavior for transient errors.
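As a concrete sketch of that interaction, the TypeScript below starts the server over stdio and lists its tools using the official MCP SDK. The file path, client name, and env values are placeholders for your setup; it assumes @modelcontextprotocol/sdk is installed and runs under Node 18+ as an ES module (for top-level await).

// client-sketch.ts — connect to the server over stdio and list its tools.
// Paths, client name, and env values below are placeholders, not fixed names.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const transport = new StdioClientTransport({
  command: "node",
  args: ["/full/path/to/your/vertex-ai-mcp-server/build/index.js"],
  env: {
    AI_PROVIDER: "vertex",
    GOOGLE_CLOUD_PROJECT: "YOUR_GCP_PROJECT_ID",
    GOOGLE_CLOUD_LOCATION: "us-central1",
  },
});

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Ask the server which tools it exposes.
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));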
Prerequisites: Node.js v18+, Bun, and a Google Cloud project with the Vertex AI API enabled and billing configured. For authentication, use Application Default Credentials: either run gcloud auth application-default login or set GOOGLE_APPLICATION_CREDENTIALS to a service-account key file, as in the configuration above.
Step 1. Install dependencies and build the server.
bun install
bun run build
Step 2. Run the server with Node in stdio mode, as shown in the MCP configuration below. This starts the server directly from the built index.js.
{
"mcpServers": {
"vertex_stdio_mcp": {
"command": "node",
"args": ["/full/path/to/your/vertex-ai-mcp-server/build/index.js"],
"env": {
"AI_PROVIDER": "vertex",
"GOOGLE_CLOUD_PROJECT": "YOUR_GCP_PROJECT_ID",
"VERTEX_MODEL_ID": "gemini-2.5-pro-exp-03-25",
"GOOGLE_CLOUD_LOCATION": "us-central1",
"AI_TEMPERATURE": "0.0",
"AI_USE_STREAMING": "true",
"AI_MAX_OUTPUT_TOKENS": "65536",
"AI_MAX_RETRIES": "3",
"AI_RETRY_DELAY_MS": "1000"
},
"disabled": false,
"timeout": 3600
}
}
}
Step 3. Alternatively, fetch the published package from npm and start the MCP server directly, for example with bunx (Bun's npx equivalent):
{
"mcpServers": {
"vertex_npx_mcp": {
"command": "bunx",
"args": ["-y", "vertex-ai-mcp-server"],
"env": {
"AI_PROVIDER": "vertex",
"GOOGLE_CLOUD_PROJECT": "YOUR_GCP_PROJECT_ID",
"GOOGLE_CLOUD_LOCATION": "us-central1",
"AI_TEMPERATURE": "0.0",
"AI_USE_STREAMING": "true",
"AI_MAX_OUTPUT_TOKENS": "65536",
"AI_MAX_RETRIES": "3",
"AI_RETRY_DELAY_MS": "1000"
},
"disabled": false,
"timeout": 3600
}
}
}
Environment variables control which provider is used, the model IDs, the location, and AI behavior. Streaming is enabled by default for Vertex AI Gemini models, and basic retry logic handles transient API errors. On the security side, use appropriate authentication for your Google Cloud environment and avoid overly permissive safety filter settings in production.
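As an illustration of how the retry settings interact, here is a minimal TypeScript sketch of a bounded retry loop driven by AI_MAX_RETRIES and AI_RETRY_DELAY_MS. It mirrors the semantics of the configuration above, not the server's actual implementation.

// Illustrative retry wrapper: one initial attempt plus up to maxRetries
// retries, with a fixed delay between attempts. Not the server's real code.
const maxRetries = Number(process.env.AI_MAX_RETRIES ?? "3");
const retryDelayMs = Number(process.env.AI_RETRY_DELAY_MS ?? "1000");

async function withRetries<T>(call: () => Promise<T>): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        // Wait before the next attempt (fixed delay, per the config above).
        await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
      }
    }
  }
  throw lastError;
}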
Tools
- Answers a natural language query using the configured Vertex AI model, enhanced with Google Search results.
- Answers a natural language query using only the internal knowledge of the configured Vertex AI model.
- Provides a detailed explanation of a topic by synthesizing information from official docs found via web search.
- Provides precise code snippets or concise answers drawn from official documentation.
- Generates a structured project guidelines document (Markdown) for a specified tech stack, using web search for best practices.
- Analyzes code snippets against official docs to identify bugs, performance issues, and security concerns.
- Offers security recommendations with code examples for secure implementations.
- Creates technical documentation for code, APIs, or systems following industry best practices.
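Continuing the client sketch above, invoking one of these tools looks like the following. The tool name answer_query_websearch and its query argument are assumptions inferred from the descriptions above; check the server's listTools() output for the exact names and input schemas.

// Call the web-grounded query tool; the name and argument shape below are
// assumptions — verify them against the server's listTools() response.
const result = await client.callTool({
  name: "answer_query_websearch",
  arguments: { query: "How do I enable the Vertex AI API in a GCP project?" },
});
console.log(result.content);

await client.close();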