
Vertex AI MCP Server

Provides an MCP server to access Vertex AI Gemini models for coding help and general queries.

Installation
Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "vertex_stdio_mcp": {
      "command": "node",
      "args": [
        "/full/path/to/your/vertex-ai-mcp-server/build/index.js"
      ],
      "env": {
        "AI_PROVIDER": "vertex",
        "GOOGLE_CLOUD_PROJECT": "YOUR_GCP_PROJECT_ID",
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY",
        "VERTEX_MODEL_ID": "gemini-2.5-pro-exp-03-25",
        "GEMINI_MODEL_ID": "gemini-2.5-pro-exp-03-25",
        "GOOGLE_CLOUD_LOCATION": "us-central1",
        "AI_TEMPERATURE": "0.0",
        "AI_USE_STREAMING": "true",
        "AI_MAX_OUTPUT_TOKENS": "65536",
        "AI_MAX_RETRIES": "3",
        "AI_RETRY_DELAY_MS": "1000",
        "GOOGLE_APPLICATION_CREDENTIALS": "/path/to/key.json"
      }
    }
  }
}

This MCP server connects to Vertex AI Gemini models to help you code and answer general queries. It exposes a suite of tools you can invoke from an MCP client to query, generate, analyze, and document code and related topics, with streaming responses and built-in retry logic.
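The retry logic mentioned above is configured by `AI_MAX_RETRIES` and `AI_RETRY_DELAY_MS`. As an illustration only (this is a sketch of the general pattern, not the server's actual implementation, which may use backoff or error filtering), a fixed-delay retry loop looks like this:

```typescript
// Sketch of a retry-with-delay loop for transient API errors.
// The parameter defaults mirror AI_MAX_RETRIES=3 and AI_RETRY_DELAY_MS=1000
// from the example configurations; the helper itself is hypothetical.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  delayMs = 1000
): Promise<T> {
  let lastError: unknown;
  // One initial attempt plus up to maxRetries retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Demo: a call that fails once, then succeeds on the retry.
(async () => {
  let attempts = 0;
  const result = await withRetries(async () => {
    attempts++;
    if (attempts < 2) throw new Error("transient failure");
    return "ok";
  }, 3, 10);
  console.log(result); // "ok"
})();
```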

How to use

You interact with this server through an MCP client or integration that invokes its predefined set of tools. You can run natural language queries against Vertex AI Gemini models, enrich results with web-search grounding, or rely on the model's internal knowledge alone. Use the available tools to ask for code explanations, generate guidelines, analyze code, and retrieve precise snippets from official documentation. Enable streaming for faster incremental responses, and adjust the retry settings to handle transient errors.
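Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the `tools/call` method defined by the MCP specification. As a sketch (the tool name comes from the Available tools list below; the query argument is illustrative), the request payload looks like this:

```typescript
// Sketch of the JSON-RPC 2.0 message an MCP client sends to invoke a tool.
// "tools/call" is the method name from the MCP specification; the tool name
// and arguments below are illustrative examples.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Example: ask the server to answer a query with web-search grounding.
const request = buildToolCall(1, "answer_query_websearch", {
  query: "What changed in the latest Node.js LTS release?",
});
console.log(JSON.stringify(request, null, 2));
```

In practice your MCP client library constructs and transports these messages for you; this only shows what crosses the stdio boundary.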

How to install

Prerequisites: Node.js v18+, Bun, and a Google Cloud project with Vertex AI API enabled and billing configured.
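If the Google Cloud side is not set up yet, the usual gcloud steps look like this (replace the project ID placeholder; in production, prefer a service account or workload identity over user credentials):

```shell
# Point gcloud at your project and enable the Vertex AI API
# (billing must already be configured on the project).
gcloud config set project YOUR_GCP_PROJECT_ID
gcloud services enable aiplatform.googleapis.com

# Provide Application Default Credentials for local development.
gcloud auth application-default login
```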

Step 1. Install dependencies and build the server.

bun install
bun run build

Step 2. Run the server with Node in stdio mode using an MCP configuration like the following. This starts the server directly from the built index.js.

{
  "mcpServers": {
    "vertex_stdio_mcp": {
      "command": "node",
      "args": ["/full/path/to/your/vertex-ai-mcp-server/build/index.js"],
      "env": {
        "AI_PROVIDER": "vertex",
        "GOOGLE_CLOUD_PROJECT": "YOUR_GCP_PROJECT_ID",
        "VERTEX_MODEL_ID": "gemini-2.5-pro-exp-03-25",
        "GOOGLE_CLOUD_LOCATION": "us-central1",
        "AI_TEMPERATURE": "0.0",
        "AI_USE_STREAMING": "true",
        "AI_MAX_OUTPUT_TOKENS": "65536",
        "AI_MAX_RETRIES": "3",
        "AI_RETRY_DELAY_MS": "1000"
      },
      "disabled": false,
      "timeout": 3600
    }
  }
}

Step 3. Alternatively, fetch the package from npm and start the MCP server directly with bunx, without a local build.

{
  "mcpServers": {
    "vertex_npx_mcp": {
      "command": "bunx",
      "args": ["-y", "vertex-ai-mcp-server"],
      "env": {
        "AI_PROVIDER": "vertex",
        "GOOGLE_CLOUD_PROJECT": "YOUR_GCP_PROJECT_ID",
        "GOOGLE_CLOUD_LOCATION": "us-central1",
        "AI_TEMPERATURE": "0.0",
        "AI_USE_STREAMING": "true",
        "AI_MAX_OUTPUT_TOKENS": "65536",
        "AI_MAX_RETRIES": "3",
        "AI_RETRY_DELAY_MS": "1000"
      },
      "disabled": false,
      "timeout": 3600
    }
  }
}

Additional notes

Environment variables control which provider you use, model IDs, location, and AI behavior. The configuration supports Vertex AI Gemini models with streaming by default and includes basic retry logic for transient API errors. Security considerations include using appropriate authentication in the Google Cloud environment and avoiding overly permissive safety filters in production.
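As an illustration of how these variables map onto a typed configuration (this is a sketch, not the server's actual code; the defaults shown are the values used in the example configurations above):

```typescript
// Sketch: mapping the documented environment variables to a typed config.
// The defaults mirror the example configurations in this document; the
// actual server may resolve them differently.
interface AiConfig {
  provider: "vertex" | "gemini";
  projectId?: string;
  location: string;
  modelId: string;
  temperature: number;
  useStreaming: boolean;
  maxOutputTokens: number;
  maxRetries: number;
  retryDelayMs: number;
}

function loadConfig(env: Record<string, string | undefined>): AiConfig {
  const provider = (env.AI_PROVIDER ?? "vertex") as AiConfig["provider"];
  return {
    provider,
    projectId: env.GOOGLE_CLOUD_PROJECT,
    location: env.GOOGLE_CLOUD_LOCATION ?? "us-central1",
    // The provider selects which model ID variable applies.
    modelId:
      provider === "vertex"
        ? env.VERTEX_MODEL_ID ?? "gemini-2.5-pro-exp-03-25"
        : env.GEMINI_MODEL_ID ?? "gemini-2.5-pro-exp-03-25",
    temperature: Number(env.AI_TEMPERATURE ?? "0.0"),
    useStreaming: (env.AI_USE_STREAMING ?? "true") === "true",
    maxOutputTokens: Number(env.AI_MAX_OUTPUT_TOKENS ?? "65536"),
    maxRetries: Number(env.AI_MAX_RETRIES ?? "3"),
    retryDelayMs: Number(env.AI_RETRY_DELAY_MS ?? "1000"),
  };
}

const cfg = loadConfig({ AI_PROVIDER: "vertex", GOOGLE_CLOUD_PROJECT: "my-project" });
console.log(cfg.location, cfg.maxRetries); // "us-central1" 3
```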

Available tools

answer_query_websearch

Answers a natural language query using the configured Vertex AI model enhanced with Google Search results.

answer_query_direct

Answers a natural language query using only the internal knowledge of the configured Vertex AI model.

explain_topic_with_docs

Provides a detailed explanation for a topic by synthesizing information from official docs found via web search.

get_doc_snippets

Provides precise code snippets or concise answers from official documentation.

generate_project_guidelines

Generates a structured project guidelines document (Markdown) using a specified tech stack and web search for best practices.

code_analysis_with_docs

Analyzes code snippets against official docs to identify bugs, performance issues, and security concerns.

security_best_practices_advisor

Offers security recommendations with code examples for secure implementations.

documentation_generator

Creates technical documentation for code, APIs, or systems following industry best practices.