
Lakeflow MCP Server

Launch and manage scalable Lakeflow compute runs through an MCP server, with per-run arguments and secure environment handling.

Installation
Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "arahimi-hims-lakeflow-mcp": {
      "command": "uv",
      "args": [
        "run",
        "--quiet",
        "--directory",
        "/path/to/lakeflow-mcp",
        "python",
        "lakeflow.py"
      ],
      "env": {
        "DATABRICKS_HOST": "https://hims-machine-learning-staging-workspace.cloud.databricks.com",
        "DATABRICKS_TOKEN": "<your token>"
      }
    }
  }
}

You can run massively parallel compute jobs on the Lakeflow platform through an MCP server. This setup lets you spawn scalable data processing tasks, control how many workers run, and pass per-run arguments, all while keeping secrets secure and organized. The MCP server configuration shown here enables you to launch Lakeflow jobs from an agent you control and manage their lifecycle programmatically.

How to use

You use the MCP server to start, monitor, and retrieve logs for Lakeflow compute jobs. Start by configuring a local MCP client to communicate with your Lakeflow setup, then issue commands to create a job from source, trigger multiple runs with different arguments, and monitor or fetch logs for each run.
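The lifecycle described above can be sketched end to end. This is a minimal sketch, not the server's implementation: the tool names are the ones listed on this page, but `call_tool` is a hypothetical stand-in stub for a real MCP client session's tool-call method, and the job and run IDs it returns are fabricated for illustration.

```python
# Sketch of the Lakeflow job lifecycle driven through the four MCP tools.
# `call_tool` is a hypothetical stand-in for a real MCP session; the IDs
# and log text it returns are fabricated for illustration.

import itertools

_run_counter = itertools.count(1)
_runs: dict[str, str] = {}  # run_id -> state

def call_tool(name: str, arguments: dict) -> dict:
    """Hypothetical stand-in for an MCP tool call against the Lakeflow server."""
    if name == "create-job-from-source":
        return {"job_id": "job-001"}
    if name == "trigger-run":
        run_id = f"run-{next(_run_counter):03d}"
        _runs[run_id] = "RUNNING"
        return {"run_id": run_id}
    if name == "list-job-runs":
        return {"runs": [{"run_id": r, "state": s} for r, s in _runs.items()]}
    if name == "get-run-logs":
        return {"logs": f"stub logs for {arguments['run_id']}"}
    raise ValueError(f"unknown tool: {name}")

# 1. Build and upload the job from local source, getting back a job ID.
job = call_tool("create-job-from-source", {"source_dir": "/path/to/job"})

# 2. Trigger several runs of that job, each with its own arguments.
run_ids = [
    call_tool("trigger-run", {"job_id": job["job_id"], "args": {"shard": i}})["run_id"]
    for i in range(3)
]

# 3. Monitor the runs, then fetch logs for one of them.
status = call_tool("list-job-runs", {"job_id": job["job_id"]})
logs = call_tool("get-run-logs", {"run_id": run_ids[0]})
print(run_ids, logs["logs"])
```

In a real client the same four calls would go over the MCP session instead of the stub, but the ordering is the same: one create, N triggers, then monitoring and log retrieval keyed by the returned IDs.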

How to install

Prerequisites you need before configuring the MCP server:

- Access to Databricks for Lakeflow execution
- A working environment where you can store MCP configuration at the expected path
- A client that can communicate with the MCP server (the MCP client is driven by the provided commands)

Step 1: Ensure you have access to Databricks
Step 2: Prepare the MCP configuration for Lakeflow in your MCP client
Step 3: Place the MCP config at the expected path (see the Configuration section above)
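Before launching, it can help to sanity-check the configuration you placed in Step 3. The snippet below is a minimal sketch assuming your client reads a JSON file of the shape shown under Configuration; the `check_config` helper is hypothetical, not part of the server.

```python
import json

# Environment variables the Lakeflow server entry must provide.
REQUIRED_ENV = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")

def check_config(config: dict, server_name: str) -> list[str]:
    """Return a list of problems found in an MCP server entry; empty means OK."""
    problems = []
    server = config.get("mcpServers", {}).get(server_name)
    if server is None:
        return [f"no mcpServers entry named {server_name!r}"]
    if not server.get("command"):
        problems.append("missing 'command'")
    if not isinstance(server.get("args"), list):
        problems.append("'args' must be a list")
    env = server.get("env", {})
    for key in REQUIRED_ENV:
        # Treat "<...>" values as unfilled placeholders.
        if not env.get(key) or env[key].startswith("<"):
            problems.append(f"env var {key} is unset or still a placeholder")
    return problems

# Example: validate a config of the shape shown under Configuration.
config = json.loads("""
{
  "mcpServers": {
    "lakeflow": {
      "command": "uv",
      "args": ["run", "--quiet", "--directory", "/path/to/lakeflow-mcp",
               "python", "lakeflow.py"],
      "env": {"DATABRICKS_HOST": "https://example.cloud.databricks.com",
              "DATABRICKS_TOKEN": "<your token>"}
    }
  }
}
""")
print(check_config(config, "lakeflow"))
```

Here the check flags the token because it is still the `<your token>` placeholder from the template.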

Additional notes

The MCP server uses a JSON configuration to define how to launch Lakeflow as a local process. It specifies the runtime command, arguments, and the required environment variables needed to connect to Databricks. You will manage jobs by creating a job from source, triggering runs with different argument sets, and monitoring the run state and logs.


Usage example for the MCP agent

With the MCP server configured, you can instruct the agent to launch and manage Lakeflow runs. For example, you can request multiple copies of the same Lakeflow job to run concurrently, each with distinct arguments, and then gather their logs and results as needed.

Available tools

create-job-from-source

Builds, uploads, and prepares a Lakeflow job from your local source, returning a job ID for subsequent actions.

trigger-run

Starts one or more parallel runs of the prepared Lakeflow job with specified arguments.

list-job-runs

Lists all runs associated with a given Lakeflow job ID, showing status and metadata.

get-run-logs

Retrieves logs for a specific run, enabling debugging and monitoring.