
Spark SQL MCP Server

Provides read-only access to Spark SQL data via Thrift/HiveServer2 for AI assistants, with schema discovery and multiple authentication options.

Installation
Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "aidancorrell-spark-sql-mcp-server": {
      "command": "uvx",
      "args": [
        "spark-sql-mcp-server"
      ],
      "env": {
        "SPARK_AUTH": "NONE",
        "SPARK_HOST": "your-spark-host.example.com",
        "SPARK_PORT": "10000",
        "SPARK_DATABASE": "default",
        "SPARK_PASSWORD": "YOUR_PASSWORD",
        "SPARK_USERNAME": "YOUR_USERNAME",
        "SPARK_KERBEROS_SERVICE_NAME": "hive"
      }
    }
  }
}

This lightweight MCP server lets AI assistants query a Spark SQL cluster over the Thrift/HiveServer2 protocol. It supports read-only queries, schema discovery, and multiple authentication methods, making it easy to bring Spark data into conversational workflows while keeping all operations scoped to reads.

How to use

Run the server locally or in your environment and point your MCP client (such as Claude) at it. Through the server you can execute read-only SQL against your Spark cluster, discover databases and tables, and fetch table schemas. Start the MCP server first, then add its connection details to your client configuration. Typical usage patterns include listing available databases, listing tables in a database, describing a table's schema, and executing read-only queries that return results in a readable format.

How to install

Prerequisites: Python must be installed on your system. Either install the MCP server package with pip, or run it directly with uvx.

# Install the MCP server package
pip install spark-sql-mcp-server

# Or run directly with uvx (no install step required)
uvx spark-sql-mcp-server

Configuration and usage notes

Before starting, set environment variables to point the MCP server at your Spark cluster and to select an authentication method: at minimum SPARK_HOST, SPARK_PORT, SPARK_DATABASE, and SPARK_AUTH. Supply the same values in the env block of your MCP client configuration so the client launches the server with a working connection.
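As a sketch of how these settings might be resolved at startup (variable names match the configuration example above; the fallback defaults here are illustrative assumptions, not guarantees about the server's actual behavior):

```python
import os

def load_spark_settings() -> dict:
    """Collect Spark connection settings from the environment.

    Variable names follow the configuration example above; the
    defaults are illustrative assumptions only.
    """
    return {
        "host": os.environ.get("SPARK_HOST", "localhost"),
        "port": int(os.environ.get("SPARK_PORT", "10000")),
        "database": os.environ.get("SPARK_DATABASE", "default"),
        "auth": os.environ.get("SPARK_AUTH", "NONE"),
        "username": os.environ.get("SPARK_USERNAME"),
        "password": os.environ.get("SPARK_PASSWORD"),
        "kerberos_service_name": os.environ.get("SPARK_KERBEROS_SERVICE_NAME", "hive"),
    }
```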

Security considerations

The server enforces read-only query execution for safety. Only statements such as SELECT, SHOW, DESCRIBE, EXPLAIN, and WITH are allowed. If a query would modify data or alter schema, it will be blocked before reaching the Spark cluster. Passwords and sensitive details are masked in logs and error messages to reduce exposure.
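The allowlist check can be illustrated with a short sketch (an illustration of the approach described above, not the server's actual implementation):

```python
import re

# Leading keywords permitted by the read-only policy described above.
READ_ONLY_PATTERN = re.compile(r"(SELECT|SHOW|DESCRIBE|EXPLAIN|WITH)\b", re.I)

def is_read_only(query: str) -> bool:
    """Return True if the statement begins with an allowed read-only keyword.

    Strips SQL comments and leading whitespace first, so that
    '-- note\nSELECT 1' is still recognized as read-only.
    """
    stripped = re.sub(r"--[^\n]*|/\*.*?\*/", "", query, flags=re.S).lstrip()
    return READ_ONLY_PATTERN.match(stripped) is not None
```

A statement like `DROP TABLE t` fails the check and would be rejected before it ever reaches the cluster.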

Examples of typical workflow

A typical workflow: set SPARK_HOST, SPARK_PORT, and SPARK_AUTH (plus credentials if required); start the MCP server; and configure your AI assistant client to use it as a data source. You can then ask which databases exist, what the schema of a specific table is, or for a read-only query that fetches the top records.

Troubleshooting tips

If you cannot connect, verify that the Spark Thrift Server is accessible from the environment where the MCP server runs, and check that the host, port, and authentication settings match your Spark cluster configuration. If queries fail due to permissions, review your Spark user rights and ensure you are using an authentication method that your cluster accepts.
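A quick way to rule out network problems before digging into authentication is a plain TCP probe of the Thrift port (a generic check, independent of this server):

```python
import socket

def thrift_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only confirms the Thrift endpoint is reachable over the network;
    it does not validate the HiveServer2 protocol or your credentials.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `thrift_port_reachable("your-spark-host.example.com", 10000)` returning False points at firewalls, security groups, or a stopped Thrift Server rather than an authentication issue.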

Notes on EMR and compatibility

The server works with any HiveServer2-compatible endpoint, including the Apache Spark Thrift Server, AWS EMR, Hive, Impala, and Presto. On EMR, ensure that security groups allow access to the Thrift port (10000 by default) and consider using an SSH tunnel to protect credentials in transit.

Project-wide environment and runtime details

The examples use the environment variables SPARK_HOST, SPARK_PORT, SPARK_DATABASE, SPARK_AUTH, SPARK_USERNAME, SPARK_PASSWORD, and SPARK_KERBEROS_SERVICE_NAME. The server can be started with uvx spark-sql-mcp-server, with authentication and host details supplied via these variables.

Development and testing guidance

If you want to contribute or run tests locally, install the project in editable mode and run the test suite. You can also run a local Docker-based Spark Thrift Server for integration tests. Follow the project’s testing steps to ensure your setup works with Claude or your MCP client.

Available tools

list_databases

List all available databases on the connected Spark cluster.

list_tables

List tables within a specified database.

describe_table

Describe the schema of a specific table, including column names and types.

execute_query

Run read-only SQL queries with results formatted for display.
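How "results formatted for display" might look can be sketched with a simple column-aligned renderer (an illustration only; the tool's actual output format may differ):

```python
def format_rows(columns, rows):
    """Render query results as a simple aligned text table.

    columns: list of column names; rows: list of row tuples.
    """
    table = [list(map(str, columns))] + [[str(v) for v in row] for row in rows]
    widths = [max(len(r[i]) for r in table) for i in range(len(columns))]
    lines = [" | ".join(cell.ljust(w) for cell, w in zip(r, widths)) for r in table]
    lines.insert(1, "-+-".join("-" * w for w in widths))
    return "\n".join(lines)
```

For example, `format_rows(["id", "name"], [(1, "alice"), (2, "bob")])` yields a header row, a separator, and one aligned line per result row.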