
Apache Spark MCP Server

This read-only MCP Server allows you to connect to Apache Spark data from Claude Desktop through CData JDBC Drivers. For full CRUD support, check out the first managed MCP platform: CData Connect AI (https://www.cdata.com/ai/).

Installation
Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "cdatasoftware-apache-spark-mcp-server-by-cdata": {
      "url": "https://mcp.example.com/mcp"
    }
  }
}

The MCP server runs locally and is read-only. It exposes live Apache Spark data through the CData JDBC Driver, so you can ask natural-language questions and retrieve up-to-date Spark data without writing SQL, while data access stays confined to a small set of read-only tools.

How to use

Connect an MCP client to the local server and start asking questions about your Spark data. The server exposes a small set of tools for discovering the available tables and columns and for running read-only queries. Ask in natural language, for example about correlations, counts, or upcoming events. The client invokes the built-in tools behind the scenes, so you don’t need to write SQL manually.

How to install

Install the following before you begin: a Java runtime environment (JRE/JDK) and Maven, which is used to build the MCP server.

1. Clone the MCP server repository and navigate into the project folder.

git clone https://github.com/cdatasoftware/apache-spark-mcp-server-by-cdata.git
cd apache-spark-mcp-server-by-cdata

2. Build the MCP server package to produce the runnable JAR.

mvn clean install

3. Obtain and install the CData JDBC Driver for Apache Spark to enable Spark access.

4. License the JDBC Driver to enable driver usage.

# Example commands shown in setup flow
# Locate the driver and license it as part of your installation
# Directory paths will vary by OS

5. Configure the JDBC connection to your Spark data source. Run the driver JAR to open the connection string utility, test the connection, and copy the final connection string.

java -jar cdata.jdbc.sparksql.jar

6. Create a .prp file (for example apache-spark.prp) with the required properties to expose the connection via MCP. Include the server name, driver details, and the JDBC URL you copied.

Prefix=sparksql
ServerName=CDataSparkSQL
ServerVersion=1.0
DriverPath=PATH\\TO\\cdata.jdbc.sparksql.jar
DriverClass=cdata.jdbc.sparksql.SparkSQLDriver
JdbcUrl=jdbc:sparksql:InitiateOAuth=GETANDREFRESH;
Tables=
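
Before starting the server, it can help to confirm that the .prp file contains every property shown above. The following is a minimal Python sketch, not part of the CData tooling: the helper names parse_prp and missing_keys are hypothetical, and treating every key except Tables as required is an assumption based only on the example file. It also does not interpret Java properties escapes such as the doubled backslashes in DriverPath.

```python
# Keys taken from the example .prp file; treating all of them except Tables
# as required is an assumption, not documented server behavior.
REQUIRED_KEYS = (
    "Prefix",
    "ServerName",
    "ServerVersion",
    "DriverPath",
    "DriverClass",
    "JdbcUrl",
)


def parse_prp(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, because JDBC URLs contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(props: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]


if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Usage: python check_prp.py apache-spark.prp
    if len(sys.argv) > 1:
        missing = missing_keys(parse_prp(Path(sys.argv[1]).read_text()))
        print("Missing keys:", ", ".join(missing) if missing else "none")
```

An empty Tables value is left alone here, as it is in the example file.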

Starting the MCP server locally

The server communicates over stdio, so it must run on the same machine as the MCP client.

java -jar /PATH/TO/CDataMCP-jar-with-dependencies.jar /PATH/TO/apache-spark.prp
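
Clients that launch stdio servers themselves, such as Claude Desktop, usually take the launch command as a "command" plus "args" entry under "mcpServers". The Python sketch below generates such an entry from the command above. The function name stdio_server_entry and the server key "apache-spark" are hypothetical, the paths are the same placeholders used above, and the sketch assumes java is on the PATH.

```python
import json


def stdio_server_entry(name: str, jar_path: str, prp_path: str, java: str = "java") -> dict:
    """Build an mcpServers entry that starts the server as a stdio subprocess."""
    return {
        "mcpServers": {
            name: {
                "command": java,
                # Same arguments as the manual launch command.
                "args": ["-jar", jar_path, prp_path],
            }
        }
    }


if __name__ == "__main__":
    entry = stdio_server_entry(
        "apache-spark",  # hypothetical key; any unique name should work
        "/PATH/TO/CDataMCP-jar-with-dependencies.jar",
        "/PATH/TO/apache-spark.prp",
    )
    print(json.dumps(entry, indent=2))
```

Merge the printed JSON into the client's configuration file and restart the client so it picks up the new server.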

Available tools

apache_spark_get_tables

Retrieves a list of tables available in the data source. Returns CSV with a header row of column names.

apache_spark_get_columns

Retrieves a list of columns for a specified table. Returns CSV with a header row of column names.
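
Because both discovery tools return CSV with a header row, their output can be turned into records with Python's standard csv module. This is a small sketch only: the sample text and its column names (Catalog, Schema, Table) are invented for illustration and may not match what the server actually returns.

```python
import csv
import io


def parse_tool_csv(text: str) -> list[dict[str, str]]:
    """Convert CSV text with a header row into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    # Hypothetical tool output; the real column names may differ.
    sample = "Catalog,Schema,Table\nCData,SparkSQL,Customers\nCData,SparkSQL,Orders\n"
    rows = parse_tool_csv(sample)
    print([row["Table"] for row in rows])  # prints ['Customers', 'Orders']
```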

apache_spark_run_query

Executes a SQL SELECT query against the Spark data source and returns results.