Parquet MCP Server
Provides web search and similarity extraction capabilities to power MCP-driven workflows.
Configuration
{
"mcpServers": {
"parquet_mcp": {
"command": "uv",
"args": [
"--directory",
"/home/${USER}/workspace/parquet_mcp_server/src/parquet_mcp_server",
"run",
"main.py"
],
"env": {
"EMBEDDING_URL": "http://sample-url.com/api/embed",
"OLLAMA_URL": "http://sample-url.com/",
"EMBEDDING_MODEL": "sample-model",
"SEARCHAPI_API_KEY": "your_searchapi_api_key",
"FIRECRAWL_API_KEY": "your_firecrawl_api_key",
"VOYAGE_API_KEY": "your_voyage_api_key",
"AZURE_OPENAI_ENDPOINT": "http://sample-url.com/azure_openai",
"AZURE_OPENAI_API_KEY": "your_azure_openai_api_key"
}
}
}
}
You can run Parquet MCP Server to perform web searches and extract relevant information from search results. It is designed to work with a client workflow that calls two main capabilities: searching the web and identifying information similar to prior searches. This server helps you build search-powered applications and content-aware tooling with structured results.
You interact with the Parquet MCP Server through your MCP client. Start the server locally using the standard runtime you have available, then configure your client to connect to the local process. Use the two core tools the server provides: a web search tool that performs searches and scrapes results, and a second tool that merges and extracts relevant information from previous searches. In practice, you will issue a search query for keywords, optionally specify a page number for results, and then run an extraction pass to retrieve and organize the most relevant data from those results.
Prerequisites: ensure you have Python installed and a suitable environment to run Python code. You may also need Node tooling if you plan to install or manage client-side tooling. Follow these concrete steps to install and prepare the server for use.
# Optional: install via a package manager if you have Smithery available
npx -y @smithery/cli install @DeepSpringAI/parquet_mcp_server --client claude
# If you prefer to clone the repository directly
git clone ...
cd parquet_mcp_server
# Create and activate a virtual environment
uv venv
.venv\Scripts\activate # On Windows
source .venv/bin/activate # On macOS/Linux
# Install the package in editable mode
uv pip install -e .
# Prepare environment variables in a .env file
# Example (fill in real values as needed)
EMBEDDING_URL=http://sample-url.com/api/embed
OLLAMA_URL=http://sample-url.com/
EMBEDDING_MODEL=sample-model
SEARCHAPI_API_KEY=your_searchapi_api_key
FIRECRAWL_API_KEY=your_firecrawl_api_key
VOYAGE_API_KEY=your_voyage_api_key
AZURE_OPENAI_ENDPOINT=http://sample-url.com/azure_openai
AZURE_OPENAI_API_KEY=your_azure_openai_api_key
Add the MCP server configuration to your Claude Desktop configuration to enable easy access from the client.
{
"mcpServers": {
"parquet-mcp-server": {
"command": "uv",
"args": [
"--directory",
"/home/${USER}/workspace/parquet_mcp_server/src/parquet_mcp_server",
"run",
"main.py"
]
}
}
}
Create a .env file with the following environment variables to enable embeddings, search, and Azure OpenAI integration. Replace placeholders with your actual service URLs and keys.
EMBEDDING_URL=http://sample-url.com/api/embed # URL for the embedding service
OLLAMA_URL=http://sample-url.com/ # URL for Ollama server
EMBEDDING_MODEL=sample-model # Model to use for generating embeddings
SEARCHAPI_API_KEY=your_searchapi_api_key
FIRECRAWL_API_KEY=your_firecrawl_api_key
VOYAGE_API_KEY=your_voyage_api_key
AZURE_OPENAI_ENDPOINT=http://sample-url.com/azure_openai
AZURE_OPENAI_API_KEY=your_azure_openai_api_key
The server provides two main capabilities. You can perform a web search and scrape results, then extract information from the results to produce a consolidated view.
Tool 1: Search Web — perform a web search across the configured sources and return scraped results. Required parameters: queries (list of search terms). Optional: page_number to select a specific results page.
Tool 2: Extract Info from Search — merge and extract relevant information from previous searches. Required parameters: queries (list of search terms) to guide the extraction.
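As a rough illustration of how a client might invoke these two tools, the sketch below builds MCP `tools/call` request payloads in Python. The tool identifiers `search-web` and `extract-info-from-search` and the helper `build_tool_call` are assumptions made for illustration; check the names the server actually advertises through `tools/list`. The argument names `queries` and `page_number` come from the tool descriptions above.

```python
import json
from itertools import count

# Monotonic request ids for the JSON-RPC envelope.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Wrap a tool invocation in an MCP JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool identifiers; the real names may differ.
SEARCH_TOOL = "search-web"
EXTRACT_TOOL = "extract-info-from-search"

queries = ["parquet file format", "vector similarity search"]

# Step 1: search and scrape, optionally choosing a results page.
search_request = build_tool_call(SEARCH_TOOL, {"queries": queries, "page_number": 1})

# Step 2: merge and extract information from the earlier searches.
extract_request = build_tool_call(EXTRACT_TOOL, {"queries": queries})

print(json.dumps(search_request, indent=2))
print(json.dumps(extract_request, indent=2))
```

In practice an MCP client library sends these requests for you; the payloads are shown only to make the two-step search-then-extract flow explicit.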
You can run a comprehensive test suite or individual tests to verify the MCP server behavior. Use the provided test runner to execute all tests or target specific test modules.
```bash
python src/tests/run_tests.py
```
```bash
# Or run individual tests
python src/tests/test_search_web.py
python src/tests/test_extract_info_from_search.py
```
If you encounter SSL verification errors, verify that your SSL settings are correct in your environment configuration. If embeddings fail to generate, ensure the embedding service is reachable and the selected model exists on your Ollama server. If data conversion steps fail, confirm input files exist, you have write permissions on the output directory, and the Parquet files are not corrupted. If the PostgreSQL side is involved, check the connection settings and ensure the required extensions are installed and accessible.
If you integrate PostgreSQL vector similarity, you can create a function to match search embeddings against stored vectors and return relevant results. The example function matches an embedding against stored web searches, filters by a threshold, sorts by date and similarity, and limits the result count.
-- Create the function for vector similarity search
CREATE OR REPLACE FUNCTION match_web_search(
query_embedding vector(1024), -- Adjusted vector size
match_threshold float,
match_count int -- User-defined limit for number of results
)
RETURNS TABLE (
id bigint,
metadata jsonb,
text TEXT, -- Added text column to the result
date TIMESTAMP, -- Using the date column instead of created_at
similarity float
)
LANGUAGE plpgsql
AS $$
BEGIN
RETURN QUERY
SELECT
web_search.id,
web_search.metadata,
web_search.text, -- Returning the full text of the chunk
web_search.date, -- Returning the date timestamp
1 - (web_search.embedding <=> query_embedding) as similarity
FROM web_search
WHERE 1 - (web_search.embedding <=> query_embedding) > match_threshold
ORDER BY web_search.date DESC, -- Sort by date in descending order (newest first)
web_search.embedding <=> query_embedding -- Sort by similarity
LIMIT match_count; -- Limit the results to the match_count specified by the user
END;
$$;

CREATE TABLE web_search (
-- BIGSERIAL so the id type matches the function's `id bigint` result column
id BIGSERIAL PRIMARY KEY,
text TEXT,
metadata JSONB,
embedding VECTOR(1024),
-- Defaults to the insertion time
date TIMESTAMP DEFAULT NOW()
);
Configure your Claude Desktop client to point to the local Parquet MCP Server process using the provided runtime command. Ensure the directory path matches where you keep the Parquet MCP Server sources and that the server is started before issuing MCP calls.
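Because the `args` path must point at your own checkout, it can help to generate the entry rather than edit it by hand. The following is a minimal sketch, assuming the repository was cloned to `~/workspace/parquet_mcp_server` as the example path suggests. The helper name `build_server_entry` is hypothetical, and the sketch writes out an absolute path so it does not rely on the client expanding `${USER}`.

```python
import json
from pathlib import Path


def build_server_entry(repo_root: Path) -> dict:
    """Return an `mcpServers` block that launches main.py through uv."""
    source_dir = repo_root / "src" / "parquet_mcp_server"
    return {
        "mcpServers": {
            "parquet-mcp-server": {
                "command": "uv",
                "args": ["--directory", str(source_dir), "run", "main.py"],
            }
        }
    }


# Assumed checkout location; change it to wherever you cloned the repository.
repo_root = Path.home() / "workspace" / "parquet_mcp_server"
config = build_server_entry(repo_root)

# Print the JSON so it can be merged into the client's configuration file.
print(json.dumps(config, indent=2))
```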
SSL errors, embedding failures, conversion failures, and database issues each have targeted checks. Confirm that environment variables are correctly set, dependent services are reachable, and file permissions are correct. If a specific error occurs, verify the related service is running and accessible with the provided URLs or commands.
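One way to run the environment-variable check is a short preflight script. This is a sketch only: the variable names are taken from the `.env` example, but the helper `missing_settings`, the decision to treat every variable as required, and the rule for spotting leftover placeholders are assumptions.

```python
import os

# Variable names from the .env example; treating all of them as required is an assumption.
REQUIRED_VARS = [
    "EMBEDDING_URL",
    "OLLAMA_URL",
    "EMBEDDING_MODEL",
    "SEARCHAPI_API_KEY",
    "FIRECRAWL_API_KEY",
    "VOYAGE_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
]


def missing_settings(env: dict) -> list:
    """List variables that are unset, blank, or still hold a template placeholder."""
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        # "your_..." and "sample-..." are the placeholder patterns used in the example file.
        if not value or value.startswith("your_") or "sample-" in value:
            problems.append(name)
    return problems


problems = missing_settings(dict(os.environ))
if problems:
    print("Check these variables:", ", ".join(problems))
else:
    print("All expected variables are set.")
```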
The server setup and instructions assume a Python-based runtime and common command-line tooling. You may also rely on Node-based tooling where applicable for client-side installation. The server code is written in Python, and the vector queries are shown as SQL examples.
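To make the ranking logic of the `match_web_search` SQL function easier to follow, here is a pure-Python sketch that mirrors it on in-memory rows: similarity computed as one minus cosine distance, a threshold filter, ordering by date (newest first) and then by similarity, and a result limit. The row layout and the function name `match_rows` are illustrative assumptions, and the sketch does not talk to PostgreSQL.

```python
import math
from datetime import datetime


def cosine_similarity(a, b):
    """Cosine similarity, i.e. 1 minus the cosine distance used by `<=>`."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def match_rows(rows, query_embedding, match_threshold, match_count):
    """Filter by threshold, sort newest first then most similar, and limit."""
    scored = [
        {**row, "similarity": cosine_similarity(row["embedding"], query_embedding)}
        for row in rows
    ]
    kept = [r for r in scored if r["similarity"] > match_threshold]
    # Two stable sorts: similarity first, then date, so date is the primary key.
    kept.sort(key=lambda r: r["similarity"], reverse=True)
    kept.sort(key=lambda r: r["date"], reverse=True)
    return kept[:match_count]


# Toy rows with 2-dimensional embeddings (the SQL example uses 1024 dimensions).
rows = [
    {"id": 1, "text": "old, exact match", "date": datetime(2024, 1, 1), "embedding": [1.0, 0.0]},
    {"id": 2, "text": "new, close match", "date": datetime(2024, 6, 1), "embedding": [0.8, 0.6]},
    {"id": 3, "text": "new, unrelated", "date": datetime(2024, 6, 2), "embedding": [0.0, 1.0]},
]

results = match_rows(rows, query_embedding=[1.0, 0.0], match_threshold=0.5, match_count=2)
print([r["id"] for r in results])
```

Because date is the primary sort key, similarity only breaks ties between rows that share a timestamp, so a newer but weaker match is returned ahead of an older exact match as long as both clear the threshold.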
Parquet MCP Server enables web search capabilities and similarity-based extraction within an MCP client workflow. It supports configurable embedding services, search endpoints, and integration points for downstream data processing and database storage.
Search Web: Perform a web search and scrape results. Requires a list of queries and optional page_number for results.
Extract Info from Search: Merge and extract relevant information from previous searches using a list of queries.