
Deepseek-Thinking Claude MCP Server

🧠 MCP server implementing RAT (Retrieval Augmented Thinking): combines DeepSeek R1's structured reasoning with Claude 3.5 Sonnet's response generation via OpenRouter, maintaining conversation context between interactions.

Installation
Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "newideas99-deepseek-thinking-claude-3.5-sonnet-cline-mcp": {
      "command": "/path/to/node",
      "args": [
        "/path/to/Deepseek-Thinking-Claude-3.5-Sonnet-CLINE-MCP/build/index.js"
      ],
      "env": {
        "OPENROUTER_API_KEY": "YOUR_API_KEY_HERE"
      }
    }
  }
}

This is a two-stage MCP server: DeepSeek R1 performs structured reasoning and Claude 3.5 Sonnet generates the final response, both delivered through OpenRouter. The split pairs focused reasoning with robust response generation, and the server tracks multiple conversations and their context to produce accurate, context-aware replies.

How to use

To use this MCP server, connect your MCP client to the local stdio server you run. The server exposes two main actions: generating a response for a given prompt and checking the status of a long-running task. You can request reasoning to be shown, clear conversation memory, and optionally include the existing Cline conversation history in the prompt. The server handles multiple conversations by tracking activity and cleaning up finished sessions.

When you request a response, you receive an immediate task identifier. You can poll for the final answer using that ID. Expect a progression from pending to reasoning to responding and finally complete. The final result includes the generated reply and, if requested, the reasoning produced by the reasoning stage.
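The task lifecycle described above can be sketched as a small state store. This is an illustrative model, not the server's actual implementation; the class and method names (`TaskStore`, `create`, `advance`, `check`) are assumptions, though the four statuses match the progression the server reports:

```typescript
// Hypothetical sketch of the task lifecycle: pending -> reasoning ->
// responding -> complete. Names are illustrative, not the server's API.
type TaskStatus = "pending" | "reasoning" | "responding" | "complete";

const STAGES: TaskStatus[] = ["pending", "reasoning", "responding", "complete"];

class TaskStore {
  private tasks = new Map<string, { status: TaskStatus; result?: string }>();
  private nextId = 0;

  // generate_response hands back a task id immediately.
  create(): string {
    const id = `task-${this.nextId++}`;
    this.tasks.set(id, { status: "pending" });
    return id;
  }

  // The server advances the task as each stage finishes.
  advance(id: string, result?: string): void {
    const task = this.tasks.get(id);
    if (!task) throw new Error(`unknown task ${id}`);
    const nextIndex = Math.min(STAGES.indexOf(task.status) + 1, STAGES.length - 1);
    task.status = STAGES[nextIndex];
    if (task.status === "complete" && result !== undefined) task.result = result;
  }

  // check_response_status: poll with the task id until status is "complete".
  check(id: string): { status: TaskStatus; result?: string } {
    const task = this.tasks.get(id);
    if (!task) throw new Error(`unknown task ${id}`);
    return { ...task };
  }
}
```

A client would call the status check periodically and stop once the status reaches `complete`, at which point the result (and, if requested, the reasoning) is available.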

How to install

Prerequisites: Node.js and npm installed on your system, plus an OpenRouter API key, which is used for both the DeepSeek and Claude models.

1) Install Node.js and npm if you have not already, then install the project's dependencies and build the server so it is ready to run.

2) Create an environment file and add your OpenRouter API key. Optionally, set model configuration values to override the defaults.

3) Start the local MCP server using the runtime command shown in the configuration. The server will be ready to accept MCP requests.
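For step 2, an environment file might look like the following. Only `OPENROUTER_API_KEY` is confirmed by this page; the override variable names and model IDs are illustrative, so check the project's own README for the names it actually reads:

```ini
# Required: used for both DeepSeek and Claude calls through OpenRouter
OPENROUTER_API_KEY=your_openrouter_api_key_here

# Optional overrides (names and values are illustrative)
DEEPSEEK_MODEL=deepseek/deepseek-r1
CLAUDE_MODEL=anthropic/claude-3.5-sonnet
```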

Configuration and usage notes

This server uses a two-stage flow: DeepSeek R1 handles initial reasoning with a large context, and Claude 3.5 Sonnet produces the final response incorporating that reasoning. OpenRouter coordinates both stages so you get a coherent final answer that respects conversation history.
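The hand-off between the two stages amounts to a prompt-composition step: the reasoning text produced by DeepSeek R1 is embedded in the prompt sent to Claude 3.5 Sonnet. A minimal sketch of that step follows; the function name and tag format are assumptions, not the server's actual implementation:

```typescript
// Hypothetical composition of the second-stage prompt. The exact wrapper
// format the server uses may differ.
function buildFinalPrompt(question: string, reasoning: string): string {
  return [
    "Another model produced the following reasoning about the question.",
    "<reasoning>",
    reasoning,
    "</reasoning>",
    `Question: ${question}`,
    "Answer using the reasoning above where it is sound.",
  ].join("\n");
}
```

In the real flow, this composed prompt would be sent to Claude through OpenRouter's OpenAI-compatible chat completions endpoint, so the client sees only the final, coherent answer.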

Environment variables you must provide include your OpenRouter API key, which is used by both DeepSeek and Claude models. You can adjust model selection and tuning parameters as needed, such as temperature, top_p, and repetition_penalty, to balance creativity and consistency.
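Reading those tuning parameters typically means falling back to a default when a variable is unset or malformed. A small sketch, assuming hypothetical variable names (`CLAUDE_TEMPERATURE` and friends are not confirmed by this page):

```typescript
// Hypothetical helper: parse a numeric tuning parameter from the environment,
// falling back to a default when the variable is unset or not a number.
function envNumber(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  return Number.isFinite(n) ? n : fallback;
}

// Variable names are illustrative; check the project for the real ones.
const tuning = {
  temperature: envNumber(process.env.CLAUDE_TEMPERATURE, 0.7),
  top_p: envNumber(process.env.CLAUDE_TOP_P, 1.0),
  repetition_penalty: envNumber(process.env.CLAUDE_REPETITION_PENALTY, 1.0),
};
```

Lower temperature favors consistency; raising it (or `top_p`) trades that for more varied output.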

The server tracks active conversations by monitoring activity and can clear context when you request it. It supports multiple concurrent conversations and filters out ended ones automatically.
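That bookkeeping can be sketched as a map of conversation states with a periodic prune. This is a minimal illustrative model, assuming names (`ConversationTracker`, `touch`, `prune`) that are not the server's actual API:

```typescript
// Hypothetical conversation tracking: record activity per conversation,
// mark ended ones, and prune anything finished or idle too long.
interface ConversationState {
  lastActivity: number; // epoch ms of the most recent message
  ended: boolean;
}

class ConversationTracker {
  private conversations = new Map<string, ConversationState>();

  // Record activity for a conversation, creating it on first touch.
  touch(id: string, now: number): void {
    const state = this.conversations.get(id) ?? { lastActivity: now, ended: false };
    state.lastActivity = now;
    this.conversations.set(id, state);
  }

  end(id: string): void {
    const state = this.conversations.get(id);
    if (state) state.ended = true;
  }

  // Drop ended conversations and those idle longer than maxIdleMs.
  prune(now: number, maxIdleMs: number): void {
    for (const [id, state] of this.conversations) {
      if (state.ended || now - state.lastActivity > maxIdleMs) {
        this.conversations.delete(id);
      }
    }
  }

  activeIds(): string[] {
    return [...this.conversations.keys()];
  }
}
```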

The two stages are configurable independently: you can adjust the DeepSeek model used for the reasoning stage while keeping Claude 3.5 Sonnet as the final response generator. This separation lets you tailor the behavior for different workloads.

Example configuration snippet

{
  "mcpServers": {
    "deepseek_claude": {
      "type": "stdio",
      "name": "deepseek_claude",
      "command": "/path/to/node",
      "args": ["/path/to/Deepseek-Thinking-Claude-3.5-Sonnet-CLINE-MCP/build/index.js"],
      "env": {
        "OPENROUTER_API_KEY": "your_openrouter_api_key_here"
      }
    }
  },
  "envVars": [
    {
      "name": "OPENROUTER_API_KEY",
      "description": "OpenRouter API key for both DeepSeek and Claude models",
      "required": true,
      "example": "sk-or-..."
    }
  ]
}

Development and testing

For development with auto-rebuild, use the standard build/watch workflow provided in your project setup. Test prompts that exercise both the reasoning and final response stages to ensure the two-model flow behaves as expected across multiple conversations.

Notes and tips

Keep your API key secure and do not expose it in client-side code. Monitor the task status to handle long-running prompts gracefully, and consider enabling optional reasoning visibility only for debugging or advanced usage.

Available tools

generate_response

Main tool to generate a response for a given prompt with options to show reasoning, clear context, or include conversation history.

check_response_status

Tool to poll the status of a long-running response task using the taskId from generate_response.
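Put together, a client might issue the two tool calls with arguments shaped like the following. The `taskId` field is named on this page; the other parameter names (`prompt`, `showReasoning`, `clearContext`) are illustrative, so consult the server's tool schema for the exact fields:

```json
[
  {
    "tool": "generate_response",
    "arguments": {
      "prompt": "Summarize the trade-offs of a two-stage model pipeline.",
      "showReasoning": true,
      "clearContext": false
    }
  },
  {
    "tool": "check_response_status",
    "arguments": {
      "taskId": "task-id-returned-by-generate_response"
    }
  }
]
```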