Browser Use MCP Server

Provides browser automation for web tasks via MCP, enabling LLMs to browse, fill forms, click, and summarize web pages.

This MCP server lets language models navigate the web, fill out forms, click elements, and carry out other browser-based tasks through a simple API. It suits AI agents that need to interact with websites in a structured, repeatable way.

How to use

Use this MCP server with your preferred MCP client to enable browser automation powered by your chosen language model. Start the server locally, then point your MCP client at it to issue browser tasks such as visiting pages, searching, filling forms, clicking buttons, and extracting information. You control the model and the automation through the client’s prompts, while the server handles web interactions via Playwright under the hood.

How to install

Before installing, you need Python and a package manager such as pip. You also need Playwright to drive the browser.

# Install the MCP package for the browser automation server with OpenAI support
pip install -e "git+https://github.com/yourusername/browser-use-mcp.git#egg=browser-use-mcp[openai]"
# Or install the MCP package with all providers
pip install -e "git+https://github.com/yourusername/browser-use-mcp.git#egg=browser-use-mcp[all-providers]"

Install Playwright browsers so the server can launch Chromium for web automation.

playwright install chromium
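
To confirm that the browser install worked, a short check like the following may help. It is a minimal sketch using Playwright's synchronous Python API, independent of the MCP server itself; the function name `check_chromium_launch` is made up for this example.

```python
def check_chromium_launch():
    """Return (ok, message) describing whether Playwright can start Chromium."""
    try:
        # Imported lazily so a missing package is reported instead of crashing.
        from playwright.sync_api import sync_playwright
    except ImportError:
        return False, "Playwright is not installed; try: pip install playwright"

    try:
        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            page = browser.new_page()
            # Use inline content so the check needs no network access.
            page.set_content("<title>launch check</title>")
            title = page.title()
            browser.close()
    except Exception as exc:  # e.g. browsers not downloaded, missing system libraries
        return False, f"Chromium failed to launch: {exc}"

    return True, f"Chromium launched (page title: {title!r})"


ok, message = check_chromium_launch()
print(message)
```

If the check reports a launch failure, re-running `playwright install chromium` is usually the first thing to try.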

Additional setup and configuration

Configure your MCP client to include the browser automation server. You specify the server using a stdio command that runs the MCP server locally and passes model choices and API keys as needed.

Example configurations

{
  "mcpServers": {
    "browser_use": {
      "command": "browser-use-mcp",
      "args": ["--model", "gpt-4o"],
      "env": {
        "OPENAI_API_KEY": "your-openai-api-key",
        "DISPLAY": ":0"
      }
    }
  }
}

You can also configure a separate instance for Claude-style desktop environments with a different model selection.
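
As an illustration of a second instance, the sketch below generates a client configuration with two server entries. The entry name `browser_use_claude`, the model identifier `claude-3-5-sonnet-latest`, and the `ANTHROPIC_API_KEY` variable are assumptions made for this example; check which model names and key variables your installed provider actually expects.

```python
import json


def server_entry(model, key_var, key_placeholder):
    """Build one mcpServers entry that launches browser-use-mcp over stdio."""
    return {
        "command": "browser-use-mcp",
        "args": ["--model", model],
        "env": {key_var: key_placeholder, "DISPLAY": ":0"},
    }


def build_config():
    return {
        "mcpServers": {
            # Same entry as the example configuration above.
            "browser_use": server_entry("gpt-4o", "OPENAI_API_KEY", "your-openai-api-key"),
            # Hypothetical second instance; model id and key variable are assumptions.
            "browser_use_claude": server_entry(
                "claude-3-5-sonnet-latest", "ANTHROPIC_API_KEY", "your-anthropic-api-key"
            ),
        }
    }


print(json.dumps(build_config(), indent=2))
```

The printed JSON can be pasted into the client configuration file after replacing the placeholder keys.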

Examples of common tasks

Navigate to a webpage, perform a search, open a result, fill in a form, click buttons, and summarize the page content. You can chain actions by prompting the model to describe the steps and allowing the MCP server to execute them in sequence.
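
A chained task like this could be driven from a script roughly as follows. This is a sketch that assumes the official `mcp` Python SDK is installed and that the server exposes the tool names listed under "Available tools". The argument names (`url`, `fields`, `target`) and the example URL are hypothetical, since the page does not document each tool's input schema.

```python
import asyncio
import os

# Hypothetical step list: tool names come from this page, argument names are guesses.
STEPS = [
    ("navigate", {"url": "https://example.com/contact"}),
    ("fill_form", {"fields": {"name": "Ada", "email": "ada@example.com"}}),
    ("click", {"target": "Submit button"}),
    ("summarize", {}),
]


async def run_steps(session, steps):
    """Call each tool in order and collect (tool_name, result) pairs."""
    results = []
    for tool_name, arguments in steps:
        result = await session.call_tool(tool_name, arguments=arguments)
        results.append((tool_name, result))
    return results


async def main():
    # Imported here so the step logic above can be reused without the SDK.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(
        command="browser-use-mcp",
        args=["--model", "gpt-4o"],
        # Pass the current environment through so the API key reaches the server.
        env={**os.environ, "DISPLAY": os.environ.get("DISPLAY", ":0")},
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            for tool_name, result in await run_steps(session, STEPS):
                print(tool_name, "->", result)


# To run against a real server: asyncio.run(main())
```

In everyday use the MCP client and the model decide which tools to call; scripting the sequence by hand is mainly useful for testing.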

Further usage notes

If you encounter issues, first check that your API key is set in your environment or in a .env file, and verify that Playwright can launch Chromium. If you see model selection errors, pass a valid model name with the --model flag in the server arguments.

Troubleshooting

- API Key Issues: Ensure your API key is correctly set in your environment variables or a .env file.
- Provider Not Found: Make sure you’ve installed the required provider package.
- Browser Automation Errors: Check that Playwright is installed and Chromium can launch.
- Model Selection: If you get errors about an invalid model, specify a valid model with the --model flag.
- Debug Mode: Use --debug to enable more detailed logging.
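
The first checks in this list can be automated with a small preflight script. The sketch below uses only the standard library; the helper names `read_dotenv` and `preflight` are made up for this example, and the .env parser handles only plain `KEY=value` lines.

```python
import os
import shutil
from pathlib import Path


def read_dotenv(path):
    """Parse simple KEY=value lines from a .env file; return {} if it is missing."""
    values = {}
    file = Path(path)
    if not file.is_file():
        return values
    for line in file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def preflight(key_name="OPENAI_API_KEY", environ=None, dotenv_path=".env",
              command="browser-use-mcp"):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    environ = os.environ if environ is None else environ
    problems = []
    if not (environ.get(key_name) or read_dotenv(dotenv_path).get(key_name)):
        problems.append(f"{key_name} is not set in the environment or in {dotenv_path}")
    if shutil.which(command) is None:
        problems.append(f"'{command}' was not found on PATH; is the package installed?")
    return problems


for problem in preflight():
    print("WARNING:", problem)
```

Running the script in the same shell and directory that the MCP client uses gives the most reliable result, because PATH and .env lookups depend on both.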

Security and environment considerations

Keep your API keys secure and avoid committing them to version control. Run the MCP server in a trusted environment and limit access to the API endpoints or local runtime where applicable.
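
One practical precaution is to mask secrets before sharing a configuration in a bug report or chat. The sketch below assumes the `mcpServers` layout shown on this page and treats any environment variable whose name contains KEY, TOKEN, SECRET, or PASSWORD as sensitive. That naming rule is a heuristic chosen for this example, not something the server defines.

```python
import copy
import json

SENSITIVE_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


def redact_config(config):
    """Return a deep copy of the config with sensitive env values masked."""
    redacted = copy.deepcopy(config)
    for server in redacted.get("mcpServers", {}).values():
        env = server.get("env", {})
        for name in env:
            if any(marker in name.upper() for marker in SENSITIVE_MARKERS):
                env[name] = "***"
    return redacted


example = {
    "mcpServers": {
        "browser_use": {
            "command": "browser-use-mcp",
            "args": ["--model", "gpt-4o"],
            "env": {"OPENAI_API_KEY": "sk-not-a-real-key", "DISPLAY": ":0"},
        }
    }
}
print(json.dumps(redact_config(example), indent=2))
```

The original dictionary is left unchanged, so the same object can still be written to the real configuration file.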

Tools and capabilities

- Navigate: Directs the browser to a URL or page through natural language prompts.
- Search: Performs web searches and iterates results.
- Form fill: Completes forms with provided data.
- Click: Interacts with page elements like buttons and links.
- Summarize: Extracts and summarizes information from pages.

Available tools

navigate

Directs the browser to a URL or specific page using natural language prompts.

search

Performs web searches and returns relevant results to the agent.

fill_form

Fills out forms on a webpage with provided data.

click

Clicks on page elements like buttons or links to drive interactions.

summarize

Extracts and summarizes information from loaded web pages.
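
Because this page does not list each tool's arguments, it can help to ask the running server for its tool schemas. The sketch below assumes the official `mcp` Python SDK and a working `browser-use-mcp` command on PATH; the helper names `format_tools` and `list_server_tools` are made up for this example.

```python
import asyncio
import os


def format_tools(tools):
    """Render one line per tool: name, description, and declared argument names."""
    lines = []
    for tool in tools:
        properties = (tool.get("schema") or {}).get("properties", {})
        params = ", ".join(sorted(properties)) or "(none declared)"
        lines.append(f"{tool['name']}: {tool.get('description', '')} [arguments: {params}]")
    return lines


async def list_server_tools():
    # Imported here so format_tools can be used without the SDK installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(
        command="browser-use-mcp",
        args=["--model", "gpt-4o"],
        env=dict(os.environ),  # forwards the API key from the current environment
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            listed = await session.list_tools()
            return [
                {"name": t.name, "description": t.description or "", "schema": t.inputSchema}
                for t in listed.tools
            ]


# To query a real server:
#     for line in format_tools(asyncio.run(list_server_tools())):
#         print(line)
```

The argument names reported this way are the ones to use when calling tools directly, rather than guessing them from the descriptions above.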