A local MCP server that enables an AI agent to automate browser tasks via WebSocket using a Python backend and a TypeScript Chrome extension.
Configuration
{
  "mcpServers": {
    "hihuzhen-browser-mcp": {
      "command": "uvx",
      "args": [
        "nep-browser-engine"
      ]
    }
  }
}
Browser MCP Server enables a local, WebSocket-based MCP (Model Context Protocol) workflow that lets an AI assistant control your browser securely and privately. It provides browser automation capabilities via a Python-based MCP server and a TypeScript Chrome extension, all running locally on your machine.
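If your MCP client already has a configuration file with other servers in it, the mcpServers entry from the Configuration block has to be merged in rather than pasted over the file. The following is a minimal sketch of that merge on an in-memory dictionary. The function name add_mcp_server and the "other-server" entry are illustrative assumptions; where your client stores its configuration file is client-specific and not covered here.

```python
import json
from copy import deepcopy


def add_mcp_server(config: dict, name: str, command: str, args: list) -> dict:
    """Return a copy of `config` with one entry added under "mcpServers".

    The input dictionary is left untouched so the caller can compare
    before and after, or abort without side effects.
    """
    merged = deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args)}
    return merged


# Hypothetical existing client configuration with one unrelated server.
existing = {
    "mcpServers": {
        "other-server": {"command": "node", "args": ["server.js"]}
    }
}

# Values taken from the Configuration block on this page.
updated = add_mcp_server(
    existing, "hihuzhen-browser-mcp", "uvx", ["nep-browser-engine"]
)
print(json.dumps(updated, indent=2))
```

Returning a copy instead of mutating the input keeps the original configuration available if you decide not to write the change back to disk.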
To use this MCP server with an MCP-compatible client, add the server to the client's configuration so the client can launch it locally (the configurations on this page start it with uvx, and the Step 2 entry declares the stdio transport), then connect the browser extension to the WebSocket endpoint the server exposes. The client can then send commands for browser actions such as navigating pages, interacting with elements, filling forms, taking screenshots, and extracting content. The server translates those requests into browser automation tasks, forwards them to the extension, and returns the results to the client.
Key interaction flows you can perform with the MCP client include navigating to URLs, clicking on elements using CSS selectors, filling or selecting form fields, typing keyboard input, extracting page content, and taking screenshots of the full page or specific elements. The system also supports retrieving lists of open windows and tabs to help you manage context while automating tasks.
The server and the browser extension communicate over a WebSocket channel. Once the extension is connected, the client can begin issuing MCP tool calls to perform browser actions. You’ll receive structured responses containing results such as element handles, extracted text, or image data from screenshots.
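As a rough illustration of handling those structured responses, the sketch below separates text and image items from a tool result. It assumes the server returns results in the standard MCP tool-result shape (a "content" list whose items have "type" set to "text" or "image", with image bytes base64-encoded in "data"). Whether this particular server returns screenshots as image items is an assumption, and the sample result is made up for the example.

```python
import base64


def split_tool_result(result: dict):
    """Split an MCP tool result into (texts, images).

    texts  : list of str, one per "text" content item
    images : list of bytes, one per "image" content item (base64-decoded)
    Items of any other type are ignored.
    """
    texts, images = [], []
    for item in result.get("content", []):
        kind = item.get("type")
        if kind == "text":
            texts.append(item.get("text", ""))
        elif kind == "image":
            images.append(base64.b64decode(item["data"]))
    return texts, images


# Made-up result for illustration; a real one comes from the MCP client.
sample = {
    "content": [
        {"type": "text", "text": "Example Domain"},
        {
            "type": "image",
            "data": base64.b64encode(b"\x89PNG").decode("ascii"),
            "mimeType": "image/png",
        },
    ],
    "isError": False,
}

texts, images = split_tool_result(sample)
print(texts, [len(blob) for blob in images])
```

In practice an MCP client library usually parses results for you; the point here is only what kinds of payload to expect.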
Prerequisites: Python 3.9 or newer, Chrome or Chromium, uv (which provides the uvx command used to launch the server), and pnpm if you build the extension from source.
Step 1: Install and build the Chrome extension
cd extension
pnpm install
pnpm run build
# Or download a prebuilt release and load it into Chrome
In Chrome, open chrome://extensions/, turn on Developer mode, and click Load unpacked to select the directory containing the built extension.
Step 2: Run the MCP server locally
Add the following entry to your MCP client's configuration. It tells the client to launch the server with uvx over stdio.
{
  "mcpServers": {
    "nep-browser-engine": {
      "type": "stdio",
      "command": "uvx",
      "args": ["nep-browser-engine"]
    }
  }
}
Step 3: Connect the extension to the server
Open the extension in your browser and follow the prompt to connect to the WebSocket service. The default WebSocket address is ws://localhost:18765, so make sure the server is reachable at that endpoint if you use the default setup.
The server runs locally and communicates with the browser extension over WebSocket. The extension contains a WebSocket client, a tool processor, and scripts injected into web pages to perform actions. You can customize the server’s port and other parameters if needed by adjusting the server configuration.
Quick note on usage: tool results include element data, page text, and HTML, so you can verify the actions the AI assistant has taken.
Security: run the server locally and only connect the MCP client to your own environment. Be mindful of automation permissions and ensure the extension is loaded from a trusted source. Keep your system and browser up to date to minimize risks associated with automation tooling.
If the extension cannot connect to the server, verify that the server is running and listening on the expected WebSocket address. Check that the extension is loaded and has permission to access WebSocket resources. Review any error messages from the MCP client for clues about missing tools or failed actions.
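One quick way to check whether anything is listening on the expected address is a plain TCP connection attempt. The sketch below does only that: it does not perform a WebSocket handshake, so a True result means the port is open, not that the process behind it is this server. The function name websocket_port_open is an illustrative assumption; the default address comes from Step 3.

```python
import socket
from urllib.parse import urlsplit


def websocket_port_open(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the host and port in `url` succeeds."""
    parts = urlsplit(url)
    host = parts.hostname or "localhost"
    # Fall back to the scheme's conventional port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "wss" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    address = "ws://localhost:18765"  # default address from Step 3
    state = "open" if websocket_port_open(address) else "not reachable"
    print(f"{address} is {state}")
```

If the port is not reachable, confirm that your MCP client has actually launched the server before looking at the extension.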
If you see issues with specific browser actions, confirm that the target elements can be located using the provided CSS selectors and that the page content is accessible within the automated context. Adjust timeouts as needed to accommodate slower page loads or dynamic content.
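For dynamic pages, a common pattern is to poll until the target element appears or a deadline passes, instead of relying on a single lookup. The sketch below is a generic polling helper, not part of this server's API. The probe callable stands in for whatever element-query tool call your client makes, and fake_probe is a stand-in used only so the example runs without a browser.

```python
import time


def wait_for(probe, timeout: float = 10.0, interval: float = 0.25):
    """Call `probe` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have elapsed without success. `interval` is the pause between tries.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = probe()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in probe: pretends the element shows up on the third lookup.
attempts = {"count": 0}


def fake_probe():
    attempts["count"] += 1
    return ["button#submit"] if attempts["count"] >= 3 else []


matches = wait_for(fake_probe, timeout=2.0, interval=0.01)
print(matches, "after", attempts["count"], "attempts")
```

Raising the timeout argument is the equivalent of adjusting timeouts for slow pages; shortening the interval makes the check more responsive at the cost of more tool calls.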
Supported browser actions:
Retrieve all open browser windows and tabs to establish context for automation.
Navigate to a URL or refresh the current tab to update the page view.
Close specific tabs or entire windows as needed during automation flows.
Navigate backward or forward in the browser history to replay or correct steps.
Click a page element selected via CSS selector to simulate user interaction.
Fill out input fields or select options within a form on the page.
Query and retrieve elements matching selectors for inspection or further actions.
Simulate keyboard input, including typing into fields and triggering shortcuts.
Extract visible text or HTML content from the current page for validation or data collection.
Capture screenshots of the full page or a targeted element for visual verification.
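The actions listed above are invoked as MCP tool calls, which are JSON-RPC 2.0 messages using the methods tools/list and tools/call. The sketch below builds such messages as plain dictionaries. The tool names "navigate" and "click" and their argument keys are hypothetical placeholders, since this page does not give the actual tool names; send tools/list first to discover the real names and input schemas. An MCP client library normally constructs these messages and performs the initialization handshake for you.

```python
import itertools
import json

_request_ids = itertools.count(1)


def jsonrpc_request(method: str, params=None) -> dict:
    """Build a JSON-RPC 2.0 request with a fresh numeric id."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request for the named tool."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


# Ask the server which tools it actually exposes.
list_tools = jsonrpc_request("tools/list")

# Hypothetical tool names and argument keys, for illustration only.
navigate = tool_call("navigate", {"url": "https://example.com"})
click = tool_call("click", {"selector": "button#submit"})

for message in (list_tools, navigate, click):
    print(json.dumps(message))
```

Each request carries its own id so that responses can be matched to the call that produced them.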