Web Browser MCP Server
An advanced web browsing server for the Model Context Protocol (MCP) powered by Playwright, enabling headless browser interactions through a flexible, secure API.
Configuration
{
  "mcpServers": {
    "random-robbie-mcp-web-browser": {
      "command": "python",
      "args": [
        "/path/to/your/server.py"
      ]
    }
  }
}
This MCP server lets you control a headless web browser through its API. It supports automated web interactions, full-page extraction, multi-tab browsing, and scripted actions, so you can build browser automation and data collection workflows in your MCP client apps.
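The `mcpServers` entry shown above can also be written into a client's JSON configuration file with a short script. The following is a minimal sketch, not part of the server itself: the helper name `add_server_entry` and its parameters are hypothetical, and the location of the configuration file depends on your MCP client and platform, so it is passed in rather than assumed.

```python
import json
from pathlib import Path


def add_server_entry(config_path, name, server_script, command="python"):
    """Add or replace one entry under "mcpServers", keeping other settings.

    Hypothetical helper for illustration; config_path is whatever JSON
    configuration file your MCP client reads.
    """
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    # Treat a missing or empty file as an empty configuration.
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": [str(server_script)]}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_server_entry(config_file, "web_browser", "/path/to/your/server.py")` leaves any other servers and settings in the file untouched and only rewrites the named entry.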
You connect to the web browser MCP server from your MCP client and issue actions to navigate, interact with pages, extract content, and gather page information. You can perform common web automation tasks such as navigating to a URL, filling forms, clicking elements, taking screenshots, and retrieving links. The server supports multiple tabs, waits for navigation, executes JavaScript on pages, and returns detailed page information for your downstream logic.
Typical usage patterns include opening a page, performing form input, and extracting content or screenshots. You can also collect all links on a page or filter them by a pattern, then switch between tabs or close them as your workflow requires.
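The open, fill, click, and extract pattern described above could be driven from a Python MCP client roughly as follows. This is a sketch under stated assumptions: the tool names (`browse_to`, `input_text`, `click_element`, `extract_text_content`) and their argument names are placeholders that this page does not confirm, so check the names the server reports through `list_tools()` before relying on them. The client calls come from the MCP Python SDK.

```python
def build_workflow(url, field_selector, text, button_selector):
    """Return the ordered (tool_name, arguments) steps for a form workflow.

    Tool and argument names are ASSUMED placeholders; the server's real
    names may differ.
    """
    return [
        ("browse_to", {"url": url}),
        ("input_text", {"selector": field_selector, "text": text}),
        ("click_element", {"selector": button_selector}),
        ("extract_text_content", {}),
    ]


async def run_workflow(server_script, steps):
    """Start the server over stdio and call each tool in order."""
    # Imported lazily so build_workflow works without the SDK installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(command="python", args=[server_script])
    results = []
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Ask the server which tools it really exposes.
            listing = await session.list_tools()
            available = {tool.name for tool in listing.tools}
            for name, arguments in steps:
                if name not in available:
                    raise ValueError(f"server has no tool named {name!r}")
                results.append(await session.call_tool(name, arguments))
    return results
```

To run it, build the steps with `build_workflow(...)` and pass them to `asyncio.run(run_workflow("/path/to/your/server.py", steps))`. Checking each name against the server's own tool list first means a wrong guess fails with a clear message rather than an opaque protocol error.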
Prerequisites:
Python 3.10+
MCP SDK
Playwright
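A short script can confirm that these prerequisites are in place, either before installing or afterwards as a sanity check. This is a minimal sketch: the function name `check_prerequisites` is hypothetical, and it assumes the MCP SDK and Playwright are importable as `mcp` and `playwright`, matching the package names used with pip.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)
# Assumed import names for the MCP SDK and Playwright packages.
REQUIRED_MODULES = ("mcp", "playwright")


def check_prerequisites(version_info=sys.version_info,
                        find_spec=importlib.util.find_spec):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    for module in REQUIRED_MODULES:
        # find_spec returns None when a top-level module is not installed.
        if find_spec(module) is None:
            problems.append(f"missing package: {module}")
    return problems
```

Printing the result of `check_prerequisites()` shows what still needs installing. It does not verify that the Playwright browser binaries are present, only that the Python packages can be found.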
Install the MCP package and Playwright dependencies, then install browser binaries.
pip install mcp playwright
playwright install
Configure the MCP server in the Claude Desktop configuration file to point to your local server process.
{
  "mcpServers": {
    "web_browser": {
      "command": "python",
      "args": [
        "/path/to/your/server.py"
      ]
    }
  }
}
The server provides the following actions:
Navigate to a URL and establish a page context for subsequent actions like extraction, interaction, and scripting.
Extract visible text from the current page or a specific element using an optional CSS selector.
Click a page element specified by a CSS selector to trigger events, forms, or navigation.
Capture screenshots of the full page or a specific element for visual verification or reporting.
Retrieve all links from the current page, with an optional pattern filter to narrow results.
Type text into a form field identified by a CSS selector.
Open a new browser tab, optionally navigating to a specified URL.
Switch the active tab to another one by its ID.
List all currently open tabs and their identifiers.
Close a specific tab or the current tab if no ID is provided.
Refresh the current page to re-fetch content.
Obtain detailed metadata about the current page, including URL, title, and loading state.
Scroll the page in a specified direction by a given amount, such as page, half, or pixel values.
Wait for in-page navigation to complete within a timeout period.
Run custom JavaScript in the context of the current page and return the result.
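To make the actions described above more concrete, the sketch below shows how a few of them might map onto Playwright's async Python API. It is not the server's actual implementation, which this page does not show. The helper names `filter_links`, `scroll_delta`, and `demo` are hypothetical, treating the link pattern as a regular expression is an assumption, and only vertical scrolling is covered.

```python
import re


def filter_links(links, pattern=None):
    """Keep links matching a regular expression; None keeps all links.

    ASSUMPTION: the server's pattern filter may use different matching rules.
    """
    if pattern is None:
        return list(links)
    regex = re.compile(pattern)
    return [link for link in links if regex.search(link)]


def scroll_delta(direction, amount, viewport_height):
    """Convert 'page', 'half', or a pixel count into a signed pixel distance."""
    if amount == "page":
        pixels = viewport_height
    elif amount == "half":
        pixels = viewport_height // 2
    else:
        pixels = int(amount)
    return -pixels if direction == "up" else pixels


async def demo(url, link_pattern=None):
    """Navigate, extract text and links, scroll, and read page info."""
    # Imported lazily so the pure helpers above work without Playwright.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url)                       # navigate
        text = await page.inner_text("body")       # visible text
        hrefs = await page.eval_on_selector_all(   # all link targets
            "a[href]", "els => els.map(e => e.href)"
        )
        size = page.viewport_size or {"height": 720}
        await page.mouse.wheel(0, scroll_delta("down", "half", size["height"]))
        title = await page.evaluate("document.title")  # run JavaScript
        info = {"url": page.url, "title": title}
        await browser.close()
    return {"text": text, "links": filter_links(hrefs, link_pattern), "info": info}
```

Running `asyncio.run(demo("https://example.com", r"/docs"))` requires Playwright and its browser binaries. The two pure helpers can be used on their own, for example to post-process a link list returned by the server.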