Playwright MCP Server
Provides Playwright-based browser automation via MCP with structured accessibility data for fast, deterministic web interactions.
Configuration
{
  "mcpServers": {
    "markbustamante77-mcp": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest"
      ]
    }
  }
}
Playwright MCP is a Model Context Protocol server that enables browser automation through Playwright’s structured accessibility data. It lets you guide and query web pages using stable, data-driven interactions instead of pixel-based inputs, making it fast, deterministic, and friendly for large language models.
You run the Playwright MCP server locally and connect to it with an MCP client in your development environment. The server exposes a stream of structured page data and a set of actions you can perform, such as clicking elements, typing text, navigating pages, and capturing page snapshots. Use this to script web interactions, automate form filling, and extract data from structured content.
Prerequisites: You need Node.js installed, along with npm, which also provides the npx command. You also need access to a terminal or command prompt.
# Run the Playwright MCP server via npx (no global install required)
npx @playwright/mcp@latest
The Playwright MCP server can run in different modes and with specific transport settings. The canonical local setup uses a stdio transport. To make the server available to an MCP client, add the following configuration, which tells the client to launch the server with npx.
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest"
      ]
    }
  }
}
The Playwright MCP server exposes actions in two groups: snapshot-based interactions and vision-based interactions. Together they cover clicking, hovering, typing, selecting options, taking accessibility snapshots, capturing screenshots, navigating between pages, and managing browser tabs.
Snapshot Mode uses accessibility snapshots for interaction, while Vision Mode uses screenshots. You can enable Vision Mode by adding a --vision flag when starting the server, though the default is Snapshot Mode for better performance and reliability.
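As an illustration, the configuration below passes the --vision flag described above through the args array. It is a minimal sketch that reuses the playwright entry shown earlier; whether the flag is accepted may depend on the installed version of @playwright/mcp, so treat it as an assumption to verify against the server's own help output.

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest",
        "--vision"
      ]
    }
  }
}
```

Leaving the flag out keeps the server in the default Snapshot Mode.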
To integrate with a custom transport, create a server instance programmatically and connect it to your transport layer so that MCP messages are delivered over your own channel.
The server provides the following actions:
Perform a click on a web page element using the element description and the exact element reference from the page snapshot.
Hover the cursor over a targeted element using its description and the precise element reference from the snapshot.
Drag content from a start element to an end element using their descriptions and references.
Type text into an editable element with options to submit or type slowly to trigger page handlers.
Select one or more values in a dropdown by providing human-readable element descriptions and exact references.
Capture an accessibility snapshot of the current page to use for interactions.
Capture a screenshot of the current page for visual references when needed.
List all open browser tabs.
Open a new tab and optionally navigate to a URL.
Select a tab by its index.
Close a specific tab or the current one if no index is provided.
Navigate the page to a specified URL.
Go back to the previous page in history.
Go forward to the next page in history.
Return all console messages from the current page.
Upload one or multiple files to an input element.
Save the current page as a PDF.
Wait for a specified number of seconds (up to 10 seconds).
Close the current page or context.
Install the configured browser if it is not already installed.
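To show how an MCP client might invoke one of the actions above, here is a sketch of a JSON-RPC tools/call request for a click. The envelope (method and params with name and arguments) follows the Model Context Protocol. The tool name browser_click and the argument names element and ref are assumptions based on the upstream Playwright MCP naming and are not stated on this page; the ref value e12 is a hypothetical placeholder for a reference copied from an accessibility snapshot.

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "browser_click",
    "arguments": {
      "element": "Submit button",
      "ref": "e12"
    }
  }
}
```

The element field carries the human-readable description and ref carries the exact reference, matching the two inputs the click action is described as needing. In practice a client would first request an accessibility snapshot and take the reference from its output.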