PyAutoGUI MCP Server
An MCP (Model Context Protocol) server that provides automated GUI testing and control capabilities through PyAutoGUI.
Configuration
{
  "mcpServers": {
    "hetaobackend-mcp-pyautogui-server": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/mcp-pyautogui-server",
        "run",
        "mcp-pyautogui-server"
      ]
    }
  }
}
You can control and test graphical interfaces remotely using this MCP server powered by PyAutoGUI. It lets you move the mouse, press keys, take screenshots, locate images on screen, and fetch screen information, all through a consistent MCP client workflow. This enables automated GUI testing, end-to-end UI flows, and reproducible interactions across platforms.
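To make the configuration shown above more concrete, the sketch below parses an `mcpServers` block and prints the command line that a client would spawn for each entry. It is only an illustration: the helper name `launch_commands` is made up for this example, the directory is the placeholder path from the configuration, and real MCP clients may resolve commands differently.

```python
import json
import shlex

# Same shape as the configuration shown above. The directory is the
# placeholder from that configuration and must point at a real checkout.
CONFIG_TEXT = """
{
  "mcpServers": {
    "hetaobackend-mcp-pyautogui-server": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/mcp-pyautogui-server",
        "run",
        "mcp-pyautogui-server"
      ]
    }
  }
}
"""


def launch_commands(config_text: str) -> dict[str, str]:
    """Map each configured server name to the command line it describes."""
    config = json.loads(config_text)
    commands = {}
    for name, entry in config["mcpServers"].items():
        # "command" is the executable; "args" is its argument list.
        argv = [entry["command"], *entry.get("args", [])]
        commands[name] = shlex.join(argv)
    return commands


if __name__ == "__main__":
    for name, command in launch_commands(CONFIG_TEXT).items():
        print(f"{name}: {command}")
```

Running the printed command by hand is a quick way to check that the path and package name are right before wiring the entry into a client.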
To use the PyAutoGUI MCP server, run it locally and connect to it from an MCP client. You will send commands to move the mouse, click, type text, press hotkeys, take screenshots, and search for images on the screen. The server exposes tools that map directly to GUI actions; your client can orchestrate complex interactions by combining these actions into scripts or test flows.
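As a rough illustration of the client side, MCP clients invoke tools by sending JSON-RPC 2.0 `tools/call` requests, one JSON message per line on the stdio transport. The sketch below only builds such messages. The tool names (`move_mouse`, `click`, `type_text`) and their argument keys are assumptions for illustration, not taken from this server; a real client would first complete the `initialize` handshake and read the actual names from `tools/list`.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request ids.
_request_ids = count(1)


def tool_call(name: str, arguments: dict) -> str:
    """Serialize one MCP `tools/call` request as a single JSON-RPC 2.0 line."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps emits no raw newlines, so each message fits on one line.
    return json.dumps(message)


# Hypothetical tool names and argument keys, combined into a small flow.
flow = [
    tool_call("move_mouse", {"x": 200, "y": 150}),
    tool_call("click", {}),
    tool_call("type_text", {"text": "hello"}),
]

if __name__ == "__main__":
    for line in flow:
        print(line)
```

In practice an MCP client library handles this framing for you; the point here is only that a test flow is an ordered list of tool calls.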
Prerequisites you need to meet before installing and running the server:
Install the MCP server package using Python's package manager:
pip install mcp-pyautogui-server
If you plan to run the server through MCP tooling, you can start it as a local stdio server or via MCP command line helpers. The following configurations show two common approaches: one for development with a local directory, and one for published deployments.
# Development / Unpublished Servers Configuration
{
  "mcpServers": {
    "mcp-pyautogui-server": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/mcp-pyautogui-server",
        "run",
        "mcp-pyautogui-server"
      ]
    }
  }
}
For published deployments, the configuration is shorter: uvx (uv's tool runner) fetches the published package and runs it, so no local directory is needed. The example below shows this form.
{
  "mcpServers": {
    "mcp-pyautogui-server": {
      "command": "uvx",
      "args": [
        "mcp-pyautogui-server"
      ]
    }
  }
}
Move the mouse to specific coordinates, click at the current or a designated position, drag-and-drop, and query the current mouse position.
Type text, press individual keys, and perform hotkey combinations to simulate user input.
Take screenshots, get screen size, locate image templates on the screen, and read pixel colors.
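The mouse, keyboard, and screen capabilities listed above correspond closely to functions that PyAutoGUI itself provides. The sketch below shows one plausible mapping, not the server's actual implementation: the action names on the left are invented for this example, while the calls on the right (`moveTo`, `click`, `dragTo`, `position`, `write`, `press`, `hotkey`, `screenshot`, `size`, `locateOnScreen`, `pixel`) are PyAutoGUI functions. A recording stand-in replaces the real library so the flow can be tried without moving the actual cursor.

```python
from typing import Any, Callable


def build_actions(gui: Any) -> dict[str, Callable[..., Any]]:
    """Map hypothetical action names to calls on a PyAutoGUI-like object."""
    return {
        # Mouse
        "move": lambda x, y: gui.moveTo(x, y),
        "click": lambda x=None, y=None: gui.click(x=x, y=y),
        "drag": lambda x, y: gui.dragTo(x, y),
        "position": lambda: gui.position(),
        # Keyboard
        "type": lambda text: gui.write(text),
        "press": lambda key: gui.press(key),
        "hotkey": lambda *keys: gui.hotkey(*keys),
        # Screen
        "screenshot": lambda path: gui.screenshot(path),
        "screen_size": lambda: gui.size(),
        "locate": lambda image: gui.locateOnScreen(image),
        "pixel": lambda x, y: gui.pixel(x, y),
    }


def run_flow(gui: Any, steps: list[tuple[str, tuple]]) -> list[Any]:
    """Run (action name, positional args) steps in order and collect results."""
    actions = build_actions(gui)
    return [actions[name](*args) for name, args in steps]


class RecordingGui:
    """Stand-in that records calls instead of touching the real desktop."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, tuple, dict]] = []

    def __getattr__(self, name: str) -> Callable[..., None]:
        def record(*args: Any, **kwargs: Any) -> None:
            self.calls.append((name, args, kwargs))
        return record


if __name__ == "__main__":
    # Passing the real `pyautogui` module instead of RecordingGui()
    # would drive the actual mouse and keyboard.
    gui = RecordingGui()
    run_flow(gui, [
        ("move", (200, 150)),
        ("click", ()),
        ("type", ("hello",)),
        ("hotkey", ("ctrl", "s")),
    ])
    for call in gui.calls:
        print(call)
```

Keeping the GUI backend injectable like this makes test flows easy to dry-run in environments without a display, such as CI machines.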