Exposes MaaFramework automation via MCP to automate Android devices and Windows apps with OCR, screen capture, and multi-device coordination.
Configuration
{
  "mcpServers": {
    "maa-ai-maamcp": {
      "command": "uvx",
      "args": [
        "maa-mcp"
      ]
    }
  }
}
MaaMCP is an MCP server that exposes MaaFramework’s powerful automation capabilities through a standardized MCP interface, enabling an AI assistant to automate Android devices and Windows desktop apps with multi-device orchestration, OCR-based recognition, and real-time screen capture.
Connect MaaMCP to your MCP client to begin automating tasks across devices and windows. You can discover available Android devices and Windows windows, establish connections, and then issue automation commands such as OCR, clicking, swiping, text input, and keyboard shortcuts. You can also generate and run Pipelines to encapsulate repetitive actions for future reuse. The workflow supports coordinating multiple devices or windows in parallel, making it suitable for cross-device automation and complex scenarios.
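The "recognize, then act" step of this workflow can be sketched in a few lines. The snippet below is only an illustration of the decision logic between an OCR call and a click call. The result shape (a list of dicts with a `text` field and a `box` of `[x, y, width, height]`) and the helper name `find_text_center` are assumptions made for this example, not MaaMCP's documented output format, so check the actual tool output before relying on it.

```python
def find_text_center(ocr_results, target):
    """Return the (x, y) center of the first OCR hit containing `target`, else None.

    ASSUMPTION: each hit looks like {"text": str, "box": [x, y, width, height]}.
    This shape is hypothetical and may differ from what the server returns.
    """
    for hit in ocr_results:
        if target in hit["text"]:
            x, y, w, h = hit["box"]
            # Click in the middle of the recognized text box.
            return (x + w // 2, y + h // 2)
    return None


# Hypothetical OCR output for a screen with two labels.
sample = [
    {"text": "Settings", "box": [40, 100, 200, 60]},
    {"text": "Start", "box": [500, 900, 120, 48]},
]

print(find_text_center(sample, "Start"))    # (560, 924)
print(find_text_center(sample, "Missing"))  # None
```

The returned coordinates would then be passed to the click tool. Returning `None` when nothing matches lets the caller decide whether to retry the capture, swipe, or stop.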
Choose one of the installation methods below and run the commands in your terminal.
# Option 1: Run via uvx (recommended); uvx fetches the package and runs it without a separate install step
uvx maa-mcp
# Option 2: Install via Python's pip
pip install maa-mcp
# Option 3: Install from source
# 1) Clone the repository
git clone https://github.com/MistEO/MaaMCP.git
cd MaaMCP
# 2) Install Python dependencies in editable mode
pip install -e .
After installing MaaMCP, add it as an MCP server in your client configuration. Use the standard MCP server entry shown under Configuration to point your client to MaaMCP so you can start discovering devices, establishing connections, and running automation tasks.
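If your client keeps its MCP servers in a JSON file with a top-level `mcpServers` object, the entry can be merged in without disturbing other servers. The following is a minimal sketch using only the Python standard library. The function name `add_maa_mcp` and the file path are hypothetical, and where your client stores its configuration is something to confirm in that client's documentation.

```python
import json
from pathlib import Path


def add_maa_mcp(config_path, server_name="maa-ai-maamcp"):
    """Merge the MaaMCP entry into a client config file, keeping other servers.

    ASSUMPTION: the client reads a JSON file with a top-level "mcpServers" object.
    """
    path = Path(config_path)
    # Start from the existing config if the file exists, else from an empty object.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Same command and args as the entry in the Configuration section.
    servers[server_name] = {"command": "uvx", "args": ["maa-mcp"]}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Hypothetical path; replace with your client's actual config file.
    print(add_maa_mcp("mcp_config.json"))
```

Restart or reload the client afterwards so that it picks up the new server entry.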
Scans for available Android devices connected via ADB and returns a list of device identifiers.
Scans for available Windows windows and returns a list of window handles and titles.
Establishes a connection to a selected Android device for subsequent automation tasks.
Connects to a specific Windows window to enable control and OCR-based interaction.
Captures the screen and performs optical character recognition to extract text for decision making.
Captures the screen for later processing by an external model or workflow.
Performs a tap/click at given coordinates, with options for multi-point and long-press actions.
Performs a double-click at specified coordinates.
Executes a swipe gesture to scroll or flip pages on Android devices.
Inputs text into the focused element, with support for long-press if needed.
Simulates a key press or a long press of a key, including Android system keys and Windows virtual keys.
Performs keyboard shortcuts such as Ctrl+C, Ctrl+V, Alt+Tab and other combinations.
Scrolls the mouse wheel on Windows.
Retrieves the documentation for the Pipeline JSON protocol.
Saves a Pipeline JSON to a file, supporting create and update flows.
Loads an existing Pipeline JSON from a file.
Executes a saved Pipeline and returns execution results.
Opens the Pipeline visualization interface in a web browser.
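To make the Pipeline tools more concrete, here is a rough sketch of what a small Pipeline might look like and how it round-trips through a file. The field names (`recognition`, `expected`, `action`, `next`) follow the MaaFramework Pipeline protocol as commonly documented, but treat the exact schema as an assumption and consult the protocol documentation returned by the server. The node names and the helpers `save_pipeline`, `load_pipeline`, and `missing_next_targets` are hypothetical and exist only for this example.

```python
import json
from pathlib import Path

# ASSUMPTION: a two-node Pipeline; node names and expected texts are made up.
pipeline = {
    "OpenSettings": {
        "recognition": "OCR",
        "expected": "Settings",
        "action": "Click",
        "next": ["ConfirmOpened"],
    },
    "ConfirmOpened": {
        "recognition": "OCR",
        "expected": "Display",
    },
}


def save_pipeline(data, path):
    """Write the Pipeline dict to a JSON file."""
    Path(path).write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")


def load_pipeline(path):
    """Read a Pipeline dict back from a JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def missing_next_targets(data):
    """List (node, target) pairs where `next` points at an undefined node."""
    missing = []
    for name, node in data.items():
        targets = node.get("next", [])
        if isinstance(targets, str):  # tolerate a single name instead of a list
            targets = [targets]
        missing.extend((name, t) for t in targets if t not in data)
    return missing


if __name__ == "__main__":
    save_pipeline(pipeline, "example_pipeline.json")
    print(missing_next_targets(load_pipeline("example_pipeline.json")))
```

A quick check like `missing_next_targets` before saving can catch a mistyped node name early, since a dangling `next` reference would otherwise only surface when the Pipeline is run.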