AutoMac MCP Server
Provides experimental macOS UI automation via an MCP server with input control and screen comprehension.
Configuration
{
  "mcpServers": {
    "digithree-automac-mcp": {
      "command": "/path/to/automac-mcp/.venv/bin/python",
      "args": [
        "/path/to/automac-mcp/automac_mcp.py"
      ]
    }
  }
}
AutoMac MCP is a Python-based MCP server that lets an AI assistant securely control and automate your macOS UI from a local environment. It exposes a standardized interface for UI actions and screen understanding, enabling hands-free automation of the macOS experience in controlled testing or experimentation.
Connect a compatible MCP client (such as Claude Desktop) to the AutoMac MCP server to start automating the macOS UI. The server accepts tool requests such as mouse movements, clicks, keyboard input, scrolling, and screen queries, then returns results that your AI can use to decide its next steps. Review each permission request and confirmation prompt before approving it to keep automation safe.
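As a rough illustration of what such a tool request looks like on the wire, the sketch below builds an MCP `tools/call` message in JSON-RPC 2.0 form. The envelope follows the MCP specification. The tool name `mouse_move` and its `x`/`y` arguments are hypothetical placeholders, since this page does not list the server's actual tool names.

```python
import json
from itertools import count

# JSON-RPC requests carry an id so responses can be matched to them.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# "mouse_move" and its arguments are assumed names, for illustration only.
message = build_tool_call("mouse_move", {"x": 640, "y": 400})
print(message)
```

In practice an MCP client such as Claude Desktop builds and sends these messages for you. The sketch is only meant to show the shape of the exchange.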
{
  "mcpServers": {
    "automac_mcp": {
      "command": "/path/to/automac-mcp/.venv/bin/python",
      "args": ["/path/to/automac-mcp/automac_mcp.py"]
    }
  }
}
Setup follows four steps: install the required tooling, add the MCP server to your client configuration, grant macOS accessibility permissions, and restart your client to begin automation.
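The configuration step can also be scripted. The following is a minimal sketch, assuming the client stores its settings in a JSON file with a top-level `mcpServers` object as shown above. The helper name `add_automac_server` is hypothetical and is not part of AutoMac MCP. It adds or updates one server entry and leaves any other entries untouched.

```python
import json
from pathlib import Path


def add_automac_server(config_path: Path, python_path: str, script_path: str,
                       name: str = "automac_mcp") -> dict:
    """Add or update one entry under "mcpServers", keeping other entries."""
    config = {}
    if config_path.exists():
        raw = config_path.read_text(encoding="utf-8")
        if raw.strip():  # tolerate an empty file
            config = json.loads(raw)

    # Create the "mcpServers" object if the file does not have one yet.
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": python_path, "args": [script_path]}

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example call (the placeholder paths must be replaced with real ones):
# add_automac_server(
#     Path("claude_desktop_config.json"),
#     "/path/to/automac-mcp/.venv/bin/python",
#     "/path/to/automac-mcp/automac_mcp.py",
# )
```

For Claude Desktop on macOS the file is commonly `~/Library/Application Support/Claude/claude_desktop_config.json`. Check your client's documentation for the exact location, and back up the file before editing it.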
This experimental setup requires explicit macOS accessibility permissions and relies on command prompts from your AI client to prevent unintended actions. Use it in controlled research environments and monitor automations closely.
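One way to picture this safeguard is a client-side wrapper that asks for confirmation before forwarding a risky action. The sketch below is illustrative only and is not how AutoMac MCP or any particular client implements it. The tool names in `RISKY_TOOLS`, and the `send` and `confirm` callbacks, are hypothetical.

```python
from typing import Callable

# Hypothetical tool names; the real server's names may differ.
RISKY_TOOLS = {"quit_application", "force_quit_dialog", "close_window"}


def gated_call(tool_name: str, arguments: dict,
               send: Callable[[str, dict], object],
               confirm: Callable[[str], bool]) -> object:
    """Forward a tool call, asking for confirmation first when it is risky."""
    if tool_name in RISKY_TOOLS:
        question = f"Allow {tool_name} with arguments {arguments}?"
        if not confirm(question):
            # Nothing is sent to the server when the user declines.
            raise PermissionError(f"{tool_name} was declined by the user")
    return send(tool_name, arguments)
```

Here `send` stands in for whatever function actually delivers the request, and `confirm` for whatever shows the prompt, for example `lambda q: input(q + " [y/N] ").lower() == "y"` in a terminal.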
Be explicit about targets, such as which application to focus. After actions in other apps, request switching back to your MCP client to verify results.
Core MCP server features include input control, screen comprehension via accessibility APIs, and OCR-driven text reading. Ongoing work aims at more granular UI detection, advanced interactions, multi-monitor support, and improved visual feedback.
A full example describes opening a Steam wishlist, selecting affordable items, adding to cart, and completing a purchase. This demonstrates end-to-end automation capabilities and how an AI agent can drive UI actions in a real-world workflow.
Available tools
Return the current screen width and height to help plan coordinates for input actions.
Move the mouse pointer to the specified (x, y) coordinates.
Perform a single left-click at the given coordinates.
Perform a double-click at the given coordinates.
Type a string of text into the currently focused input area.
Scroll the screen by a pixel delta along the x and y axes.
Press the Return/Enter key.
Press the Escape key.
Press the Tab key.
Press the Space key.
Press the Delete/Backspace key.
Press the Forward Delete key.
Press the Up Arrow key.
Press the Down Arrow key.
Press the Left Arrow key.
Press the Right Arrow key.
Select all text (Cmd+A).
Copy selected content (Cmd+C).
Paste from clipboard (Cmd+V).
Cut selected content (Cmd+X).
Undo last action (Cmd+Z).
Redo last undone action (Cmd+Shift+Z).
Save current document (Cmd+S).
Create new document (Cmd+N).
Open document (Cmd+O).
Find in document (Cmd+F).
Close current window (Cmd+W).
Quit current application (Cmd+Q).
Minimize current window (Cmd+M).
Hide current application (Cmd+H).
Switch to next application (Cmd+Tab).
Switch to previous application (Cmd+Shift+Tab).
Open Spotlight search (Cmd+Space).
Open Force Quit dialog (Cmd+Option+Esc).
Refresh/Reload (Cmd+R).
Get window and app layout information via macOS accessibility APIs.
Read on-screen text using OCR with positioning data.
Bring a specific application to the foreground, with optional timeout.
List all currently running applications.
Play a system alert sound to signal that a prompt needs the user's attention.
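Several of the tools above combine naturally: read on-screen text with OCR, pick the matching text, and click its center within the screen bounds. The sketch below shows only the coordinate arithmetic. The `TextBox` shape (top-left origin, pixel units) is an assumption, because this page does not document the format of the OCR tool's positioning data.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class TextBox:
    """One OCR hit: text plus a pixel bounding box (hypothetical shape)."""
    text: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def click_point(boxes: Iterable[TextBox], query: str,
                screen_width: int, screen_height: int) -> Optional[Tuple[int, int]]:
    """Return the clamped center of the first box containing `query`, or None."""
    needle = query.casefold()
    for box in boxes:
        if needle in box.text.casefold():
            cx = round(box.x + box.width / 2)
            cy = round(box.y + box.height / 2)
            # Clamp so the point always lies inside the screen bounds.
            return (min(max(cx, 0), screen_width - 1),
                    min(max(cy, 0), screen_height - 1))
    return None


boxes = [TextBox("File", 10, 0, 40, 20), TextBox("Add to Cart", 900, 500, 120, 40)]
print(click_point(boxes, "add to cart", 1440, 900))  # → (960, 520)
```

The screen dimensions would come from the screen-size tool and the resulting point would be passed to the click tool. Matching is case-insensitive and returns the first hit, which is a simplification when the same label appears more than once.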