Mobile Next MCP Server
Unified mobile automation server for iOS and Android, driven by accessibility snapshots or coordinate-based actions.
Mobile MCP Server provides a platform-agnostic interface for automating and developing against iOS and Android devices, including simulators, emulators, and real devices. It lets agents and LLMs interact with native apps and devices through accessibility data or coordinate-based actions, so you can build scalable mobile automation workflows without separate iOS or Android expertise.
After you add the MCP server to your IDE or client, you can instruct your AI assistant to use the available tools to validate UI interactions, read information from screen snapshots, and drive multi-step mobile journeys. You can run scripted flows for data entry, automate user journeys guided by an LLM, and enable agent-to-agent communication for mobile automation tasks.
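Beyond IDE integration, you can also drive the server programmatically from your own agent. The sketch below is a minimal example, assuming the official @modelcontextprotocol/sdk TypeScript package: it launches Mobile MCP over stdio with the same npx invocation as the configuration below, then lists the available tools. The client name and version strings are arbitrary placeholders.

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function main() {
  // Launch the Mobile MCP server the same way the JSON config does:
  // npx fetches @mobilenext/mobile-mcp and speaks MCP over stdio.
  const transport = new StdioClientTransport({
    command: "npx",
    args: ["-y", "@mobilenext/mobile-mcp@latest"],
  });

  const client = new Client(
    { name: "mobile-automation-demo", version: "1.0.0" }, // placeholder identity
    { capabilities: {} }
  );
  await client.connect(transport);

  // Discover what the server exposes before driving any UI.
  const { tools } = await client.listTools();
  for (const tool of tools) {
    console.log(`${tool.name}: ${tool.description}`);
  }

  await client.close();
}

main().catch(console.error);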
Configuration
Before you start Mobile MCP Server, you need:
Node.js, since npx fetches and runs the server package on demand
Xcode command-line tools, for iOS simulators and devices
Android Platform Tools (adb), for Android emulators and devices
Install and run the MCP server using the standard configuration shown below. This config uses npx to fetch the Mobile MCP package and run it directly.
{
"mcpServers": {
"mobile-mcp": {
"command": "npx",
"args": ["-y", "@mobilenext/mobile-mcp@latest"]
}
}
}
The server exposes the following tools:
List all available devices, including simulators, emulators, and real devices, to understand what you can connect to.
Return the screen width and height in pixels for the current device.
Get the current screen orientation (portrait or landscape) of the device.
Change the device orientation to portrait or landscape.
List all installed apps on the connected device.
Launch a specific app using its package or bundle identifier.
Terminate a running app on the device.
Install an app from a file (APK, IPA, app, or zip).
Uninstall an app by package name or bundle ID.
Capture a screenshot of the current device screen.
Save the current screen image to a file for later analysis.
Enumerate visible UI elements with coordinates and properties.
Tap at specific x,y coordinates on the screen.
Perform a double-tap at given coordinates.
Long-press at specific coordinates.
Swipe in a given direction (up, down, left, right) across the screen.
Type text into the focused element, with optional submit action.
Press device hardware buttons (HOME, BACK, VOLUME_UP/DOWN, ENTER, etc.).
Open a URL in the device browser.
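Putting the tools together, here is a hedged sketch that continues from the connected client above and drives a short journey: launch an app, inspect on-screen elements, tap, type, and capture a screenshot. The tool names and argument shapes are assumptions inferred from the descriptions in this list; confirm the exact names against the listTools() output, and replace com.example.app with a real package or bundle identifier.

// Continuing with the connected `client` from the previous sketch.
// Tool names and argument shapes below are illustrative, not confirmed.

// 1. Launch the app under test by its package or bundle identifier.
await client.callTool({
  name: "mobile_launch_app",                      // assumed tool name
  arguments: { packageName: "com.example.app" },  // hypothetical app id
});

// 2. Enumerate visible UI elements to find tap targets and coordinates.
const elements = await client.callTool({
  name: "mobile_list_elements_on_screen",          // assumed tool name
  arguments: {},
});
console.log(elements.content);

// 3. Tap a coordinate, type into the focused field, and submit.
await client.callTool({
  name: "mobile_click_on_screen_at_coordinates",   // assumed tool name
  arguments: { x: 200, y: 640 },
});
await client.callTool({
  name: "mobile_type_keys",                        // assumed tool name
  arguments: { text: "hello@example.com", submit: true },
});

// 4. Capture a screenshot so the agent can verify the result.
await client.callTool({ name: "mobile_take_screenshot", arguments: {} });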