Provides screen capture and OCR: capture the full screen or a single application window, and extract text from the resulting image.
Configuration
```
{
  "mcpServers": {
    "ikhide-screen-capture-mcp": {
      "command": "node",
      "args": [
        "path/to/mcp-screen-text/dist/index.js"
      ]
    }
  }
}
```
MCP Screen Text provides screen capture and optical character recognition (OCR) capabilities, enabling you to take screenshots of your desktop or specific applications and extract text from images. This makes it easy to automate text capture from your workspace and integrate it with other MCP clients.
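If you maintain client configuration from a script, the entry shown above could be generated rather than typed by hand. The following is a minimal sketch only: the helper name `build_server_entry` is hypothetical, the checkout path is the same placeholder used in the configuration, and the server key is simply whatever label you want your client to show.

```python
import json
from pathlib import Path


def build_server_entry(repo_dir: str, name: str = "screen-text") -> dict:
    """Return an mcpServers mapping that launches the built server with Node.

    Hypothetical helper: repo_dir is wherever you cloned and built the project.
    """
    entry_point = Path(repo_dir) / "dist" / "index.js"
    return {
        "mcpServers": {
            name: {
                "command": "node",
                # as_posix() keeps forward slashes so the JSON reads the same on every OS
                "args": [entry_point.as_posix()],
            }
        }
    }


# The path below is the placeholder from the configuration, not a real location.
config = build_server_entry("path/to/mcp-screen-text")
print(json.dumps(config, indent=2))
```

Paste the printed JSON into your client's configuration file, replacing the placeholder path with the real location of your build.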
You can access the Screen Text server from any MCP client that supports stdio transport. The server exposes the following tools: capture_screen takes a full-screen or display-specific capture, capture_application_screen captures a single application's window, extract_text runs OCR on an existing image, capture_screen_and_extract_text performs both actions in one step, and list_applications shows which applications are available for capture. To extract text immediately after capturing, prefer capture_screen_and_extract_text and specify the OCR language (for example, eng for English or spa for Spanish).
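To make the tool invocation concrete, here is a rough sketch of the message an MCP client sends over stdio to call one of these tools. MCP uses JSON-RPC 2.0, tools are invoked with the `tools/call` method, and stdio messages are written one JSON object per line. The argument key `language` is an assumption for illustration, since this page does not list the tools' parameter names, and a real session must complete the MCP `initialize` handshake before any tool call. The helper `tool_call` is hypothetical.

```python
import json
from itertools import count

# Monotonically increasing request ids, as JSON-RPC expects unique ids per request.
_ids = count(1)


def tool_call(name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request as one newline-terminated line."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps never emits raw newlines, so the message stays on a single line.
    return json.dumps(request) + "\n"


# "language" is an assumed argument name; check the server's tool schema for the real one.
line = tool_call("capture_screen_and_extract_text", {"language": "eng"})
print(line, end="")
```

In practice an MCP client library handles this framing for you; the sketch only shows what travels over the pipe.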
Prerequisites: ensure you have Node.js and npm installed on your machine.
Install the dependencies:
```
npm install
```
Build the server, then either run it in development mode or start the built server when ready.
```
npm run build
```
```
npm run dev
```
```
npm start
```
Configure your MCP client to connect to the Screen Text server using stdio transport. The following example configuration runs the server via Node and points to the built distribution.
```
{
  "mcpServers": {
    "screen-text": {
      "command": "node",
      "args": ["path/to/mcp-screen-text/dist/index.js"]
    }
  }
}
```
capture_screen: Captures a screenshot of the entire screen or a specific display. An optional display index and image format (png or jpg) control the output.
capture_application_screen: Captures a screenshot of a specific application window by name. You can specify the target application and the image format.
list_applications: Lists all running applications that can be captured, so you can choose targets for application-specific screenshots.
extract_text: Runs OCR on a provided image file to extract readable text. Multiple languages are supported, such as eng, spa, or fra.
capture_screen_and_extract_text: Captures a screenshot (full screen or a specific application window) and performs OCR in one operation, returning both the image and the extracted text.
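Because the combined capture-and-OCR tool returns both an image and text, a client needs to separate the two. The sketch below assumes the server returns them as standard MCP tool-result content items (`type: "text"` with a `text` field, `type: "image"` with base64 `data` and a `mimeType`); that shape comes from the MCP specification, not from this page, so treat it as an assumption about this particular server. The helper `split_result` and the sample payload are made up for illustration.

```python
import base64


def split_result(result: dict) -> tuple[str, list[bytes]]:
    """Separate extracted text from decoded image bytes in a tool result."""
    texts: list[str] = []
    images: list[bytes] = []
    for item in result.get("content", []):
        if item.get("type") == "text":
            texts.append(item["text"])
        elif item.get("type") == "image":
            # Image payloads are base64-encoded strings in MCP results.
            images.append(base64.b64decode(item["data"]))
    text = "\n".join(texts)
    if result.get("isError"):
        # Tool-level failures are reported inside the result, with details as text.
        raise RuntimeError(text or "tool call failed")
    return text, images


# Made-up sample payload; a real capture would carry a full PNG or JPG.
sample = {
    "content": [
        {
            "type": "image",
            "data": base64.b64encode(b"\x89PNG").decode("ascii"),
            "mimeType": "image/png",
        },
        {"type": "text", "text": "Hello from the screen"},
    ]
}
text, images = split_result(sample)
print(text)
print(len(images), "image(s) decoded")
```

The decoded bytes can be written straight to a file with the extension matching the requested format.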