This skill automates Windows GUI actions by simulating mouse, keyboard, and window management tasks with PowerShell.
npx playbooks add skill openclaw/skills --skill windows-ui-automation
Run the command above to add this skill to your agents.
---
name: windows-ui-automation
description: Automate Windows GUI interactions (mouse, keyboard, windows) using PowerShell. Use when the user needs to simulate user input on the desktop, such as moving the cursor, clicking buttons, typing text in non-web apps, or managing window states.
---
# Windows UI Automation
Control the Windows desktop environment programmatically.
## Core Capabilities
- **Mouse**: Move, click (left/right/double), drag.
- **Keyboard**: Send text, press special keys (Enter, Tab, Alt, etc.).
- **Windows**: Find, focus, minimize/maximize, and screenshot windows.
## Usage Guide
### Mouse Control
Use the provided PowerShell script `mouse_control.ps1.txt`:
```powershell
# Move to X, Y
powershell -File skills/windows-ui-automation/mouse_control.ps1.txt -Action move -X 500 -Y 500
# Click at current position
powershell -File skills/windows-ui-automation/mouse_control.ps1.txt -Action click
# Right click
powershell -File skills/windows-ui-automation/mouse_control.ps1.txt -Action rightclick
```
### Keyboard Control
Use `keyboard_control.ps1.txt`:
```powershell
# Type text
powershell -File skills/windows-ui-automation/keyboard_control.ps1.txt -Text "Hello World"
# Press Enter
powershell -File skills/windows-ui-automation/keyboard_control.ps1.txt -Key "{ENTER}"
```
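The `{ENTER}` syntax suggests the keyboard script builds on `SendKeys`, where the characters `+ ^ % ~ ( ) { } [ ]` carry special meaning. The sketch below assumes `-Text` is forwarded to `SendKeys` without escaping, which the skill does not confirm. `ConvertTo-SendKeysLiteral` is a hypothetical helper, not part of the skill.

```powershell
# Hypothetical helper (not part of the skill): wrap characters that SendKeys
# treats as special in braces so they are typed literally.
function ConvertTo-SendKeysLiteral {
    param([Parameter(Mandatory)][string]$Text)
    # Each of + ^ % ~ ( ) { } [ ] becomes {+}, {^}, {%}, and so on.
    return ($Text -replace '([+^%~(){}\[\]])', '{$1}')
}

$escaped = ConvertTo-SendKeysLiteral -Text '50% (a+b)'
# $escaped is now: 50{%} {(}a{+}b{)}
```

If the script already escapes its input, this step is unnecessary and would double-escape the text, so check the script before relying on it.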
### Window Management
To focus a window by title:
```powershell
# AppActivate returns $true if a window with a matching title was focused
$wshell = New-Object -ComObject WScript.Shell; $wshell.AppActivate("Notepad")
```
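A window may not exist yet when the focus call runs, for example right after launching the application. One way to handle this is to retry until `AppActivate` reports success. `Wait-Until` is a hypothetical helper, and its timeout and interval defaults are arbitrary.

```powershell
# Hypothetical helper (not part of the skill): re-run a condition until it
# returns a truthy value or the timeout elapses.
function Wait-Until {
    param(
        [Parameter(Mandatory)][scriptblock]$Condition,
        [int]$TimeoutMs = 5000,
        [int]$IntervalMs = 250
    )
    $timer = [System.Diagnostics.Stopwatch]::StartNew()
    do {
        if (& $Condition) { return $true }
        Start-Sleep -Milliseconds $IntervalMs
    } while ($timer.ElapsedMilliseconds -lt $TimeoutMs)
    return $false
}
```

With `$wshell` created as above, `Wait-Until -Condition { $wshell.AppActivate("Notepad") }` would retry for up to five seconds and return `$false` if no matching window could be focused.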
## Best Practices
1. **Safety**: Always move the mouse slowly or include delays between actions.
2. **Verification**: Take a screenshot before and after complex UI actions to verify state.
3. **Coordinates**: The origin (0,0) is the top-left corner of the primary monitor; X increases to the right and Y increases downward.
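The first practice, moving slowly with delays, could be sketched by splitting one move into several small steps. `Get-MousePath` and `Move-MouseSlowly` are hypothetical helpers, and the step count and delay are arbitrary. The only part taken from the skill is the `-Action move -X -Y` call shown earlier.

```powershell
# Hypothetical helper: evenly spaced points from a start to an end position.
function Get-MousePath {
    param([int]$FromX, [int]$FromY, [int]$ToX, [int]$ToY, [int]$Steps = 10)
    for ($i = 1; $i -le $Steps; $i++) {
        [pscustomobject]@{
            X = [int][math]::Round($FromX + ($ToX - $FromX) * $i / [double]$Steps)
            Y = [int][math]::Round($FromY + ($ToY - $FromY) * $i / [double]$Steps)
        }
    }
}

# Hypothetical driver: visit each point with the skill's move action,
# pausing between steps. Defined here but not invoked.
function Move-MouseSlowly {
    param([int]$FromX, [int]$FromY, [int]$ToX, [int]$ToY, [int]$DelayMs = 30)
    foreach ($p in (Get-MousePath -FromX $FromX -FromY $FromY -ToX $ToX -ToY $ToY)) {
        powershell -File skills/windows-ui-automation/mouse_control.ps1.txt -Action move -X $p.X -Y $p.Y
        Start-Sleep -Milliseconds $DelayMs
    }
}
```

Each step starts a new PowerShell process, which adds its own delay, so a small `-DelayMs` value is probably enough.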
This skill automates Windows GUI interactions using PowerShell scripts that simulate mouse, keyboard, and window actions. It is intended for desktop applications that do not expose automation APIs, and for cases where you need true user input simulation. The focus is practical, repeatable control of cursor movement, clicks, keystrokes, and window state.
The skill provides PowerShell helpers that call native Windows APIs and COM objects to move the mouse, click (left, right, double), drag, and send text or special keys. It also includes simple window management: focusing a window via AppActivate, minimizing or maximizing it, and capturing screenshots. You invoke the scripts with arguments for the action type, coordinates, text, or key codes to drive a sequence of UI steps.
Do I need admin rights to run these scripts?
No. Typical mouse, keyboard, and AppActivate operations do not require admin rights. Elevation is needed only when the target application itself runs elevated, because Windows blocks simulated input sent from a lower-privilege process to a higher-privilege window.
How do I ensure coordinates work on multi-monitor setups?
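Coordinates are relative to the primary monitor, with (0,0) at its top-left corner, so monitors placed to the left of or above the primary one use negative coordinates. On multi-monitor setups, either add the target monitor's offset or focus the target window and work with positions relative to it, rather than hard-coding coordinates that are only valid on the primary monitor.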
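The offset calculation can be sketched as follows. `ConvertTo-ScreenPoint` is a hypothetical helper, and it assumes you already know the top-left corner of the target monitor or window. One possible source for that corner is the `Bounds` of an entry in `[System.Windows.Forms.Screen]::AllScreens`.

```powershell
# Hypothetical helper: turn a position relative to a monitor or window
# into screen coordinates by adding that monitor's or window's origin.
function ConvertTo-ScreenPoint {
    param([int]$OriginX, [int]$OriginY, [int]$OffsetX, [int]$OffsetY)
    [pscustomobject]@{ X = $OriginX + $OffsetX; Y = $OriginY + $OffsetY }
}

# Example: a 1920-pixel-wide monitor placed to the left of the primary one
# has its origin at (-1920, 0).
$target = ConvertTo-ScreenPoint -OriginX -1920 -OriginY 0 -OffsetX 200 -OffsetY 150
# $target.X is -1720 and $target.Y is 150
```

The resulting values can then be passed as `-X` and `-Y` to the move action, assuming the mouse script accepts negative coordinates.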