
windows-ui-automation skill


This skill automates Windows GUI actions by simulating mouse, keyboard, and window management tasks with PowerShell.

npx playbooks add skill openclaw/skills --skill windows-ui-automation

Review the files below or copy the command above to add this skill to your agents.

Files (4)
SKILL.md
1.7 KB
---
name: windows-ui-automation
description: Automate Windows GUI interactions (mouse, keyboard, windows) using PowerShell. Use when the user needs to simulate user input on the desktop, such as moving the cursor, clicking buttons, typing text in non-web apps, or managing window states.
---

# Windows UI Automation

Control the Windows desktop environment programmatically.

## Core Capabilities

- **Mouse**: Move, click (left/right/double), drag.
- **Keyboard**: Send text, press special keys (Enter, Tab, Alt, etc.).
- **Windows**: Find, focus, minimize/maximize, and screenshot windows.

## Usage Guide

### Mouse Control

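Use the provided PowerShell script `mouse_control.ps1.txt`. Windows PowerShell's `-File` parameter only accepts files whose names end in `.ps1`, so if the commands below are rejected, copy the script to a `.ps1` name first: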

```powershell
# Move to X, Y
powershell -File skills/windows-ui-automation/mouse_control.ps1.txt -Action move -X 500 -Y 500

# Click at current position
powershell -File skills/windows-ui-automation/mouse_control.ps1.txt -Action click

# Right click
powershell -File skills/windows-ui-automation/mouse_control.ps1.txt -Action rightclick
```

### Keyboard Control

Use `keyboard_control.ps1.txt`:

```powershell
# Type text
powershell -File skills/windows-ui-automation/keyboard_control.ps1.txt -Text "Hello World"

# Press Enter
powershell -File skills/windows-ui-automation/keyboard_control.ps1.txt -Key "{ENTER}"
```
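
The `{ENTER}` notation matches the .NET `SendKeys` key syntax. If `keyboard_control.ps1.txt` forwards `-Text` to `SendKeys` unchanged (an assumption, since the script source is not shown here), characters such as `+`, `^`, `%`, `~`, and brackets would be read as modifiers or grouping rather than typed. A minimal sketch of escaping them first, using a hypothetical helper named `ConvertTo-SendKeysLiteral`:

```powershell
# Hypothetical helper; not part of the skill's scripts.
function ConvertTo-SendKeysLiteral {
    param([Parameter(Mandatory)][string]$Text)
    # Wrap each SendKeys metacharacter in braces so it is typed literally.
    return ($Text -replace '([+^%~(){}\[\]])', '{$1}')
}

$safe = ConvertTo-SendKeysLiteral -Text '50% (a+b)'
# $safe now holds: 50{%} {(}a{+}b{)}
```

The escaped string can then be passed as the `-Text` argument. If the script already escapes its input, this step would double-escape and should be skipped.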

### Window Management

To focus a window by title:
```powershell
$wshell = New-Object -ComObject WScript.Shell
$wshell.AppActivate("Notepad")  # returns $true if a window with a matching title was activated
```
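
A window may not exist yet when the automation starts, so it can help to retry `AppActivate` until it succeeds before sending input. The following is a sketch under that assumption; `Wait-WindowFocus` and `Send-TextToWindow` are hypothetical names, and the timeout and polling interval are arbitrary choices.

```powershell
# Hypothetical helpers built on the AppActivate call shown above.
function Wait-WindowFocus {
    param(
        [Parameter(Mandatory)][string]$Title,
        [int]$TimeoutSeconds = 10
    )
    $wshell   = New-Object -ComObject WScript.Shell
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        # AppActivate returns $true once a window with a matching title is activated.
        if ($wshell.AppActivate($Title)) { return $true }
        Start-Sleep -Milliseconds 250
    }
    return $false
}

function Send-TextToWindow {
    param(
        [Parameter(Mandatory)][string]$Title,
        [Parameter(Mandatory)][string]$Text
    )
    if (-not (Wait-WindowFocus -Title $Title)) {
        Write-Warning "No window matching '$Title' was found; nothing was typed."
        return
    }
    # Uses the skill's keyboard script exactly as documented above.
    powershell -File skills/windows-ui-automation/keyboard_control.ps1.txt -Text $Text
}

# Example call:
# Send-TextToWindow -Title 'Notepad' -Text 'Hello World'
```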

## Best Practices

1. **Safety**: Always move the mouse slowly or include delays between actions.
2. **Verification**: Take a screenshot before and after complex UI actions to verify state.
3. **Coordinates**: The origin (0,0) is the top-left corner of the primary monitor; X increases to the right and Y increases downward.
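
The first two practices can be combined in one wrapper. This sketch assumes the skill's screenshot mechanism is not available and captures the primary screen directly with `System.Drawing`; `Save-Screenshot` and `Invoke-VerifiedClick` are hypothetical names, and the 500 ms delay is an arbitrary starting point.

```powershell
# Hypothetical helpers; the capture code is a stand-in, not the skill's own.
function Save-Screenshot {
    param([Parameter(Mandatory)][string]$Path)  # use a full path; .NET resolves relative paths against the process directory
    Add-Type -AssemblyName System.Windows.Forms, System.Drawing
    $bounds   = [System.Windows.Forms.Screen]::PrimaryScreen.Bounds
    $bitmap   = New-Object System.Drawing.Bitmap($bounds.Width, $bounds.Height)
    $graphics = [System.Drawing.Graphics]::FromImage($bitmap)
    try {
        # Copy the primary screen's pixels into the bitmap, then write a PNG.
        $graphics.CopyFromScreen($bounds.Location, [System.Drawing.Point]::Empty, $bounds.Size)
        $bitmap.Save($Path, [System.Drawing.Imaging.ImageFormat]::Png)
    } finally {
        $graphics.Dispose()
        $bitmap.Dispose()
    }
}

function Invoke-VerifiedClick {
    param([int]$X, [int]$Y, [string]$OutDir = $env:TEMP, [int]$DelayMs = 500)
    $mouse = 'skills/windows-ui-automation/mouse_control.ps1.txt'
    Save-Screenshot -Path (Join-Path $OutDir 'before.png')
    powershell -File $mouse -Action move -X $X -Y $Y
    Start-Sleep -Milliseconds $DelayMs   # give the UI time to react to the move
    powershell -File $mouse -Action click
    Start-Sleep -Milliseconds $DelayMs   # give the UI time to react to the click
    Save-Screenshot -Path (Join-Path $OutDir 'after.png')
}

# Example call:
# Invoke-VerifiedClick -X 500 -Y 500
```

Comparing `before.png` and `after.png` shows whether the click had the intended effect.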

Overview

This skill automates Windows GUI interactions using PowerShell scripts to simulate mouse, keyboard, and window actions. It is designed for desktop applications that do not expose automation APIs, and for cases where you need true user-input simulation. The focus is practical, repeatable control of cursor movements, clicks, keystrokes, and window state.

How this skill works

The skill provides PowerShell helpers that call native Windows APIs and COM objects to move the mouse, click (left, right, double), drag, and send text or special keys. It also includes simple window management: focusing a window via AppActivate, minimizing or maximizing it, and capturing screenshots. You invoke the scripts with arguments for the action type, coordinates, text, or key codes to drive a sequence of UI steps.
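
The script sources are not reproduced on this page, so the following is only a sketch of the kind of calls such helpers commonly make, not the skill's actual implementation. It assumes `user32.dll` functions reached through `Add-Type` for the mouse and .NET `SendKeys` for the keyboard; the type name `Sketch.NativeInput` and both function names are hypothetical.

```powershell
# Hypothetical sketch; NOT the contents of mouse_control.ps1.txt or keyboard_control.ps1.txt.
Add-Type -Namespace Sketch -Name NativeInput -MemberDefinition @'
[DllImport("user32.dll")]
public static extern bool SetCursorPos(int X, int Y);

[DllImport("user32.dll")]
public static extern void mouse_event(uint dwFlags, uint dx, uint dy, uint dwData, System.UIntPtr dwExtraInfo);
'@

# Flag values from the Win32 mouse_event documentation.
$MOUSEEVENTF_LEFTDOWN = 0x0002
$MOUSEEVENTF_LEFTUP   = 0x0004

function Invoke-LeftClickAt {
    param([int]$X, [int]$Y)
    [Sketch.NativeInput]::SetCursorPos($X, $Y) | Out-Null
    Start-Sleep -Milliseconds 100   # short pause so the target registers the move
    [Sketch.NativeInput]::mouse_event($MOUSEEVENTF_LEFTDOWN, 0, 0, 0, [UIntPtr]::Zero)
    [Sketch.NativeInput]::mouse_event($MOUSEEVENTF_LEFTUP,   0, 0, 0, [UIntPtr]::Zero)
}

function Send-KeysToForeground {
    param([Parameter(Mandatory)][string]$Keys)
    Add-Type -AssemblyName System.Windows.Forms
    # SendWait types into whichever window currently has focus.
    [System.Windows.Forms.SendKeys]::SendWait($Keys)
}

# Example calls:
# Invoke-LeftClickAt -X 500 -Y 500
# Send-KeysToForeground -Keys 'Hello World{ENTER}'
```

`mouse_event` is used here for brevity; Microsoft's documentation recommends `SendInput` for new code, which needs a larger struct declaration.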

When to use it

  • Automating interactions with legacy or desktop-only applications that lack an API.
  • Creating UI-driven test steps or reproducing user workflows for QA on Windows.
  • Scripting repetitive desktop tasks like form entry, menu navigation, or batch application control.
  • Taking screenshots before/after actions to verify UI state changes.
  • Combining with other automation tools where simulated user input is required.

Best practices

  • Add deliberate delays between actions to avoid race conditions or missed inputs.
  • Use absolute screen coordinates cautiously; prefer relative moves or window-focused actions when possible.
  • Capture a screenshot before and after critical steps to validate outcomes.
  • Run automation in a dedicated environment or VM to avoid interference from user activity.
  • Limit use of high-speed or tight loops for mouse/keyboard to reduce the risk of accidental input.
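
One way to avoid hard-coded screen coordinates is to express a target as an offset from the focused window's top-left corner and convert it at run time. The sketch below assumes the Win32 functions `GetForegroundWindow` and `GetWindowRect`; the type name `Sketch.WindowRect` and both function names are hypothetical, and DPI scaling is not handled.

```powershell
# Hypothetical sketch; not part of the skill's scripts.
Add-Type -Namespace Sketch -Name WindowRect -MemberDefinition @'
[StructLayout(LayoutKind.Sequential)]
public struct RECT { public int Left, Top, Right, Bottom; }

[DllImport("user32.dll")]
public static extern IntPtr GetForegroundWindow();

[DllImport("user32.dll")]
public static extern bool GetWindowRect(IntPtr hWnd, out RECT rect);
'@

# Pure arithmetic: window origin plus offset gives the absolute screen point.
function ConvertTo-ScreenPoint {
    param([int]$WindowLeft, [int]$WindowTop, [int]$OffsetX, [int]$OffsetY)
    [pscustomobject]@{ X = $WindowLeft + $OffsetX; Y = $WindowTop + $OffsetY }
}

function Get-ForegroundWindowPoint {
    param([int]$OffsetX, [int]$OffsetY)
    $rect = New-Object 'Sketch.WindowRect+RECT'
    $hwnd = [Sketch.WindowRect]::GetForegroundWindow()
    if (-not [Sketch.WindowRect]::GetWindowRect($hwnd, [ref]$rect)) {
        throw 'GetWindowRect failed for the foreground window.'
    }
    ConvertTo-ScreenPoint -WindowLeft $rect.Left -WindowTop $rect.Top -OffsetX $OffsetX -OffsetY $OffsetY
}

# Example: a window at (100,200) with a button 30 px right and 40 px down.
$p = ConvertTo-ScreenPoint -WindowLeft 100 -WindowTop 200 -OffsetX 30 -OffsetY 40
# $p.X is 130 and $p.Y is 240
```

The resulting `X` and `Y` can be passed to the `-Action move` command, so the same offsets keep working when the window is moved.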

Example use cases

  • Move the cursor and click a sequence of buttons to drive a setup wizard in a legacy installer.
  • Type text and press Enter in a notepad-like app to populate logs or test input handling.
  • Bring a specific window to foreground, maximize it, and take a screenshot for reporting.
  • Drag files within a file manager UI to test drag-and-drop behavior.
  • Automate repetitive menu clicks and keystrokes for data entry in a desktop line-of-business app.

FAQ

Do I need admin rights to run these scripts?

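No. Typical mouse, keyboard, and AppActivate operations run without admin rights. The exception is a target application that itself runs elevated: Windows User Interface Privilege Isolation (UIPI) blocks simulated input sent from a non-elevated process, so the scripts must then run elevated as well.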
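
If input seems to be ignored by an application that runs as administrator, it can be useful to check whether the automation session itself is elevated. A minimal sketch using the .NET `WindowsPrincipal` class; the function name `Test-IsElevated` is hypothetical.

```powershell
# Hypothetical helper; not part of the skill's scripts.
function Test-IsElevated {
    $identity  = [Security.Principal.WindowsIdentity]::GetCurrent()
    $principal = New-Object Security.Principal.WindowsPrincipal($identity)
    # True only when the current process holds the Administrators role (i.e. runs elevated).
    return $principal.IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)
}

# Example call:
# if (-not (Test-IsElevated)) { Write-Warning 'Session is not elevated; input to elevated windows may be dropped.' }
```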

How do I ensure coordinates work on multi-monitor setups?

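Coordinates are relative to the primary monitor, with (0,0) at its top-left corner; monitors positioned to the left of or above the primary one have negative coordinates. On multi-monitor setups, either calculate each monitor's offset or focus the target window and use window-relative positions instead of hard-coded primary-monitor coordinates.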
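
The offset calculation can be sketched with `System.Windows.Forms.Screen`, which reports each monitor's bounds in the same coordinate space. The function names below are hypothetical, and whether `mouse_control.ps1.txt` accepts negative `-X`/`-Y` values is an assumption that should be verified against the script.

```powershell
# Hypothetical helpers; not part of the skill's scripts.
function Get-MonitorBounds {
    Add-Type -AssemblyName System.Windows.Forms
    [System.Windows.Forms.Screen]::AllScreens | ForEach-Object {
        [pscustomobject]@{
            Name    = $_.DeviceName
            Primary = $_.Primary
            Left    = $_.Bounds.Left
            Top     = $_.Bounds.Top
            Width   = $_.Bounds.Width
            Height  = $_.Bounds.Height
        }
    }
}

# Convert a point local to one monitor into primary-relative coordinates.
function ConvertTo-VirtualScreenPoint {
    param($Monitor, [int]$X, [int]$Y)
    if ($X -lt 0 -or $Y -lt 0 -or $X -ge $Monitor.Width -or $Y -ge $Monitor.Height) {
        throw "Point ($X,$Y) lies outside monitor '$($Monitor.Name)'."
    }
    [pscustomobject]@{ X = $Monitor.Left + $X; Y = $Monitor.Top + $Y }
}

# Example with a made-up 1920x1080 monitor placed to the left of the primary one.
$leftMonitor = [pscustomobject]@{ Name = 'example'; Left = -1920; Top = 0; Width = 1920; Height = 1080 }
$point = ConvertTo-VirtualScreenPoint -Monitor $leftMonitor -X 100 -Y 50
# $point.X is -1820 and $point.Y is 50
```

On a real system, `Get-MonitorBounds` would supply the `$Monitor` argument in place of the made-up object.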