droid skill

This skill automates Android UI testing and device interactions via ADB, returning JSON outputs optimized for LLM consumption.

npx playbooks add skill thilinatlm/claude-plugins --skill droid

Run the command above to add this skill to your agents.

---
name: droid
description: This skill should be used when the user asks to "test Android app", "automate Android emulator", "tap button on Android", "take Android screenshot", "interact with Android UI", "ADB automation", "fill Android form", "swipe on emulator", "validate Android UI flow", or needs to control an Android device/emulator via ADB. Provides a unified CLI with JSON output optimized for LLM consumption.
---

# Droid

Unified Android testing tool with **JSON output** for LLM-friendly automation.

## Prerequisites

- Bun runtime (https://bun.sh)
- ADB (Android Debug Bridge) in PATH
- Connected Android device or running emulator
- USB debugging enabled on device

## Quick Start

```bash
# Check device connection
${CLAUDE_PLUGIN_ROOT}/droid-cli/droid info

# Screenshot + UI elements (most useful command)
${CLAUDE_PLUGIN_ROOT}/droid-cli/droid screenshot

# Tap by text (no coordinates needed!)
${CLAUDE_PLUGIN_ROOT}/droid-cli/droid tap -t "Book Now"

# Fill a form field in one command
${CLAUDE_PLUGIN_ROOT}/droid-cli/droid fill "Email" "[email protected]"

# Wait for element to appear
${CLAUDE_PLUGIN_ROOT}/droid-cli/droid wait-for -t "Success" -s 5
```

## Core Commands

### screenshot
Capture screenshot AND UI elements. Returns element coordinates for tapping.

```bash
droid screenshot
droid screenshot --clickable      # Only clickable elements
droid screenshot --no-ui          # Fast, no element dump
```

**Response:** `{"ok":true,"screenshot":"/tmp/screenshot.png","elements":[{"text":"Book","class":"Button","clickable":true,"x":540,"y":350,"bounds":[400,300,680,400]}]}`

### tap
Tap by text or coordinates.

```bash
droid tap -t "Book Now"           # By text
droid tap -t "State" --prefer-input  # Prefer input fields over labels
droid tap -t "Submit" --clickable    # Only clickable elements
droid tap 540 960                 # By coordinates
```

### fill
Fill a text field in one command (tap + clear + type + hide-keyboard).

```bash
droid fill "Enter your email" "[email protected]"
```
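
For cases where `fill` does not fit, the same sequence can be run by hand. The sketch below is only an approximation built from the commands documented in this file: `droid_step` and `manual_fill` are hypothetical helper names, the use of `--prefer-input` on the tap is an assumption, and the actual implementation of `fill` may differ. It also assumes `droid` is on `PATH` and prints compact JSON like the examples in this document.

```bash
# Hypothetical helper (not part of droid): run one droid command and
# succeed only if its JSON response contains "ok":true.
# This is a plain string match, not a JSON parser.
droid_step() {
  droid "$@" | grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}

# Hypothetical manual equivalent of `droid fill LABEL VALUE`.
# Stops at the first step whose response is not ok.
manual_fill() {
  droid_step tap -t "$1" --prefer-input &&
    droid_step clear &&
    droid_step type "$2" &&
    droid_step hide-keyboard
}

# Example (requires a connected device):
#   manual_fill "Password" "secret123"
```

Checking each response before moving on avoids typing into the wrong field when the initial tap finds no match.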

### wait-for
Wait for element to appear (with timeout).

```bash
droid wait-for -t "Welcome" -s 10
# Returns: {"ok":true,"found":true,"element":{...}} or {"ok":true,"found":false,"timeout":true}
```
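
Note that a timeout still returns `"ok":true`; only the `found` field distinguishes the two outcomes. A script that should stop on a timeout therefore has to inspect `found`. A minimal sketch, assuming `droid` is on `PATH` and emits compact JSON as shown above (`wait_for_text` is a hypothetical wrapper name, not a droid command):

```bash
# Hypothetical wrapper: wait for TEXT for up to SECONDS (default 5).
# Prints the raw JSON response and returns 0 only when "found":true.
wait_for_text() {
  local text="$1" seconds="${2:-5}" response
  response="$(droid wait-for -t "$text" -s "$seconds")"
  printf '%s\n' "$response"
  # String match on the response; not a full JSON parse.
  printf '%s' "$response" | grep -Eq '"found"[[:space:]]*:[[:space:]]*true'
}

# Example (requires a connected device):
#   wait_for_text "Welcome" 10 || echo "Welcome screen did not appear" >&2
```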

## Form Workflow Commands

### clear / type / hide-keyboard

```bash
droid clear                       # Clear focused field
droid type "[email protected]"    # Type into focused field
droid hide-keyboard               # Dismiss keyboard (use instead of 'key back')
```

### key
Send key events.

| Key | Purpose |
|-----|---------|
| `back` | Navigate back |
| `enter` | Submit/confirm |
| `move_home` | Cursor to start of text |
| `move_end` | Cursor to end of text |
| `delete` | Backspace |
| `app_home` | Android home screen |

```bash
droid key back
droid key move_home
```

## Other Commands

| Command | Purpose | Example |
|---------|---------|---------|
| `swipe` | Scroll | `droid swipe up` |
| `longpress` | Long press | `droid longpress -t "Item"` |
| `launch` | Launch app | `droid launch com.example.app` |
| `current` | Current activity | `droid current` |
| `info` | Device info | `droid info` |
| `wait` | Wait ms | `droid wait 1000` |
| `select-all` | Select text | `droid select-all` |

See `references/commands.md` for full documentation.

## Testing Workflow

### Recommended Pattern

```bash
# 1. Screenshot to see current state
droid screenshot

# 2. Read the screenshot image with Claude's Read tool
# 3. Tap by text when possible
droid tap -t "Book Now" -w 1000

# 4. Verify the action worked
droid wait-for -t "Booking Confirmed" -s 5
```

### Form Filling Pattern

```bash
# Use fill command for efficiency
droid fill "Email" "[email protected]"
droid fill "Password" "secret123"
droid tap -t "Sign In" --clickable
droid wait-for -t "Welcome" -s 10
```

### Tips

- **Use `--prefer-input`** when tapping form fields to avoid hitting labels
- **Use `--clickable`** when tapping buttons to ensure the element is interactive
- **Use `hide-keyboard`**, not `key back`, to dismiss the keyboard
- **Use `wait-for`** instead of a blind `wait` for reliable verification

## Error Handling

All errors return JSON with `"ok":false`:

```bash
droid tap -t "NonexistentButton"
# {"ok":false,"error":"No element found matching 'NonexistentButton'"}
```
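
In a shell script, the `ok` flag and `error` message can be pulled out of a response with standard tools. The helpers below are a sketch under stated assumptions: `droid_ok` and `droid_error` are hypothetical names, the matching is plain text rather than real JSON parsing, and error messages are assumed not to contain escaped double quotes.

```bash
# Hypothetical helper: succeed if the JSON response reports "ok":true.
droid_ok() {
  printf '%s' "$1" | grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}

# Hypothetical helper: print the value of the "error" field, if present.
# Assumes the message contains no escaped double quotes.
droid_error() {
  printf '%s\n' "$1" |
    sed -n 's/.*"error"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example (requires a connected device):
#   response="$(droid tap -t "Submit" --clickable)"
#   droid_ok "$response" || echo "tap failed: $(droid_error "$response")" >&2
```

For anything beyond simple checks, a real JSON parser is the safer choice.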

Overview

This skill provides a unified CLI to control Android devices and emulators via ADB, returning structured JSON optimized for LLM automation. It simplifies common testing and interaction tasks: screenshots with UI element metadata, tapping by text or coordinates, form filling, swipes, and waiting for UI states. It is designed for repeatable, scriptable flows when driving Android UI from automation or an LLM.

How this skill works

The CLI talks to ADB to capture screenshots, query the view hierarchy, and perform input actions (tap, type, key events, swipe, long press). Commands return JSON objects with success flags, element lists (text, class, clickable, coordinates), and error messages to make results machine-readable. Higher-level commands bundle several low-level steps to simplify common workflows: fill runs tap + clear + type + hide-keyboard in one call, screenshot returns the image together with the UI element list, and wait-for waits for an element to appear until a timeout expires.

When to use it

  • Automating interactions on a connected Android device or emulator
  • Testing and validating UI flows end-to-end with repeatable commands
  • Capturing screenshots plus parsed UI element metadata for LLM inspection
  • Tapping UI elements by text rather than raw coordinates
  • Filling forms and sending key events programmatically

Best practices

  • Start with droid screenshot to capture current UI and element coordinates before acting
  • Prefer tapping by text and use --clickable to ensure interactive elements
  • Use --prefer-input when targeting form fields to avoid tapping labels
  • Use droid fill for single-step form inputs (tap+clear+type+hide-keyboard) instead of manual sequences
  • Use droid wait-for rather than blind timed waits to verify state changes reliably
  • Check JSON responses for ok:false and handle the error field programmatically

Example use cases

  • End-to-end smoke test: screenshot -> tap "Sign In" -> wait-for "Welcome"
  • Automated form entry: droid fill "Email" "[email protected]" then droid fill "Password" "secret123"
  • Visual + semantic validation: droid screenshot to get element bounds and text, then compare to expected UI
  • Tap without coordinates: droid tap -t "Book Now" --clickable to press buttons by label
  • Continuous integration step that launches app, runs a UI flow, and exits with JSON pass/fail
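
The smoke-test and CI cases above could be combined into a small script along these lines. This is a sketch, not a shipped part of the skill: `expect_json` and `smoke_test` are hypothetical names, the UI labels are placeholders taken from the examples in this document, and the substring checks assume `droid` is on `PATH` and prints compact JSON without spaces around colons.

```bash
# Hypothetical helper: run a droid command and fail unless the response
# contains MARKER (a literal substring such as '"ok":true').
expect_json() {
  local marker="$1" response
  shift
  response="$(droid "$@")"
  case "$response" in
    *"$marker"*) return 0 ;;
    *)
      printf 'FAILED: droid %s\n%s\n' "$*" "$response" >&2
      return 1
      ;;
  esac
}

# Hypothetical flow: launch the app, tap "Sign In", wait for "Welcome".
# Stops at the first failing step and returns non-zero.
smoke_test() {
  expect_json '"ok":true' launch "$1" &&
    expect_json '"ok":true' tap -t "Sign In" --clickable &&
    expect_json '"found":true' wait-for -t "Welcome" -s 10
}

# Example CI step (requires a connected device or emulator):
#   smoke_test com.example.app || exit 1
```

The last step checks `found` rather than `ok`, because a `wait-for` timeout is still reported with `"ok":true`.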

FAQ

What prerequisites are required?

Install the Bun runtime, make sure ADB is available in PATH, enable USB debugging on the device, and connect a device or start an emulator.

How do I know an action succeeded?

Every command returns JSON. A successful response contains "ok":true and command-specific fields; failures return "ok":false with an error message.