This skill retrieves the current OSWorld observation, including the screenshot and accessibility tree, for debugging and UI analysis.
npx playbooks add skill bdambrosio/cognitive_workbench --skill osworld-observe
---
name: osworld-observe
type: python
description: "Get the current observation from the OSWorld environment. Returns screenshot (base64 PNG) and accessibility tree (JSON)."
schema_hint:
  value: "ignored"
  include_screenshot: "bool (default: true)"
  include_a11y: "bool (default: true)"
  out: "$variable"
examples:
  - '{"type":"osworld-observe","out":"$obs"}'
  - '{"type":"osworld-observe","include_screenshot":false,"out":"$a11y_only"}'
---
# OSWorld Observe Tool (Level 4)
## Input
- `include_screenshot`: bool (default: true) - include screenshot in observation
- `include_a11y`: bool (default: true) - include accessibility tree in observation
- `value` parameter is ignored
## Output
- Note ID (bound to `out` variable) containing:
  - `text`: formatted observation summary
  - `format`: "json"
  - `metadata`: observation data including:
    - `timestamp`: observation timestamp
    - `step_counter`: current step counter
    - `observation.screenshot`: dict with `encoding` ("png") and `data_base64` (base64-encoded PNG)
    - `observation.accessibility_tree`: raw accessibility tree JSON
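As a rough illustration of consuming this output, the sketch below decodes the screenshot from a note's `metadata`. It assumes `metadata` is available as a plain Python dict and that the dotted name `observation.screenshot` means the nested path `metadata["observation"]["screenshot"]`; both are assumptions, and `decode_screenshot` is a hypothetical helper, not part of the skill.

```python
import base64
from typing import Optional

# Every valid PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(metadata: dict) -> Optional[bytes]:
    """Return raw PNG bytes from note metadata, or None if no screenshot is present.

    Assumes the nested layout metadata["observation"]["screenshot"] (an assumption).
    """
    shot = (metadata.get("observation") or {}).get("screenshot")
    if not shot:
        # Screenshot was omitted, e.g. include_screenshot was false.
        return None
    if shot.get("encoding") != "png":
        raise ValueError(f"unexpected screenshot encoding: {shot.get('encoding')!r}")
    data = base64.b64decode(shot["data_base64"])
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return data
```

The returned bytes can be written to a `.png` file or handed to an image library; checking the signature first catches truncated or mislabeled payloads early.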
## Configuration
- `OSWORLD_URL` environment variable (defaults to `http://localhost:3002`)
- Or pass `osworld_url` in character config's `osworld_config` section
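One way the two configuration sources could be combined is sketched below. The precedence shown (character config first, then `OSWORLD_URL`, then the default) is an assumption, since the order is not stated here, and `resolve_osworld_url` is a hypothetical helper rather than the skill's actual code.

```python
import os
from typing import Mapping, Optional

DEFAULT_OSWORLD_URL = "http://localhost:3002"


def resolve_osworld_url(
    character_config: Optional[Mapping] = None,
    environ: Optional[Mapping] = None,
) -> str:
    """Pick the OSWorld base URL.

    Assumed precedence: osworld_config.osworld_url, then OSWORLD_URL, then the default.
    """
    environ = os.environ if environ is None else environ
    osworld_config = (character_config or {}).get("osworld_config") or {}
    return (
        osworld_config.get("osworld_url")
        or environ.get("OSWORLD_URL")
        or DEFAULT_OSWORLD_URL
    )
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.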
## Common Workflow
```json
{"type":"osworld-observe","out":"$obs"}
{"type":"osworld-execute","python":"pyautogui.click(100,200)","out":"$result"}
{"type":"osworld-observe","out":"$obs2"}
```
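If the steps are generated programmatically, the observe/act/observe sequence above could be assembled along these lines. This is only a sketch: `observe_action` and `execute_action` are hypothetical helper names, and the only fields used are the ones that appear in the examples in this document.

```python
import json


def observe_action(out: str, include_screenshot: bool = True, include_a11y: bool = True) -> dict:
    """Build an osworld-observe step; flags are emitted only when they differ from the defaults."""
    action = {"type": "osworld-observe", "out": out}
    if not include_screenshot:
        action["include_screenshot"] = False
    if not include_a11y:
        action["include_a11y"] = False
    return action


def execute_action(python: str, out: str) -> dict:
    """Build an osworld-execute step carrying a Python snippet to run in the environment."""
    return {"type": "osworld-execute", "python": python, "out": out}


# Observe, act, then observe again to see the effect of the action.
plan = [
    observe_action("$obs"),
    execute_action("pyautogui.click(100,200)", "$result"),
    observe_action("$obs2"),
]

# Compact separators match the single-line JSON style used in this document.
lines = [json.dumps(step, separators=(",", ":")) for step in plan]
```

Binding each observation to a distinct variable (`$obs`, `$obs2`) keeps the before and after states available for comparison.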
## Notes
- Screenshot is returned as base64-encoded PNG data
- Accessibility tree is raw JSON from OSWorld
- No interpretation or filtering is performed - raw observation data only
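Because the tree arrives unfiltered, any searching has to happen downstream. The sketch below walks an arbitrary nested JSON value and collects matching nodes. The tree's actual schema is not described here, so the key names `role`, `name`, and `children` in the usage example are purely hypothetical, as are the helper names.

```python
from typing import Any, Iterator, List


def iter_nodes(tree: Any) -> Iterator[dict]:
    """Yield every dict found anywhere in a nested JSON value, depth-first."""
    if isinstance(tree, dict):
        yield tree
        for value in tree.values():
            yield from iter_nodes(value)
    elif isinstance(tree, list):
        for item in tree:
            yield from iter_nodes(item)


def find_nodes(tree: Any, key: str, value: Any) -> List[dict]:
    """Return all dict nodes whose `key` equals `value`."""
    return [node for node in iter_nodes(tree) if node.get(key) == value]


# Hypothetical tree shape, for illustration only.
sample_tree = {
    "role": "window",
    "children": [
        {"role": "button", "name": "OK"},
        {"role": "text"},
    ],
}
buttons = find_nodes(sample_tree, "role", "button")
```

The walker makes no assumptions about key names, so it works on whatever structure OSWorld returns; only the filter arguments need adapting.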
This skill captures the current observation from an OSWorld environment and returns both a screenshot (base64 PNG) and the raw accessibility tree (JSON). It provides a timestamped, unfiltered snapshot of the UI state for automated agents to inspect, store, or process downstream. Configuration is available via an environment variable or character config.
When invoked, the skill requests an observation from OSWorld and packages the response into a note whose ID is bound to the `out` variable. The observation includes a base64-encoded PNG screenshot and the raw accessibility tree JSON, along with metadata such as the timestamp and step counter. The screenshot and accessibility tree can each be omitted by setting `include_screenshot` or `include_a11y` to false.
Can I disable the screenshot or accessibility tree?
Yes. Set `include_screenshot` or `include_a11y` to false to omit either item from the returned observation.
How do I point the skill to a non-default OSWorld instance?
Set the `OSWORLD_URL` environment variable or provide `osworld_url` in the character config's `osworld_config` section.