
0xsero-vision skill


This skill analyzes images and diagrams to describe UI elements, extract text, note confidence, and flag issues.

To add this skill to your agents, run:

npx playbooks add skill codingheader/myskills --skill 0xsero-vision

SKILL.md
---
name: vision
description: Analyze images, screenshots, diagrams, and visual content - Use when you need to understand visual content like screenshots, architecture diagrams, UI mockups, or error screenshots.
model: zhipuai-coding-plan/glm-4.6v
license: MIT
supportsVision: true
tags:
  - vision
  - images
  - screenshots
  - diagrams

# Background worker - runs isolated for heavy processing
sessionMode: isolated
# Skill isolation - only allow own skill (default behavior)
# skillPermissions not set = isolated to own skill only
---

You are a Vision Analyst specialized in interpreting visual content.

## Focus
- Describe visible UI elements, text, errors, code, layout, and diagrams.
- Extract any legible text accurately, preserving formatting when relevant.
- Note uncertainty or low-confidence readings.

## Output
- Provide concise, actionable observations.
- Call out anything that looks broken, inconsistent, or suspicious.

Overview

This skill analyzes images, screenshots, diagrams, and other visual content to produce concise, actionable observations. It focuses on identifying visible UI elements, extracting legible text, interpreting layouts and diagrams, and highlighting errors or suspicious items. The output prioritizes clarity and flags low-confidence readings. Use it to turn visual information into precise, usable insights for debugging, design review, or documentation.

How this skill works

I inspect the image to identify UI controls, text blocks, icons, code snippets, and diagram components. I extract any readable text, preserving formatting when relevant, and annotate uncertain or low-confidence readings. I summarize layout relationships, call out inconsistencies or likely bugs, and suggest next steps or checks. Observations are concise and prioritized by likely impact.
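The skill returns prose, not a fixed data format. As a rough illustration of the ordering and confidence flagging described above, the Python sketch below models each observation with an impact level and a confidence score. The `Observation` and `Impact` types, the `render` function, and the 0.6 threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Impact(IntEnum):
    """Hypothetical impact levels; a higher value means more likely to matter."""
    INFO = 0
    MINOR = 1
    MAJOR = 2
    BLOCKER = 3


@dataclass
class Observation:
    summary: str       # what was seen, e.g. a UI element or an error banner
    impact: Impact     # estimated impact on the user
    confidence: float  # 0.0-1.0 estimate of how reliable the reading is
    reason: str = ""   # why confidence is low (blur, small font, obstruction)


LOW_CONFIDENCE = 0.6  # hypothetical threshold for flagging a reading


def render(observations: list[Observation]) -> list[str]:
    """Order observations by impact, highest first, and flag uncertain readings."""
    lines = []
    for obs in sorted(observations, key=lambda o: o.impact, reverse=True):
        line = f"[{obs.impact.name}] {obs.summary}"
        if obs.confidence < LOW_CONFIDENCE:
            note = f"; {obs.reason}" if obs.reason else ""
            line += f" (low confidence: {obs.confidence:.0%}{note})"
        lines.append(line)
    return lines


report = render([
    Observation("Submit button label reads 'Sumbit'", Impact.MINOR, 0.9),
    Observation("Red banner: 'Connection refused'", Impact.BLOCKER, 0.95),
    Observation("Footer version looks like 'v2.1.3'", Impact.INFO, 0.4, "small font"),
])
print("\n".join(report))
```

Sorting by impact puts the blocking error first, and only the footer reading gets a low-confidence note, together with the reason for it.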

When to use it

  • You have a screenshot of an app or website and need a clear inventory of visible elements and errors.
  • You need to extract error messages, logs, or code shown in an image for debugging.
  • You want a quick review of a UI mockup to find layout, accessibility, or consistency issues.
  • You need to interpret architecture diagrams or flowcharts and identify missing labels or ambiguous links.
  • You have an image with text that must be transcribed and validated before manual entry.

Best practices

  • Provide the highest-resolution image available to improve text extraction and element detection.
  • Indicate the parts of the image that are most important (e.g., error area, code block, diagram section).
  • If text is small or blurry, expect low-confidence readings and supply a clearer image if possible.
  • Request follow-up actions explicitly (e.g., transcribe text, list potential root causes, or propose UI fixes).
  • Combine multiple related screenshots when context across screens matters (flows, error reproduction).

Example use cases

  • Extract and transcribe an error message from a crash screenshot and suggest likely causes.
  • List visible UI elements and spacing issues from a mobile app mockup for frontend developers.
  • Interpret an architecture diagram to identify unlabeled connections or ambiguous components.
  • Review a terminal screenshot containing code/configuration and highlight suspicious values or typos.
  • Validate that on-screen copy matches a provided string list and flag discrepancies.
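For the last use case, once the on-screen copy has been transcribed, the comparison against the expected strings can be done deterministically. Below is a minimal sketch using Python's standard `difflib` module. The `compare_copy` function, its report keys, and the 0.8 similarity cutoff are hypothetical choices, not part of the skill.

```python
import difflib


def compare_copy(expected: list[str], transcribed: list[str],
                 cutoff: float = 0.8) -> dict[str, list]:
    """Compare transcribed on-screen strings against an expected copy list."""
    transcribed_set = set(transcribed)
    report = {"matched": [], "near_miss": [], "missing": [], "unexpected": []}
    for text in expected:
        if text in transcribed_set:
            report["matched"].append(text)
            continue
        # Best transcribed candidate whose similarity ratio is >= cutoff, if any.
        close = difflib.get_close_matches(text, transcribed, n=1, cutoff=cutoff)
        if close:
            report["near_miss"].append((text, close[0]))
        else:
            report["missing"].append(text)
    # Anything transcribed that is neither expected nor paired as a near miss.
    accounted = set(expected) | {found for _, found in report["near_miss"]}
    report["unexpected"] = [t for t in transcribed if t not in accounted]
    return report


result = compare_copy(
    expected=["Sign in", "Forgot password?", "Create account"],
    transcribed=["Sign in", "Forgot pasword?", "Help"],
)
print(result)
```

Exact matches are checked first, so only differing strings go through fuzzy matching. A near miss such as "Forgot pasword?" can be either a typo in the UI or a misreading, which is worth confirming against a clearer image.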

FAQ

How do you indicate uncertain text readings?

I mark uncertain or low-confidence text explicitly and describe the reason (small font, blur, obstruction).

Can you preserve formatting for code or logs?

Yes. I preserve line breaks and indentation for code or log snippets when extracting text.
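If you paste extracted code or logs into Markdown notes, one way to keep line breaks and indentation intact is to wrap the text in a code fence that is longer than any backtick run it contains, following the CommonMark fenced-code rules. The `fence_verbatim` helper below is a hypothetical sketch, not something the skill provides.

```python
import re


def fence_verbatim(text: str, lang: str = "") -> str:
    """Wrap extracted code or log text in a Markdown fence without altering it."""
    # CommonMark: a fence needs at least three backticks, and a content line made
    # of an equally long (or longer) backtick run could close it early, so make
    # the fence longer than the longest backtick run anywhere in the text.
    longest = max((len(run) for run in re.findall(r"`+", text)), default=0)
    fence = "`" * max(3, longest + 1)
    return f"{fence}{lang}\n{text}\n{fence}"


log = "$ make build\n  error: `cc` exited with status 1\n\tsee build.log"
print(fence_verbatim(log, "text"))
```

The text itself is never modified; only the fence length adapts, so tabs, leading spaces, and inline backticks survive unchanged.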