
describe-image skill

/.opencode/skill/describe-image

This skill analyzes an image using a local tool, describing details based on a prompt without overloading the GPU.

npx playbooks add skill richardanaya/agent-skills --skill describe-image


Files (1)
SKILL.md
387 B
---
name: describe-image
description: Uses a local model to describe something about an image
license: MIT
compatibility: opencode
metadata:
  audience: tools
---

There is a local CLI tool, describe_image, that uses a local AI model to describe an image. Don't use this tool in parallel, or we might overwhelm our GPU.

```
describe_image <disk path> "<prompt to ask about details in image>"
```

Overview

This skill describes images using a local AI model to extract visible details, objects, and context. It runs a command-line tool against image files and returns concise, human-readable descriptions for accessibility, indexing, or analysis. Because all processing happens locally, it is well suited to privacy-sensitive and on-premises use.

How this skill works

The skill calls a local CLI tool that accepts a disk path to the image and a short prompt asking what to describe. The tool runs on a local GPU-backed model and returns descriptive text about the scene, objects, attributes, and any requested focus details. Avoid running multiple instances in parallel to prevent GPU overload.
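As a sketch of that call pattern, a thin wrapper could validate the disk path before waking the model; the describe_safe name and error message below are illustrative, not part of the skill, and the wrapper assumes describe_image is on PATH.

```shell
# describe_safe: check the image path exists, then invoke the local tool.
# Assumes describe_image is on PATH; name and messages are illustrative.
describe_safe() {
  img="$1"
  prompt="$2"
  if [ ! -f "$img" ]; then
    echo "error: image not found: $img" >&2
    return 1
  fi
  describe_image "$img" "$prompt"
}
```

Called as describe_safe /photos/cat.jpg "What animal is this?", it fails fast on a bad path instead of starting a GPU job that cannot succeed.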

When to use it

  • Generate alt text or accessibility descriptions for images
  • Extract scene or object details for indexing and search
  • Quickly summarize photo contents for content moderation or triage
  • Produce image annotations for datasets or QA checks
  • Obtain focused details by asking specific prompts about the image

Best practices

  • Provide a concise, specific prompt about what you want described (e.g., "Describe people and emotions").
  • Use local absolute disk paths to images to ensure the tool can access files reliably.
  • Run requests serially rather than in parallel to avoid overwhelming the GPU.
  • Pre-check image formats and sizes; downscale very large images if needed to speed processing.
  • Validate outputs for sensitive or ambiguous content before automated use.
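The serial-execution and size pre-check practices above can be combined in one small batch loop. This is a sketch under stated assumptions: the serial_describe name, the MAX_IMG_BYTES variable, and the 20 MB default threshold are all illustrative, and describe_image is assumed to be on PATH.

```shell
# serial_describe: describe every file in a directory, one at a time.
# Files larger than MAX_IMG_BYTES (default 20 MB, an assumed threshold)
# are skipped rather than sent to the model.
serial_describe() {
  dir="$1"
  prompt="$2"
  max="${MAX_IMG_BYTES:-20971520}"
  for img in "$dir"/*; do
    [ -f "$img" ] || continue
    size=$(wc -c < "$img")
    if [ "$size" -gt "$max" ]; then
      echo "skip (too large): $img" >&2
      continue
    fi
    describe_image "$img" "$prompt"  # serial: one GPU job at a time
  done
}
```

Because the loop body runs to completion before the next iteration starts, only one GPU job is ever in flight.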

Example use cases

  • Create accessible alt text for web images by asking for short descriptions.
  • Index photo libraries by extracting objects, locations, and visible tags.
  • Assist content moderators by highlighting potentially problematic content in images.
  • Annotate training datasets with object labels and scene summaries.
  • Perform rapid QA on product photos to confirm presence and orientation of items.

FAQ

How do I invoke the tool?

Run the local CLI with the image's disk path and a short prompt, for example: describe_image /path/to/image "What is visible in this photo?"

Can I run multiple requests at once?

No. Avoid parallel runs: the model uses the local GPU, and concurrent jobs can overwhelm it, slowing or failing processing.
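If several independent processes might call the tool, a cross-process lock is one way to enforce the serial rule. This sketch uses flock(1) to queue callers behind a single lock file; the with_gpu_lock name and the /tmp/describe_image.lock path are assumptions, not part of the skill.

```shell
# with_gpu_lock: run any command while holding a shared lock file, so at
# most one GPU-backed job runs at a time. Lock path is an assumption.
with_gpu_lock() {
  flock /tmp/describe_image.lock "$@"
}

# e.g. with_gpu_lock describe_image /photos/cat.jpg "What animal is this?"
```

Concurrent callers block on the lock and run one after another instead of hitting the GPU simultaneously.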