This skill helps you analyze images, describe visual content, and hold text-and-image conversations, with multi-image support.
Install with:

```bash
npx playbooks add skill cyangzhou/-2--project-yunshu- --skill vlm_expert
```
---
name: VLM_Expert
description: Enables vision-based AI conversation; supports analyzing images, describing visual content, and multimodal interaction.
---
# VLM (Vision Chat) Skill
Enables the AI to understand and respond to content that combines images with text prompts.
## Core Features
- **Image analysis**: identify objects and scenes in a picture.
- **Multi-image comparison**: analyze several images at once.
## CLI Example
```bash
z-ai vision --prompt "What is in this image?" --image "./photo.jpg"
```

This skill enables conversation driven by visual inputs, combining image understanding with natural-language prompts. It identifies objects, scenes, and relationships in photos, and can compare multiple images for differences or similarities. The focus is practical multimodal interaction for tasks like description, analysis, and visual troubleshooting.
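For automated workflows, the CLI call shown above can be scripted. Below is a minimal Python sketch that builds the command line and captures the tool's output. Only the `z-ai vision --prompt/--image` form comes from the example; repeating `--image` once per file for multi-image input is an assumption, not documented behavior.

```python
import subprocess
from typing import List

def build_vision_command(prompt: str, images: List[str]) -> List[str]:
    """Build the argv list for a z-ai vision call.

    Repeating --image per file (for multi-image input) is an assumed
    convention; check the CLI's help output before relying on it.
    """
    argv = ["z-ai", "vision", "--prompt", prompt]
    for path in images:
        argv += ["--image", path]
    return argv

def run_vision(prompt: str, images: List[str]) -> str:
    """Run the CLI and return its stdout (raises on a nonzero exit)."""
    result = subprocess.run(
        build_vision_command(prompt, images),
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Building the argv for the single-image example from above:
print(build_vision_command("What is in this image?", ["./photo.jpg"]))
```

Keeping the command as an argv list (rather than a shell string) avoids quoting problems when prompts contain spaces or punctuation.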
The skill inspects one or more images and extracts visual elements such as objects, scenes, spatial relationships, and salient attributes (color, pose, condition). It accepts a text prompt that frames the desired output (e.g., describe, compare, identify issues) and returns structured natural-language responses or comparison summaries. It also supports CLI-style commands to feed images and prompts together for automated workflows.
## FAQ

**What image formats are supported?**
Common formats like JPEG and PNG are supported; provide standard-resolution images for best results.
**How do I get reliable comparisons between images?**
Label each image in your prompt, supply consistent lighting and angles, and ask for a focused comparison (e.g., color, damage, layout).
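The labeling advice above is easy to script. A small sketch that assembles a labeled, focused comparison prompt (the template itself is illustrative; the skill does not require any particular format):

```python
def comparison_prompt(labels: dict, aspect: str) -> str:
    """Build a focused comparison prompt that labels each image.

    `labels` maps a short label (e.g. "Image A") to what that image shows;
    `aspect` narrows the comparison (color, damage, layout, ...).
    """
    lines = [f"{label}: {desc}" for label, desc in labels.items()]
    lines.append(f"Compare the images above only with respect to {aspect}.")
    return "\n".join(lines)

print(comparison_prompt(
    {"Image A": "product photo before shipping",
     "Image B": "product photo on arrival"},
    "damage",
))
```

Pairing a prompt like this with one `--image` argument per labeled image keeps the model's answer tied to the labels rather than to image order alone.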
**Is sensitive content safe to process?**
Avoid submitting personal or sensitive images. Treat visual inputs as you would any external service and remove or anonymize private details.