photo-learning skill

This skill identifies the contents of an image and provides a kid-friendly narration with optional bilingual output.

To add this skill to your agents, run:

npx playbooks add skill hmbown/minimax-cli --skill photo-learning

SKILL.md

---
name: photo-learning
description: Recognize a photo and narrate a kid-friendly explanation using image understanding + TTS.
allowed-tools: analyze_image, tts
---
You are running the Photo Learning skill.

Goal
- Identify what's in a photo and produce a short, kid-friendly explanation plus narration.

Ask for
- Image path.
- Age range and language(s).
- Preferred tone (gentle, playful, curious).

Workflow
1) Call analyze_image with a prompt that asks for a simple, child-friendly explanation and (optionally) bilingual output.
2) Use the returned text as the narration script.
3) Call tts with output_format "mp3" unless the user requests wav.
4) Return the explanation text and audio path.

Response style
- Keep it short and clear.
- Provide a clean output summary.

Overview

This skill identifies objects, scenes, and actions in a photo and generates a short, kid-friendly explanation plus a narrated audio file. It supports age-tailored language, optional bilingual output, and selectable tones like gentle, playful, or curious. The output includes a clean text summary and a ready-to-play audio file (MP3 by default).

How this skill works

Provide an image path, target age range, language(s), and preferred tone. The skill analyzes the image to produce a simple, concrete script formatted for children, optionally in two languages. That script is fed to a TTS engine to produce an MP3 (or WAV if requested). The final response returns the explanation text and the audio file path.
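The flow described above can be sketched in Python. This is a minimal illustration and not the skill's actual implementation: the source names the `analyze_image` and `tts` tools but does not document their signatures, so both are passed in as plain callables with assumed shapes (`analyze_image(image_path, prompt) -> str` and `tts(text, output_format) -> str`). The names `build_prompt` and `run_photo_learning` are hypothetical.

```python
from typing import Callable, Dict, Sequence


def build_prompt(age_range: str, languages: Sequence[str], tone: str) -> str:
    """Compose the instruction for the image-analysis step (hypothetical wording)."""
    prompt = (
        f"Describe this photo for a child aged {age_range} "
        f"in a {tone} tone. Use short, simple sentences."
    )
    if len(languages) > 1:
        # Bilingual output is optional; only ask for it when two languages are given.
        prompt += " Give the explanation in " + " and ".join(languages) + "."
    else:
        prompt += f" Answer in {languages[0]}."
    return prompt


def run_photo_learning(
    image_path: str,
    age_range: str,
    languages: Sequence[str],
    tone: str,
    analyze_image: Callable[[str, str], str],  # assumed shape: (image_path, prompt) -> text
    tts: Callable[[str, str], str],            # assumed shape: (text, output_format) -> audio path
    audio_format: str = "mp3",                 # MP3 unless the user asks for WAV
) -> Dict[str, str]:
    """Run the four workflow steps and return the text plus the audio path."""
    if audio_format not in ("mp3", "wav"):
        raise ValueError(f"unsupported audio format: {audio_format!r}")
    if not languages:
        raise ValueError("at least one language is required")

    # 1) Ask the image tool for a child-friendly explanation.
    script = analyze_image(image_path, build_prompt(age_range, languages, tone))
    # 2) The returned text doubles as the narration script.
    # 3) Synthesize the narration in the chosen format.
    audio_path = tts(script, audio_format)
    # 4) Return both pieces to the caller.
    return {"explanation": script, "audio_path": audio_path}
```

Passing the tools in as arguments keeps the sketch independent of whatever interface the real tools expose, and lets the flow be exercised with stand-in functions.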

When to use it

- Introduce young children to objects, animals, or scenes in photos.
- Create short narrated descriptions for picture-based learning activities.
- Produce bilingual captions and audio for language exposure.
- Generate child-appropriate narrative for classroom slides or storytime.
- Quickly convert family photos into simple, narrated descriptions.

Best practices

- Specify the child’s age range (e.g., 2–4, 5–7) to tailor vocabulary and sentence length.
- Choose a tone (gentle, playful, curious) to match the learning context or mood.
- Request bilingual output only when you can accept slightly shorter or simpler translations.
- Provide clear, high-quality images with the subject centered for more accurate descriptions.
- Prefer MP3 output for broad compatibility; request WAV only if you need uncompressed audio.
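One way to apply these practices is to normalize the request before any tool is called. The sketch below is illustrative only: `PhotoRequest`, `parse_age_range`, and `make_request` are hypothetical names, and the rules it enforces (an age range written as two numbers, one of the three listed tones, one or two languages, MP3 or WAV) are assumptions drawn from the options this page mentions, not documented constraints of the skill.

```python
import re
from dataclasses import dataclass
from typing import Sequence, Tuple

TONES = ("gentle", "playful", "curious")  # tones listed on this page
AUDIO_FORMATS = ("mp3", "wav")            # MP3 by default, WAV on request


@dataclass(frozen=True)
class PhotoRequest:
    image_path: str
    age_range: Tuple[int, int]
    languages: Tuple[str, ...]
    tone: str
    audio_format: str = "mp3"


def parse_age_range(text: str) -> Tuple[int, int]:
    """Parse '2-4' or '2–4' (hyphen or en dash) into a (low, high) pair."""
    match = re.fullmatch(r"\s*(\d+)\s*[-–]\s*(\d+)\s*", text)
    if not match:
        raise ValueError(f"age range should look like '2-4', got {text!r}")
    low, high = int(match.group(1)), int(match.group(2))
    if low > high:
        raise ValueError(f"age range is reversed: {text!r}")
    return low, high


def make_request(
    image_path: str,
    age_range: str,
    languages: Sequence[str],
    tone: str,
    audio_format: str = "mp3",
) -> PhotoRequest:
    """Validate and normalize user input into a PhotoRequest."""
    tone = tone.strip().lower()
    if tone not in TONES:
        raise ValueError(f"tone must be one of {TONES}, got {tone!r}")
    audio_format = audio_format.strip().lower()
    if audio_format not in AUDIO_FORMATS:
        raise ValueError(f"audio format must be one of {AUDIO_FORMATS}")
    langs = tuple(name.strip() for name in languages if name.strip())
    # Assumption: "bilingual" means at most two languages per request.
    if not 1 <= len(langs) <= 2:
        raise ValueError("provide one language, or two for bilingual output")
    return PhotoRequest(image_path, parse_age_range(age_range), langs, tone, audio_format)
```

Rejecting an unknown tone or format up front means a bad request fails before any image analysis or speech synthesis is attempted.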

Example use cases

- A parent uploads a zoo photo and asks for a playful, bilingual (English/Spanish) narration for a 4-year-old.
- A teacher prepares slides: each image receives a gentle three-sentence description and an MP3 for classroom playback.
- A language learner uses photos to build vocabulary with short, curious explanations in the target language.
- A preschool app converts user photos into narrated prompts for storytelling activities.

FAQ

Can I get the narration in two languages?

Yes. Request bilingual output and list the two languages; the script will include both languages and the TTS will produce one audio file with the combined narration.

What audio format is produced?

MP3 is the default for compatibility. Request WAV if you need uncompressed audio.