home / skills / hmbown / minimax-cli / music-video-generator
This skill generates cohesive music and video from a unified prompt, synchronizing mood, style, and energy for a polished music video.
npx playbooks add skill hmbown/minimax-cli --skill music-video-generatorReview the files below or copy the command above to add this skill to your agents.
---
name: music-video-generator
description: Create synchronized music and matching video from a unified creative prompt.
allowed-tools: generate_music, generate_video, generate_image
---
You are running the Music Video Generator skill.
Goal
- Produce a cohesive music video where audio and visuals are generated together from a unified creative vision, ensuring synchronization between mood, style, and energy.
Ask for
- Song concept (genre, mood,主题, instruments).
- Visual concept (setting, aesthetic, color palette, era).
- Target length (30s for teaser, 2-3min for full video).
- Any specific subjects, locations, or visual elements to include.
- Whether to generate a poster/cover image.
Workflow
1) Clarify the creative vision:
- Combine music and visual prompts into a unified concept.
- Confirm the emotional arc (buildup, climax, resolution).
2) Generate the music:
- Call generate_music with genre, mood, tempo, and any instrumentation notes.
- Ensure duration matches or slightly exceeds video target.
3) Generate key visual frames:
- Call generate_image for hero frame, key moments, and potential first_frame.
- Capture the visual style guide (colors, lighting, aesthetic).
4) Generate the video:
- Call generate_video with unified prompt incorporating visual concept and mood.
- Use first_frame from generated hero image for visual continuity.
- Match energy cues from music in video motion description.
5) Optional: Generate alternate versions (acoustic, instrumental, remix) if requested.
6) Return:
- Music file path
- Video file path
- Cover/poster image if requested
- Creative notes on visual-audio sync decisions
Response style
- Emphasize the unified creative vision in responses.
- Explain how visual and audio elements complement each other.
- Offer suggestions for alternate versions or iterations.
Notes
- The key is coherence between audio mood and visual style.
- Suggest using the generated frames as editing reference points.
- For longer videos, consider generating shorter clips that can be assembled.
- Offer to match music duration to video for seamless looping if needed.
This skill creates cohesive music videos by generating audio and visuals from a single creative prompt, ensuring mood, style, and energy stay synchronized. It produces the music track, key visual frames, a matched video file, and optional cover art. The result is a ready-to-edit package with creative notes that explain audio-visual sync decisions.
I first clarify a unified creative vision that combines song concept and visual concept, including emotional arc and target length. I generate the music with specified genre, mood, tempo, and instruments, then produce hero and key frames to establish a visual style guide. Finally I render the video using the hero frame as the first frame and align motion and cuts to music energy cues; I can also create alternate versions or shorter clips for assembly.
Can you match music length exactly to the video?
Yes. I can ensure the generated music matches the target video duration or slightly exceeds it for clean editing and seamless looping.
Can I get stems or alternate mixes?
Yes. I can provide alternate versions like instrumental, acoustic, or remixes and note where to swap them for different visual sections.