
music-video-generator skill

This skill generates cohesive music and video from a unified prompt, synchronizing mood, style, and energy for a polished music video.

npx playbooks add skill hmbown/minimax-cli --skill music-video-generator

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
2.1 KB
---
name: music-video-generator
description: Create synchronized music and matching video from a unified creative prompt.
allowed-tools: generate_music, generate_video, generate_image
---
You are running the Music Video Generator skill.

Goal
- Produce a cohesive music video where audio and visuals are generated together from a unified creative vision, ensuring synchronization between mood, style, and energy.

Ask for
- Song concept (genre, mood, theme, instruments).
- Visual concept (setting, aesthetic, color palette, era).
- Target length (30 s for a teaser, 2-3 min for a full video).
- Any specific subjects, locations, or visual elements to include.
- Whether to generate a poster/cover image.

Workflow
1) Clarify the creative vision:
   - Combine music and visual prompts into a unified concept.
   - Confirm the emotional arc (buildup, climax, resolution).
2) Generate the music:
   - Call generate_music with genre, mood, tempo, and any instrumentation notes.
   - Ensure duration matches or slightly exceeds video target.
3) Generate key visual frames:
   - Call generate_image for hero frame, key moments, and potential first_frame.
   - Capture the visual style guide (colors, lighting, aesthetic).
4) Generate the video:
   - Call generate_video with unified prompt incorporating visual concept and mood.
   - Use first_frame from generated hero image for visual continuity.
   - Match energy cues from music in video motion description.
5) Optional: Generate alternate versions (acoustic, instrumental, remix) if requested.
6) Return:
   - Music file path
   - Video file path
   - Cover/poster image if requested
   - Creative notes on visual-audio sync decisions

Response style
- Emphasize the unified creative vision in responses.
- Explain how visual and audio elements complement each other.
- Offer suggestions for alternate versions or iterations.

Notes
- The key is coherence between audio mood and visual style.
- Suggest using the generated frames as editing reference points.
- For longer videos, consider generating shorter clips that can be assembled.
- Offer to match music duration to video for seamless looping if needed.

Overview

This skill creates cohesive music videos by generating audio and visuals from a single creative prompt, ensuring mood, style, and energy stay synchronized. It produces the music track, key visual frames, a matched video file, and optional cover art. The result is a ready-to-edit package with creative notes that explain audio-visual sync decisions.

How this skill works

I first clarify a unified creative vision that combines the song concept and the visual concept, including the emotional arc and target length. I then generate the music with the specified genre, mood, tempo, and instruments, and produce a hero frame and key frames to establish a visual style guide. Finally, I render the video using the hero frame as the first frame and align motion and cuts to the music's energy cues. I can also create alternate versions or shorter clips for assembly.
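The order of operations described above can be sketched in Python. This is only an illustration under stated assumptions: the tool names `generate_music`, `generate_image`, and `generate_video` and the `first_frame` input come from SKILL.md, but the keyword arguments (`prompt`, `duration`), the `Brief` class, and the `run_pipeline` helper are hypothetical and are not a documented API. The tools are passed in as plain callables so the sketch does not depend on any real client.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Brief:
    """Hypothetical container for the answers collected from the user."""
    song_concept: str        # genre, mood, theme, instruments
    visual_concept: str      # setting, aesthetic, color palette, era
    target_seconds: int      # e.g. 30 for a teaser
    want_poster: bool = False


def unified_prompt(brief: Brief) -> str:
    """Merge the visual and song concepts into a single creative prompt."""
    return (f"{brief.visual_concept}. "
            f"Mood and energy follow the music: {brief.song_concept}")


def run_pipeline(
    brief: Brief,
    generate_music: Callable[..., str],
    generate_image: Callable[..., str],
    generate_video: Callable[..., str],
) -> Dict[str, Optional[str]]:
    """Call the three tools in the order the skill describes.

    The keyword names used below are assumptions made for this sketch.
    """
    # Music first, at the target duration.
    music = generate_music(prompt=brief.song_concept,
                           duration=brief.target_seconds)
    # Hero frame next: it fixes the visual style.
    hero = generate_image(prompt=f"Hero frame: {brief.visual_concept}")
    # Video last, seeded with the hero frame for visual continuity.
    video = generate_video(prompt=unified_prompt(brief), first_frame=hero)
    # Poster only when the user asked for one.
    poster = None
    if brief.want_poster:
        poster = generate_image(prompt=f"Poster: {brief.visual_concept}")
    return {"music": music, "hero_frame": hero,
            "video": video, "poster": poster}
```

Passing the tools in as arguments keeps the ordering logic testable with stand-in functions before any real generation call is made.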

When to use it

  • Building a quick proof-of-concept music video from a single brief.
  • Creating teasers (30s) or full songs (2–3 min) with matched visuals.
  • Producing social clips where audio and motion must feel unified.
  • Generating reference frames and notes for an editor to finish a project.
  • Creating loopable background videos with seamless audio-video sync.

Best practices

  • Provide a clear unified prompt: genre, mood, instruments, setting, color palette, and emotional arc.
  • Specify target length and any must-have visual elements or subjects up front.
  • Request a hero frame and 3–5 key frames to guide continuity and editing.
  • Ask for alternate versions (instrumental, acoustic, remix) if you plan edits or social cuts.
  • For long videos, generate shorter synchronized clips to assemble for better control over pacing.
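The last practice, assembling a long video from shorter clips, amounts to cutting the target length into consecutive segments. Below is a minimal sketch of that arithmetic. The function name `plan_clips` and the per-clip maximum are assumptions for illustration; the skill does not state an actual clip-length limit.

```python
from typing import List, Tuple


def plan_clips(total_seconds: int, max_clip_seconds: int) -> List[Tuple[int, int]]:
    """Split a target length into consecutive (start, end) segments.

    Every segment is at most max_clip_seconds long; the last one may be
    shorter. max_clip_seconds is a hypothetical per-clip limit.
    """
    if total_seconds <= 0 or max_clip_seconds <= 0:
        raise ValueError("durations must be positive")
    clips = []
    start = 0
    while start < total_seconds:
        end = min(start + max_clip_seconds, total_seconds)
        clips.append((start, end))
        start = end
    return clips


# A 150 s video cut into clips of at most 60 s:
print(plan_clips(150, 60))  # [(0, 60), (60, 120), (120, 150)]
```

Each segment's start and end can then be used to pick the matching stretch of the music track when writing that clip's motion description.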

Example use cases

  • A 30s teaser for an indie synth-pop single with neon city visuals and a dusk color palette.
  • A 2.5 min cinematic track with an emotional arc of buildup (sparse strings), climax (full band), and resolution (solo piano), paired with matching landscape visuals.
  • Loopable ambient background for a cafe scene, 60s loop matched to slow-motion visuals for seamless playback.
  • Alternate acoustic version with stripped instrumentation and a subdued, intimate visual treatment for behind-the-scenes content.
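For a use case with an explicit emotional arc, such as the buildup, climax, and resolution example above, it can help to turn the arc into timed sections before writing motion cues. The sketch below does this with integer weights. The function name `arc_sections` and the 2:2:1 split are assumptions chosen for illustration; the skill does not prescribe any proportions.

```python
from typing import List, Sequence, Tuple


def arc_sections(
    total_seconds: int,
    phases: Sequence[Tuple[str, int]],
) -> List[Tuple[str, int, int]]:
    """Turn (name, weight) pairs into (name, start, end) sections.

    Weights are relative integers. The final phase absorbs any rounding
    remainder so the sections always cover the whole duration.
    """
    if total_seconds <= 0 or not phases:
        raise ValueError("need a positive duration and at least one phase")
    weight_sum = sum(weight for _, weight in phases)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    sections = []
    start = 0
    for index, (name, weight) in enumerate(phases):
        if index == len(phases) - 1:
            end = total_seconds  # last phase takes whatever is left
        else:
            end = start + total_seconds * weight // weight_sum
        sections.append((name, start, end))
        start = end
    return sections


# 2.5 min (150 s) with an assumed 2:2:1 split:
arc = [("buildup", 2), ("climax", 2), ("resolution", 1)]
print(arc_sections(150, arc))
# [('buildup', 0, 60), ('climax', 60, 120), ('resolution', 120, 150)]
```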

FAQ

Can you match music length exactly to the video?

Yes. I can ensure the generated music matches the target video duration or slightly exceeds it for clean editing and seamless looping.
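As a rough illustration of the two cases in this answer, the helper below picks a music duration from the video length: an exact match for looping, or a small tail for editing. The function name, the `purpose` values, and the 2-second default padding are assumptions made for this sketch and are not settings of the skill.

```python
def choose_music_duration(
    video_seconds: float,
    purpose: str = "edit",
    edit_padding_seconds: float = 2.0,
) -> float:
    """Pick a music length for a given video length.

    "loop": match the video exactly so audio and video repeat together.
    "edit": slightly exceed the video so an editor has a tail to fade.
    The padding default is an arbitrary illustrative value.
    """
    if video_seconds <= 0:
        raise ValueError("video_seconds must be positive")
    if edit_padding_seconds < 0:
        raise ValueError("edit_padding_seconds must not be negative")
    if purpose == "loop":
        return video_seconds
    if purpose == "edit":
        return video_seconds + edit_padding_seconds
    raise ValueError(f"unknown purpose: {purpose!r}")


print(choose_music_duration(60, "loop"))  # 60
print(choose_music_duration(30, "edit"))  # 32.0
```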

Can I get stems or alternate mixes?

I can provide alternate versions such as instrumental, acoustic, or remix, and note where to swap them in for different visual sections. The workflow covers alternate versions only; it does not describe exporting individual stems.