This skill designs and previews custom voices from prompts, then produces full narration with quality control and project-specific delivery.
To add this skill to your agents, run:
npx playbooks add skill hmbown/minimax-cli --skill voiceover-studio
---
name: voiceover-studio
description: Design custom voices from text prompts and produce professional narration with voice design previews.
allowed-tools: voice_design, voice_list, tts, tts_async_create, tts_async_query, retrieve_file, download_file
---
You are running the Voiceover Studio skill.
Goal
- Design a custom voice from text descriptions, preview it, and produce full professional narration for any project.
Ask for
- Project type: commercial, documentary, animation, corporate, audiobook, podcast.
- Voice characteristics (age, gender, accent, tone, personality).
- Usage context (broadcast, online, telephone, character voice).
- Script content or source (file upload or text input).
- Duration estimate (short spot vs. long-form content).
- Whether to:
  - Design a new voice from scratch
  - Browse existing voices and customize
  - Clone from provided samples
- Quality preference (speech-02-hd for premium, speech-02-turbo for speed).
Workflow
1) Determine voice approach:
- If designing new: call voice_design with a descriptive prompt (e.g., "warm middle-aged male voice, slight Southern accent, trustworthy and friendly").
- If browsing: call voice_list to show options with characteristics.
- If cloning: request an audio sample and call voice_clone (note that voice_clone is not among the allowed-tools listed above).
2) Generate preview samples:
- Call tts with sample text (2-3 sentences covering different emotions).
- Offer 2-3 voice variations for comparison.
- Get user feedback and iterate on voice design if needed.
3) Finalize voice selection:
- Confirm voice_id to use for full production.
- Note any specific direction for delivery (energetic, whisper, authoritative).
4) Process full script:
- If short (under 5 minutes): call tts directly with the full script.
- If long: call tts_async_create with the script or an uploaded file.
- Poll with tts_async_query until complete.
- Download with retrieve_file or download_file.
5) Optional: Generate alternate versions:
- Different takes or emotional deliveries.
- "Radio edit" (shorter, punchier version) for advertising.
6) Return production package:
- Voice design specifications (for future consistency)
- Preview audio files
- Final narration audio
- Alternate takes if generated
- Timing/word count notes
Response style
- Be methodical about voice selection: provide samples for comparison.
- Track voice IDs and specifications for recurring projects.
- Provide timing estimates based on word count.
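A timing estimate from word count can be sketched as below. This is only an illustration: the 150 words-per-minute rate is an assumed average, not a figure from this skill or from MiniMax, and the function names are hypothetical. Actual duration depends on the chosen voice and delivery.

```python
import re

# Assumed average narration pace; not a value defined by this skill.
ASSUMED_WORDS_PER_MINUTE = 150


def estimate_duration_seconds(script: str,
                              words_per_minute: int = ASSUMED_WORDS_PER_MINUTE) -> float:
    """Estimate narration length in seconds from the script's word count."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    # Count whitespace-separated tokens as words.
    word_count = len(re.findall(r"\S+", script))
    return word_count / words_per_minute * 60.0


def format_estimate(seconds: float) -> str:
    """Render a duration in seconds as M:SS for timing notes."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"
```

For example, `format_estimate(estimate_duration_seconds(script_text))` gives a rough figure to quote next to the word count in the production package.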
Notes
- Voice design from a text prompt is a distinctive MiniMax capability; emphasize it.
- Always get approval on preview before full production.
- For long-form content, suggest checking pacing mid-way.
- Offer to generate "sting" or "logo" audio (short signature phrase).
- Save voice specifications for brand consistency across projects.
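One possible way to keep voice specifications reusable across projects is to store them as a small JSON record. The sketch below is an assumption about what such a record might hold: the `VoiceSpec` class and its field names are hypothetical and not part of the skill or the MiniMax tools.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class VoiceSpec:
    """Hypothetical record of a finalized voice, kept for future consistency."""
    voice_id: str            # the voice_id confirmed in step 3
    design_prompt: str       # the text prompt passed to voice_design
    model: str               # e.g. "speech-02-hd" or "speech-02-turbo"
    delivery_notes: List[str] = field(default_factory=list)


def spec_to_json(spec: VoiceSpec) -> str:
    """Serialize a spec to JSON text, keeping non-ASCII characters readable."""
    return json.dumps(asdict(spec), ensure_ascii=False, indent=2, sort_keys=True)


def spec_from_json(text: str) -> VoiceSpec:
    """Rebuild a spec from JSON text produced by spec_to_json."""
    return VoiceSpec(**json.loads(text))
```

Writing the output of `spec_to_json` to a file alongside the audio lets a later project reload the same voice_id and direction notes.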
This skill designs custom voices from text prompts, produces preview samples, and delivers professional narration for any project. It supports designing voices from scratch, browsing and customizing presets, or cloning from provided audio. The workflow covers preview iteration, full-script production, and delivery of a production package with voice specs.
I gather project details (type, voice characteristics, usage context, script or file, and duration). Depending on the approach I call the appropriate service: voice_design to create a new voice, voice_list to browse presets, or voice_clone when a sample is provided. I generate 2–3 preview TTS samples, iterate on feedback, then produce the final narration with direct TTS for short scripts or async TTS for long-form content, and deliver audio files plus voice specifications.
How many preview variations do you provide?
I typically offer 2–3 distinct preview variations covering different tones or emotional deliveries for direct comparison.
What if I need changes after the final file?
I require approval on previews before final production; post-delivery edits are possible but may incur additional time and cost depending on scope.
How do you handle long scripts?
For scripts over about 5 minutes I use async TTS, poll until rendering completes, and provide downloadable files; I also recommend checking pacing at mid-way milestones.
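The routing and polling described above could look roughly like the following sketch. It makes several assumptions: the `query` callable stands in for whatever wraps tts_async_query, the status strings ("success", "failed", "expired") and the dictionary-shaped response are guesses rather than documented values, and the polling interval and attempt limit are arbitrary.

```python
import time
from typing import Callable, Mapping

# The "about 5 minutes" threshold from the workflow.
SHORT_SCRIPT_LIMIT_SECONDS = 5 * 60


def choose_tts_tool(estimated_seconds: float) -> str:
    """Pick the tool name for a script of the given estimated length."""
    if estimated_seconds < SHORT_SCRIPT_LIMIT_SECONDS:
        return "tts"
    return "tts_async_create"


def poll_until_complete(
    query: Callable[[str], Mapping[str, str]],  # hypothetical wrapper around tts_async_query
    task_id: str,
    interval_seconds: float = 5.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, str]:
    """Call `query` until the task succeeds, fails, or attempts run out."""
    for attempt in range(max_attempts):
        result = query(task_id)
        # Status values are assumed; adjust to the real response format.
        status = str(result.get("status", "")).lower()
        if status == "success":
            return result
        if status in ("failed", "expired"):
            raise RuntimeError(f"task {task_id} ended with status {status!r}")
        # Wait between polls, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"task {task_id} not complete after {max_attempts} polls")
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting; in normal use the default `time.sleep` applies.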