voiceover-studio skill

This skill designs and previews custom voices from prompts, then produces full narration with quality control and project-specific delivery.

npx playbooks add skill hmbown/minimax-cli --skill voiceover-studio

Review the files below or copy the command above to add this skill to your agents.

Files (1)

SKILL.md

2.7 KB

---
name: voiceover-studio
description: Design custom voices from text prompts and produce professional narration with voice design previews.
allowed-tools: voice_design, voice_list, voice_clone, tts, tts_async_create, tts_async_query, retrieve_file, download_file
---
You are running the Voiceover Studio skill.

Goal
- Design a custom voice from text descriptions, preview it, and produce full professional narration for any project.

Ask for
- Project type: commercial, documentary, animation, corporate, audiobook, podcast.
- Voice characteristics (age, gender, accent, tone, personality).
- Usage context (broadcast, online, telephone, character voice).
- Script content or source (file upload or text input).
- Duration estimate (short spot vs. long-form content).
- Whether to:
  - Design a new voice from scratch
  - Browse existing voices and customize
  - Clone from provided samples
- Quality preference (speech-02-hd for premium, speech-02-turbo for speed).

Workflow
1) Determine voice approach:
   - If designing new: call voice_design with a descriptive prompt (e.g., "warm middle-aged male voice, slight Southern accent, trustworthy and friendly").
   - If browsing: call voice_list to show options with characteristics.
   - If cloning: request audio sample and call voice_clone.
2) Generate preview samples:
   - Call tts with sample text (2-3 sentences covering different emotions).
   - Offer 2-3 voice variations for comparison.
   - Get user feedback and iterate on voice design if needed.
3) Finalize voice selection:
   - Confirm voice_id to use for full production.
   - Note any specific direction for delivery (energetic, whisper, authoritative).
4) Process full script:
   - If short (under 5 minutes): call tts directly with full script.
   - If long: call tts_async_create with script or uploaded file.
   - Poll with tts_async_query until complete.
   - Download with retrieve_file or download_file.
5) Optional: Generate alternate versions:
   - Different takes or emotional deliveries.
   - "Radio edit" (shorter, punchier version) for advertising.
6) Return production package:
   - Voice design specifications (for future consistency)
   - Preview audio files
   - Final narration audio
   - Alternate takes if generated
   - Timing/word count notes
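
The routing decisions in steps 1 and 4 can be sketched as a small planning function. This is a minimal illustration, not the skill's actual implementation: the tool names come from the workflow above, while `ProductionPlan`, `plan_production`, the 150 words-per-minute pace, and the approach labels are assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed values for illustration; not taken from MiniMax documentation.
WORDS_PER_MINUTE = 150    # rough narration pace, adjust per voice
SYNC_LIMIT_MINUTES = 5.0  # the 5-minute cutoff from step 4


@dataclass
class ProductionPlan:
    voice_tool: str      # tool to call in step 1
    tts_tool: str        # tool to call in step 4
    needs_polling: bool  # True when tts_async_query must be polled


def plan_production(approach: str, script: str) -> ProductionPlan:
    """Map the user's choices onto the tool names listed in the workflow."""
    voice_tools = {
        "design": "voice_design",
        "browse": "voice_list",
        "clone": "voice_clone",
    }
    if approach not in voice_tools:
        raise ValueError(f"unknown approach: {approach!r}")

    # Estimate spoken length from word count to pick sync vs. async TTS.
    minutes = len(script.split()) / WORDS_PER_MINUTE
    is_long = minutes >= SYNC_LIMIT_MINUTES
    return ProductionPlan(
        voice_tool=voice_tools[approach],
        tts_tool="tts_async_create" if is_long else "tts",
        needs_polling=is_long,
    )
```

Under these assumptions, a 1,000-word script (roughly 6.7 minutes) is routed to `tts_async_create`, while a two-sentence preview stays on `tts`.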

Response style
- Be methodical about voice selection: provide samples for comparison.
- Track voice IDs and specifications for recurring projects.
- Provide timing estimates based on word count.
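
A timing estimate from word count could look like the sketch below. The 150 words-per-minute default is an assumed conversational narration pace, and `estimate_duration` is a hypothetical helper; the real pace depends on the voice and the delivery direction.

```python
def estimate_duration(script: str, words_per_minute: int = 150) -> str:
    """Return a human-readable duration estimate for a script."""
    words = len(script.split())
    total_seconds = round(words / words_per_minute * 60)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{words} words, about {minutes}:{seconds:02d} at {words_per_minute} wpm"
```

For example, a 75-word spot at the default pace comes out at about 0:30, which matches a typical short radio slot.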

Notes
- Voice design is unique to MiniMax; emphasize this capability.
- Always get approval on preview before full production.
- For long-form content, suggest checking pacing mid-way.
- Offer to generate "sting" or "logo" audio (short signature phrase).
- Save voice specifications for brand consistency across projects.
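
One way to keep voice specifications reusable across projects is to store them as a small JSON record. The sketch below is an assumption about what such a record might contain; `VoiceSpec` and its field names are hypothetical and are not part of the MiniMax tools.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class VoiceSpec:
    voice_id: str       # ID confirmed in step 3
    design_prompt: str  # text prompt passed to voice_design
    model: str          # e.g. "speech-02-hd" or "speech-02-turbo"
    delivery_notes: List[str] = field(default_factory=list)


def spec_to_json(spec: VoiceSpec) -> str:
    """Serialize a spec so it can be stored alongside the project files."""
    return json.dumps(asdict(spec), ensure_ascii=False, indent=2)


def spec_from_json(text: str) -> VoiceSpec:
    """Rebuild a spec from its stored JSON form."""
    return VoiceSpec(**json.loads(text))
```

Storing the original design prompt next to the voice ID makes it possible to recreate or adjust the voice later if the ID is no longer available.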

Overview

This skill designs custom voices from text prompts, produces preview samples, and delivers professional narration for any project. It supports designing voices from scratch, browsing and customizing presets, or cloning from provided audio. The workflow covers preview iteration, full-script production, and delivery of a production package with voice specs.

How this skill works

I gather project details (type, voice characteristics, usage context, script or file, and duration). Depending on the approach, I call the appropriate tool: voice_design to create a new voice, voice_list to browse presets, or voice_clone when a sample is provided. I generate 2–3 preview TTS samples, iterate on feedback, then produce the final narration with direct TTS for short scripts or async TTS for long-form content, and deliver audio files plus voice specifications.

When to use it

- Creating a unique brand voice for commercials or podcasts
- Revoicing long-form content like audiobooks or documentaries
- Quick-turnaround spots where speed has to be weighed against quality
- Cloning an existing voice from client-provided samples
- Producing alternate takes or radio edits for advertising

Best practices

- Collect clear voice descriptors: age, gender, accent, tone, and personality
- Provide sample lines or a short script for previewing multiple emotions
- Approve preview samples before committing to full production
- Use high-quality audio samples if cloning to ensure fidelity
- Choose speech-02-hd for premium quality and speech-02-turbo when speed matters

Example use cases

- Commercial: design a warm, trustworthy host voice and produce 30–60s radio/online spots
- Documentary: create an authoritative narrator with mid-paced delivery and produce full-length episodes
- Animation: design distinct character voices and deliver multiple takes for direction
- Corporate: produce polished training or explainer narration with brand-consistent voice specs
- Audiobook/Podcast: clone or design a narrator and generate chaptered files with timing notes

FAQ

How many preview variations do you provide?

I typically offer 2–3 distinct preview variations covering different tones or emotional deliveries for direct comparison.

What if I need changes after the final file?

I require approval on previews before final production; post-delivery edits are possible but may incur additional time and cost depending on scope.

How do you handle long scripts?

For scripts over about 5 minutes I use async TTS, poll until rendering completes, and provide downloadable files; I also recommend checking pacing at mid-way milestones.
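
The poll-until-complete step could be sketched as follows. This is an illustration only: `query` stands in for whatever wraps tts_async_query, and the response shape and status strings (`"success"`, `"failed"`) are assumptions, not the documented MiniMax response format.

```python
import time
from typing import Callable, Dict


def wait_for_task(
    query: Callable[[str], Dict],
    task_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll `query(task_id)` until the task finishes, fails, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = query(task_id)
        status = result.get("status")  # assumed field name
        if status == "success":
            return result
        if status == "failed":
            raise RuntimeError(f"task {task_id} failed: {result}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout_s}s")
        sleep(interval_s)  # wait before the next poll
```

Passing `query` and `sleep` as parameters keeps the loop testable without network calls; a fixed interval is the simplest choice, and a backoff schedule could replace it for very long renders.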