
voice-podcast-kit skill


This skill helps you produce complete podcast episodes with cloned voices, intro and outro music, and smooth transitions between segments.

npx playbooks add skill hmbown/minimax-cli --skill voice-podcast-kit

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
2.0 KB
---
name: voice-podcast-kit
description: Create multi-host podcast episodes with cloned voices, intro music, and transitions.
allowed-tools: voice_clone, voice_list, tts, generate_music, list_dir, upload_file
---
You are running the Voice Podcast Kit skill.

Goal
- Produce a complete podcast episode with distinct cloned voices for each host/guest, intro/outro music, and smooth transitions between segments.

Ask for
- Episode topic and title.
- List of speakers/characters (names and optional voice sample descriptions).
- Whether you should clone voices from provided audio samples, or use existing voice IDs.
- Episode length target and number of segments (intro, main discussion, listener Q&A, outro).
- Any music preferences (genre, mood, tempo).

Workflow
1) Confirm speaker lineup and collect voice samples if cloning is requested:
   - If audio files are provided, call voice_clone for each speaker.
   - If no samples are provided, call voice_list to show the available presets and let the user choose.
2) Draft a segment-by-segment outline with speaker assignments and timing.
3) Generate intro/outro music:
   - Call generate_music with appropriate mood (upbeat for intro, winding down for outro).
4) Write scripts for each segment with clear speaker labels.
5) For each spoken segment:
   - Call tts with the correct voice_id for each speaker.
   - Use output_format "mp3" for smooth editing.
6) Optionally add transition sounds or music beds between segments.
7) Return a production checklist:
   - Music files (intro/outro)
   - Each spoken segment as separate audio files
   - Full episode script text
   - Assembly suggestions (timeline order)

Response style
- Keep a clear record of which voice_id maps to which speaker name.
- Provide file paths organized by segment and type.
- Suggest an editing workflow, but leave final assembly to the user.

Notes
- Voice consistency across episodes is a key value proposition, so encourage users to save voice IDs for recurring hosts.
- If the user wants a sample before full production, offer to generate just the intro + first 2 minutes as a preview.

Overview

This skill produces complete multi-host podcast episodes with distinct cloned voices, intro/outro music, and clean transitions. It handles voice cloning or preset voice selection, segment scripting, TTS generation, and outputs all files ready for assembly. The workflow focuses on voice consistency so recurring hosts keep the same voice IDs across episodes.

How this skill works

You provide the episode title, topic, and speaker list (with optional audio samples), plus the target length and segment count. I confirm the lineup, call voice_clone for provided samples or voice_list to choose presets, draft a timed segment outline, generate intro/outro music via generate_music, write segment scripts, and synthesize each line with tts into mp3 files. The result is a clear mapping of speaker -> voice_id, separate audio files per segment, the full episode script, and suggested assembly steps.
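The step from a labeled script to per-speaker tts calls can be sketched in plain Python. This is a minimal illustration, not the skill's actual implementation: the `Speaker: line` script layout, the helper names `parse_script` and `plan_tts_jobs`, the job dictionary keys, and the file naming scheme are all assumptions made for the example. It only plans the calls and does not invoke tts itself.

```python
import re
from pathlib import PurePosixPath

# Assumed script layout: one turn per line, e.g. "Alex: Welcome back."
LABEL = re.compile(r"^\s*([A-Za-z][\w .'-]*?)\s*:\s*(.+)$")


def parse_script(text):
    """Split a labeled script into (speaker, line) turns, in order."""
    turns = []
    for raw in text.splitlines():
        match = LABEL.match(raw)
        if match:  # unlabeled or blank lines are skipped
            turns.append((match.group(1).strip(), match.group(2).strip()))
    return turns


def plan_tts_jobs(segment, turns, voice_map, out_dir="episode"):
    """Build one job per turn, holding what a tts call would need."""
    jobs = []
    for index, (speaker, line) in enumerate(turns, start=1):
        if speaker not in voice_map:
            raise KeyError(f"no voice_id recorded for speaker {speaker!r}")
        file_name = f"{index:03d}_{speaker.lower().replace(' ', '_')}.mp3"
        jobs.append({
            "voice_id": voice_map[speaker],   # speaker -> voice_id mapping
            "text": line,
            "output_format": "mp3",
            "path": str(PurePosixPath(out_dir) / segment / file_name),
        })
    return jobs


script = "Alex: Welcome back.\nSam: Today's topic: audio.\n\n(not a labeled line)"
voices = {"Alex": "voice-alex-01", "Sam": "voice-sam-01"}  # hypothetical IDs
jobs = plan_tts_jobs("intro", parse_script(script), voices)
```

Numbering the files per segment keeps the timeline order visible in the file paths, which makes the final assembly step easier to follow.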

When to use it

  • Launching a multi-host podcast episode with lifelike, distinct voices
  • Producing guest interviews when you need consistent host voices across episodes
  • Creating sample previews before committing to full episode production
  • Repackaging written content into an audio episode with multiple characters
  • Rapid prototyping of episode structure and timing for planning

Best practices

  • Provide short, clean voice samples (10–30 seconds) per speaker for best cloning results
  • Save returned voice_ids for recurring hosts to maintain consistency across episodes
  • Specify clear segment timing and speaker turns to simplify scripting and TTS calls
  • Request an intro + first 2 minutes preview if you want to verify voice clones before full production
  • Use music preferences (genre, mood, tempo) to guide generate_music for coherent intro/outro

Example use cases

  • Weekly news roundtable with three hosts and one guest, 30-minute target with intro, main, Q&A, outro
  • Fictional audio drama episode with distinct character voices and transition soundscapes
  • Interview series pilot where guest voice is cloned from a provided sample
  • Preview build: generate intro + first 2 minutes to approve voices and style before full episode
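For a target such as the 30-minute roundtable above, the timed outline can start from simple arithmetic. The sketch below splits the episode length across segments by relative weight and estimates a script word budget for each. The weights and the speaking rate of 150 words per minute are assumptions for illustration; real pacing depends on the voices and the material.

```python
def allocate_segments(total_minutes, weights, words_per_minute=150):
    """Split an episode length across segments in proportion to weights."""
    total_weight = sum(weights.values())
    plan = []
    for name, weight in weights.items():
        minutes = total_minutes * weight / total_weight
        plan.append({
            "segment": name,
            "minutes": round(minutes, 1),
            # Rough script length, assuming a steady speaking rate
            "word_budget": round(minutes * words_per_minute),
        })
    return plan


# Hypothetical weights: the main discussion gets most of the time
plan = allocate_segments(30, {"intro": 1, "main": 6, "qa": 2, "outro": 1})
for row in plan:
    print(f'{row["segment"]:>5}: {row["minutes"]} min, ~{row["word_budget"]} words')
```

With these weights the main discussion gets 18 minutes and roughly 2,700 words, which gives a starting point for scripting before any tts calls are made.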

FAQ

Can you reuse voices across episodes?

Yes. I return a voice_id for each speaker. Store these IDs and reuse them to keep voices consistent in future episodes.

What file formats do you produce?

Spoken segments are produced as mp3 for smooth editing; music can be returned in common audio formats on request.