interactive-story-kit skill

/skills/interactive-story-kit

This skill creates branching interactive audio stories with distinct character voices and adaptive music, guiding listeners through choices across a range of genres.

npx playbooks add skill hmbown/minimax-cli --skill interactive-story-kit

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
3.1 KB
---
name: interactive-story-kit
description: Create branching audio stories with distinct character voices and atmospheric music that shifts with narrative mood.
allowed-tools: voice_clone, voice_list, tts, generate_music, generate_image, list_dir, upload_file
---
You are running the Interactive Story Kit skill.

Goal
- Produce an interactive audio story with branching narrative paths, distinct character voices, and adaptive music that changes with the story's mood and pacing.

Ask for
- Story genre and premise (mystery, fantasy, sci-fi, romance, horror, comedy).
- Target length and complexity (short story 5–10 min, novella 20–30 min, or serial format).
- Number of main characters and their personality descriptions.
- Whether to:
  - Clone voices from provided samples
  - Use existing voice IDs
  - Browse voice_list for character types
- Branching structure (binary choices, multiple paths, single ending vs. multiple endings).
- Musical atmosphere (score style, era, mood progression).

Workflow
1) Design story structure:
   - Outline main plot points and key scenes.
   - Map branching points with clear choice moments.
   - Define character arcs and their voice requirements.
2) Establish character voices:
   - If samples provided: call voice_clone for each character.
   - If no samples: call voice_list and help user select distinctive voices.
   - Create voice-character mapping reference.
3) Write story script with branches:
   - Label each scene with character speakers.
   - Mark branching points with clear choice options.
   - Include stage directions for pacing and mood.
4) Generate atmospheric music:
   - Call generate_music for base mood (e.g., mysterious, adventurous).
   - Call generate_music for mood shifts (tension, revelation, resolution).
   - Note which music applies to which scene/branch.
5) Generate character dialogue:
   - For each scene, call tts with appropriate voice_id.
   - Keep files organized by scene and character.
   - Include emotional delivery notes in tts prompts when needed.
6) Generate optional visuals:
   - Call generate_image for scene headers or character portraits.
   - Useful for companion app or visual novel format.
7) Return interactive story package:
   - Full story script with branch map and choice options
   - Voice-character reference with voice_ids
   - All dialogue audio files organized by scene
   - Music tracks with timing/scene notes
   - Visual assets if generated
   - Production notes for assembly
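The branch map from step 1 can be kept as a simple indexed structure so every choice verifiably points at a real scene before any audio is generated. A minimal sketch, assuming scene labels follow the "3A"/"3B" convention used below; the scene data is illustrative:

```python
# Sketch of a branch map: scenes keyed by label, each with a speaker and the
# choices that lead to other scenes. Labels like "3A"/"3B" follow the skill's
# scene-naming convention; the story content here is made up.

def build_branch_map(scenes):
    """Index scenes by label and verify every choice targets a real scene."""
    index = {scene["label"]: scene for scene in scenes}
    for scene in scenes:
        for choice in scene.get("choices", []):
            if choice["goto"] not in index:
                raise ValueError(
                    f"Choice in scene {scene['label']} points to "
                    f"missing scene {choice['goto']}"
                )
    return index

scenes = [
    {"label": "1", "speaker": "Narrator",
     "choices": [{"text": "Take the door", "goto": "3A"},
                 {"text": "Follow the sound", "goto": "3B"}]},
    {"label": "3A", "speaker": "Detective", "choices": []},
    {"label": "3B", "speaker": "Narrator", "choices": []},
]

branch_map = build_branch_map(scenes)
print(sorted(branch_map))  # ['1', '3A', '3B']
```

Validating the map up front keeps a dangling choice from being discovered only after dialogue and music have already been generated.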

Response style
- Use clear labeling for branches and choices (e.g., "SCENE 3A - Take the door" vs "SCENE 3B - Follow the sound").
- Provide a visual branch map for reference.
- Explain how music shifts enhance emotional impact.

Notes
- Character voice distinction is crucial—avoid similar voices for multiple characters.
- Music should guide emotional journey—note mood shifts in script.
- Offer to generate "trailer" version that hints at the branching complexity.
- For serial stories, suggest consistent voice IDs across episodes.
- Provide guidance on interactive audio platforms or game engines for assembly.
- Consider accessibility: offer transcript of full story with all branches.
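Step 5 of the workflow asks for dialogue files organized by scene and character. One possible naming convention, sketched below; the paths and helper are hypothetical, not part of the skill's tools:

```python
# Sketch of a predictable file layout for generated dialogue, e.g.
# story/audio/scene_3A/detective_02.mp3 — root and naming scheme are
# illustrative assumptions.
from pathlib import Path

def dialogue_path(scene_label, character, line_no, root="story/audio"):
    """Build a path for one line of dialogue, grouped by scene."""
    safe_char = character.lower().replace(" ", "_")
    return Path(root) / f"scene_{scene_label}" / f"{safe_char}_{line_no:02d}.mp3"

print(dialogue_path("3A", "Detective", 2).as_posix())
# story/audio/scene_3A/detective_02.mp3
```

A convention like this makes the final assembly step (and the transcript) easy to cross-reference against the branch map.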

Overview

This skill creates branching audio stories with distinct character voices and adaptive music that follows narrative mood and pacing. It designs the plot structure, produces voice assets (cloned or selected), generates dialogue audio and mood-driven music, and returns a packaged project ready for assembly. The output includes scripts, branch maps, audio files, music tracks, and optional visuals and transcripts.

How this skill works

I first design a branching story outline with labeled scenes and clear choice points. Next I establish character-voice mappings—either cloning from samples or selecting existing voice IDs—then write full script branches with stage directions and emotional cues. I generate adaptive music tracks for base mood and each major shift, produce TTS dialogue per scene, and optionally create scene headers or portraits to round out the package.
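The voice-character mapping reference mentioned above can be as simple as a dictionary keyed by character name. A minimal sketch; the character names and voice_ids are invented for illustration, not real voice_clone or voice_list output:

```python
# Sketch of a voice-character reference: each character maps to one voice_id,
# with its source noted (cloned vs. picked from voice_list). All IDs below
# are placeholders.

voice_map = {
    "Narrator": {"voice_id": "voice_001", "source": "voice_list"},
    "Detective": {"voice_id": "voice_clone_ab12", "source": "cloned"},
}

def voice_for(character):
    """Look up the voice_id for a character, failing loudly on typos."""
    try:
        return voice_map[character]["voice_id"]
    except KeyError:
        raise KeyError(f"No voice assigned to character {character!r}")

print(voice_for("Detective"))  # voice_clone_ab12
```

Keeping this reference in one place is also what makes serial formats workable: later episodes reuse the same mapping instead of re-selecting voices.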

When to use it

  • You want an immersive, choice-driven audio experience for podcasts or apps.
  • You need distinct character voices without hiring actors.
  • You want dynamic music that shifts with player decisions.
  • You are preparing a serial audio drama with reusable assets.
  • You need a packaged project (script, audio, music, visuals) for interactive platforms.

Best practices

  • Define genre, premise, target length, and number of main characters up front.
  • Provide voice samples when you want exact voice clones; otherwise pick varied voice IDs to avoid overlap.
  • Map branching choices early: prefer clear binary or small-multiple choices to keep production manageable.
  • Specify musical atmosphere and key mood shifts so music cues align with narrative beats.
  • Keep voice roles consistent across episodes for serial formats to maintain continuity.

Example use cases

  • A 20–30 minute mystery novella with three main suspects and multiple endings, using cloned voices for protagonist and antagonist.
  • A short 5–10 minute horror vignette with intense music cues for jump scares and optional ‘safe’ ending.
  • A serialized fantasy adventure with recurring NPCs using the same voice IDs across episodes.
  • An interactive trailer that highlights major branches and adaptive score to pitch a longer project.
  • A companion app package with audio scenes, portraits, and a full transcript for accessibility.

FAQ

Can you clone voices from my samples?

Yes. Provide high-quality samples and I will include voice_clone steps and a voice-character reference with cloned voice_ids.

How are music shifts tied to choices?

Music is generated per mood segment (base, tension, revelation, resolution) and annotated with exact scene timings so you can trigger tracks at choice points or layer stems for smooth transitions.
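The per-segment annotation described above could be recorded as a small cue list tying each generated track to a scene and mood. A hedged sketch; the track filenames and timing fields are illustrative, not real generate_music output:

```python
# Sketch of music-cue annotations: one entry per mood segment, noting which
# scene triggers it and when the track should start. All values are made up.

music_cues = [
    {"scene": "1",  "mood": "base",       "track": "music_base.mp3",       "start_s": 0.0},
    {"scene": "3A", "mood": "tension",    "track": "music_tension.mp3",    "start_s": 0.0},
    {"scene": "3B", "mood": "revelation", "track": "music_revelation.mp3", "start_s": 4.5},
]

def cues_for_scene(scene_label):
    """Return the music cues to trigger when a given scene begins."""
    return [cue for cue in music_cues if cue["scene"] == scene_label]

print([c["mood"] for c in cues_for_scene("3A")])  # ['tension']
```

At assembly time, a player or engine can look up the entering scene's cues at each choice point and crossfade to the new track.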