
audiobook-studio skill

This skill produces a multi-chapter audiobook with consistent narration, organized outputs, and configurable voice and file formats.

npx playbooks add skill hmbown/minimax-cli --skill audiobook-studio

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: audiobook-studio
description: Produce a full multi-chapter audiobook with consistent narration and exportable files.
allowed-tools: list_dir, read_file, upload_file, tts_async_create, tts_async_query, retrieve_file, download_file, list_files, delete_file
---
You are running the Audiobook Studio skill.

Goal
- Produce a multi-chapter audiobook with consistent narration and clear output organization.

Ask for
- Folder or list of chapter files (preferred: .txt).
- Voice preference, or whether the user wants to browse available voices first.
- Output format (mp3 or wav) and any pacing or volume notes.

Workflow
1) Discover chapters (list_dir) and confirm ordering.
2) For each chapter:
   - Upload the chapter text file (purpose: t2a_async_input).
   - Create a tts_async_create task with the chosen voice settings.
3) Poll tasks in batches with tts_async_query.
4) Download or retrieve each completed file.
5) Return a final list of audio file paths in chapter order.

Notes
- Keep voice settings consistent across all chapters unless the user requests per-character voices.
- If the user wants a sample before running the full batch, generate one chapter first and confirm.

Overview

This skill produces a full multi-chapter audiobook with consistent narration and neatly organized, exportable files. It handles chapter discovery and batch text-to-audio tasks, then returns a final ordered list of audio files ready for distribution. You can request a sample chapter first, or run the whole batch once voice and output preferences are confirmed.

How this skill works

The skill scans a specified folder, or accepts an explicit list of chapter text files (.txt preferred), and confirms the chapter ordering. For each chapter, it uploads the text and creates an asynchronous TTS task with the chosen voice and settings. It then polls the tasks in batches until they complete, and retrieves and stores the resulting audio files. Voice settings stay consistent across chapters unless you request per-character or per-chapter variations. The final output is an ordered list of file paths in the requested format (mp3 or wav).
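The batch polling step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the skill's actual implementation: `poll_in_batches` is a hypothetical helper, `query` is a hypothetical callable standing in for a `tts_async_query` call, and the status strings `"success"` and `"failed"` are placeholders for whatever the real tool returns.

```python
import time
from typing import Callable, Dict, Iterable, List

# Placeholder status labels (assumption): the real values returned by
# tts_async_query may be named differently.
DONE = "success"
FAILED = "failed"


def poll_in_batches(
    task_ids: Iterable[str],
    query: Callable[[str], str],
    batch_size: int = 5,
    interval_s: float = 10.0,
    max_passes: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll tasks a batch at a time until each reaches a terminal status.

    `query` is a hypothetical stand-in for a tts_async_query call that
    returns a status string for one task id.
    """
    pending: List[str] = list(task_ids)
    final: Dict[str, str] = {}
    for _ in range(max_passes):
        if not pending:
            return final
        # Check at most `batch_size` tasks per pass; the rest wait their turn.
        batch, rest = pending[:batch_size], pending[batch_size:]
        unfinished: List[str] = []
        for task_id in batch:
            status = query(task_id)
            if status in (DONE, FAILED):
                final[task_id] = status
            else:
                unfinished.append(task_id)
        # Unfinished tasks go to the back of the queue so every task gets polled.
        pending = rest + unfinished
        if pending:
            sleep(interval_s)
    if pending:
        raise TimeoutError(f"{len(pending)} task(s) still pending after {max_passes} passes")
    return final
```

Rotating unfinished tasks to the back of the queue keeps one slow chapter from starving the others, and injecting `sleep` lets the loop be exercised without real waits. The `max_passes` cap is an added safeguard so a stuck task cannot poll forever.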

When to use it

  • You have a manuscript split into chapter text files and need a single consistent narrator for an audiobook.
  • You want batch processing of many chapters with predictable naming and ordering.
  • You need exportable audio files in mp3 or wav for distribution or further production.
  • You want to audition a sample chapter before producing the full audiobook.
  • You require consistent pacing, volume, and voice settings across all chapters.

Best practices

  • Provide chapters as numbered or clearly ordered .txt files to avoid ordering ambiguity.
  • Choose the voice and pace up front; request per-character voices only if necessary.
  • Generate a single sample chapter first to confirm voice, pacing, and volume before full processing.
  • Use consistent, zero-padded file names (e.g., 01_Chapter_Title.txt) so a plain alphabetical sort matches chapter order.
  • Request mp3 for smaller files and wider compatibility, or wav for highest fidelity.
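Zero-padded names sort correctly with a plain string sort. Unpadded names do not, because "10_Epilogue.txt" sorts before "2_Arrival.txt". When chapter files arrive without padding, a natural sort can recover the intended order. A minimal Python sketch follows; `natural_key` and `discover_chapters` are hypothetical helpers for illustration, not part of the skill or its tools.

```python
import re
from pathlib import Path
from typing import List, Tuple, Union


def natural_key(name: str) -> Tuple[Union[int, str], ...]:
    """Sort key that compares digit runs numerically ("2" before "10")."""
    # re.split with a capturing group alternates text and digit runs, so odd
    # indices always hold digits and even indices always hold text.
    parts = re.split(r"(\d+)", name)
    return tuple(int(p) if i % 2 else p.lower() for i, p in enumerate(parts))


def discover_chapters(folder: str) -> List[Path]:
    """Return the folder's .txt files in natural (chapter) order."""
    return sorted(Path(folder).glob("*.txt"), key=lambda p: natural_key(p.name))
```

Even with a natural sort available, confirming the detected order with the user before creating any tasks remains the safer step, since titles such as "Prologue" carry no number at all.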

Example use cases

  • Convert a 12-chapter novel (text files in a folder) into mp3 files with a single narrator and receive a final ordered list of audio paths.
  • Produce a sample chapter to review voice and pacing, then process the remaining chapters once approved.
  • Generate per-chapter outputs with consistent loudness for upload to an audiobook platform.
  • Try out multiple voice variants by producing one character-specific chapter as a sample before full batch processing.

FAQ

Can I request different voices for different characters?

Yes, but specify which chapters or sections need alternate voices; otherwise the skill keeps a single consistent voice by default.
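The default-plus-overrides rule can be sketched at chapter granularity as below. This is a minimal sketch under assumptions: `VoiceSettings` and `plan_voices` are hypothetical, and the fields `voice_id`, `speed`, and `volume` are illustrative rather than the exact `tts_async_create` parameters. Section-level voice changes would additionally require splitting a chapter's text, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class VoiceSettings:
    # Illustrative fields only; the actual tts_async_create voice parameters
    # may be named and scaled differently.
    voice_id: str
    speed: float = 1.0
    volume: float = 1.0


def plan_voices(
    chapters: List[str],
    default: VoiceSettings,
    overrides: Optional[Dict[str, VoiceSettings]] = None,
) -> Dict[str, VoiceSettings]:
    """Assign the default voice to every chapter unless an override is given."""
    overrides = overrides or {}
    # Catch typos in chapter names rather than silently ignoring an override.
    unknown = set(overrides) - set(chapters)
    if unknown:
        raise ValueError(f"overrides for unknown chapters: {sorted(unknown)}")
    return {ch: overrides.get(ch, default) for ch in chapters}
```

Every chapter without an explicit override shares the same frozen settings object, which is one way to keep narration consistent by construction.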

What formats are supported and which should I choose?

mp3 and wav are supported. Choose mp3 for smaller file sizes and broad compatibility; choose wav for maximum audio fidelity.
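Because only mp3 and wav are supported, it can help to validate the format once and derive each output name from its chapter file, so the final list of audio paths keeps chapter order. A minimal sketch; `output_paths` and the `out_dir` layout are hypothetical, and the skill itself may name its files differently.

```python
from pathlib import Path
from typing import List

SUPPORTED_FORMATS = ("mp3", "wav")


def output_paths(chapter_files: List[str], out_dir: str, fmt: str) -> List[Path]:
    """Map ordered chapter files to audio paths, preserving chapter order."""
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected mp3 or wav")
    # Reuse each chapter's stem so audio files sort the same way as the text files.
    return [Path(out_dir) / f"{Path(ch).stem}.{fmt}" for ch in chapter_files]
```

For example, `output_paths(["01_Intro.txt", "02_Storm.txt"], "out", "mp3")` yields paths ending in `01_Intro.mp3` and `02_Storm.mp3`, in the same order as the input chapters.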