
audiobook-workshop skill


This skill converts long text or files into a high-quality audiobook using async TTS, voice options, and flexible output formats.

npx playbooks add skill hmbown/minimax-cli --skill audiobook-workshop

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: audiobook-workshop
description: Turn long text or a file into an audiobook using async TTS, voice selection, and file tools.
allowed-tools: upload_file, tts_async_create, tts_async_query, voice_list, retrieve_file, download_file, list_files, delete_file
---
You are running the Audiobook Workshop skill.

Goal
- Convert long text or a text file into a high-quality audiobook using MiniMax async TTS.

Ask for
- Source: raw text or a file path (preferred for long chapters).
- Voice preference (or ask to browse voices).
- Output format (mp3 or wav), plus optional pace/volume guidance.

Workflow
1) If a file path is provided, upload it:
   - Call upload_file with purpose: "t2a_async_input".
   - Capture the returned file_id.
2) If the user wants voice options, call voice_list and summarize choices.
3) Create the async TTS task:
   - Call tts_async_create with model "speech-02-hd".
   - Use text OR text_file_id (from the upload), not both.
   - Pass optional voice_setting_json and audio_setting_json if requested.
4) Poll until completion:
   - Call tts_async_query with task_id.
   - Once the response includes file_id or file_url, call download_file or retrieve_file to save the audio.
5) Summarize the output paths and next steps (playback, sharing, or cleanup).

Response style
- Keep updates short and outcome-focused.
- Prefer one-line progress updates and a final list of file paths.

Overview

This skill converts long text or uploaded text files into polished audiobooks using asynchronous TTS, selectable voices, and export options. It focuses on handling large chapters reliably and delivering final audio files in MP3 or WAV formats. The workflow emphasizes minimal user steps and clear progress updates.

How this skill works

Provide raw text or the path to a text file. When a file path is given, the skill uploads the file and creates an async TTS job with the speech-02-hd model. It can list available voices for selection and apply optional voice or audio settings (pace, volume), then polls the async task until a file_id or file_url is available. The finished audio is downloaded, and the saved paths are reported for playback or sharing.
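The polling step can be sketched in Python as follows. This is a minimal illustration, not the skill's implementation: `query` is a hypothetical stand-in for the tts_async_query tool, and the response shape (a dict with optional `file_id`, `file_url`, and `status` keys, with `failed` or `expired` marking a dead task) is an assumption.

```python
import time
from typing import Callable

# Assumed terminal failure states; the real status values may differ.
FAILED_STATES = {"failed", "expired"}


def poll_until_ready(
    query: Callable[[str], dict],
    task_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call query(task_id) until the response carries a file_id or file_url."""
    deadline = clock() + timeout_s
    while True:
        response = query(task_id)
        # The task is done as soon as a downloadable result is referenced.
        if response.get("file_id") or response.get("file_url"):
            return response
        status = str(response.get("status", "")).lower()
        if status in FAILED_STATES:
            raise RuntimeError(f"TTS task {task_id} ended with status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"TTS task {task_id} not ready after {timeout_s}s")
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays. A fixed interval is used here for simplicity; long chapters may justify a longer interval or a backoff.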

When to use it

  • You have long-form content (chapters, reports) that exceeds synchronous TTS limits.
  • You need selectable voices and audio tuning (pace, volume) for professional output.
  • You prefer background/asynchronous processing for large jobs.
  • You want MP3 or WAV outputs suitable for distribution.
  • You need clear, resumable workflows for multi-chapter projects.

Best practices

  • Provide a file path for long chapters to avoid input size limits and preserve formatting.
  • Request voice options first if unsure; pick a voice and specify any pace/volume before creating the task.
  • Use MP3 for smaller file sizes and wide compatibility, WAV for highest fidelity or post-processing.
  • For consistent narration pacing, give brief pacing notes up front rather than long inline edits; use SSML-style markup only if the chosen model supports it.
  • Keep each chapter as a single file or logical unit so async tasks remain manageable.
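One way to keep each chapter as its own unit is to split the manuscript before uploading. The sketch below is a hypothetical pre-processing helper, not part of the skill. It assumes chapters begin with a line that starts with the word "Chapter", which will not hold for every manuscript.

```python
import re
from typing import List, Tuple

# Assumed heading convention: a line beginning with "Chapter <something>".
CHAPTER_HEADING = re.compile(r"^(chapter[ \t]+\S.*)$", re.IGNORECASE | re.MULTILINE)


def split_chapters(manuscript: str) -> List[Tuple[str, str]]:
    """Return (title, body) pairs, one per detected chapter."""
    matches = list(CHAPTER_HEADING.finditer(manuscript))
    if not matches:
        # No headings found: treat the whole text as a single unit.
        return [("Full text", manuscript.strip())]

    chapters: List[Tuple[str, str]] = []
    # Keep any text before the first heading instead of silently dropping it.
    preface = manuscript[: matches[0].start()].strip()
    if preface:
        chapters.append(("Front matter", preface))

    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(manuscript)
        body = manuscript[match.end():end].strip()
        chapters.append((match.group(1).strip(), body))
    return chapters
```

Each returned body can then be written to its own file and submitted as a separate async task. Note that a prose line that happens to begin with "Chapter" would be misread as a heading, so the result is worth checking before upload.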

Example use cases

  • Turn a full novel or multi-chapter manuscript into individual chapter MP3 files for distribution.
  • Convert lecture transcripts into high-fidelity WAV files for offline listening or transcription checks.
  • Generate multiple voice variants for a sample chapter to choose the best narrator.
  • Batch-process blog article series into an audiobook bundle for subscribers.
  • Produce accessible audio versions of lengthy manuals or reports for workplace use.

FAQ

How do I supply a large manuscript?

Provide the path to the text file. The skill uploads it with upload_file (purpose "t2a_async_input") and passes the returned file_id as text_file_id when creating the async TTS job.
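The rule that a task takes either inline text or an uploaded file, never both, can be sketched as a small argument builder. The parameter names mirror those in the skill's workflow (model, text, text_file_id, voice_setting_json, audio_setting_json). The function itself is hypothetical, and treating the two settings as pre-encoded JSON strings is an assumption.

```python
from typing import Optional


def build_tts_create_args(
    text: Optional[str] = None,
    text_file_id: Optional[str] = None,
    voice_setting_json: Optional[str] = None,
    audio_setting_json: Optional[str] = None,
    model: str = "speech-02-hd",
) -> dict:
    """Assemble the arguments for one tts_async_create call."""
    # Exactly one source must be supplied: inline text or an uploaded file.
    if (text is None) == (text_file_id is None):
        raise ValueError("Provide exactly one of text or text_file_id")

    args = {"model": model}
    if text is not None:
        args["text"] = text
    else:
        args["text_file_id"] = text_file_id

    # Optional settings are passed through only when the user asked for them.
    if voice_setting_json is not None:
        args["voice_setting_json"] = voice_setting_json
    if audio_setting_json is not None:
        args["audio_setting_json"] = audio_setting_json
    return args
```

Rejecting the ambiguous case up front means a mistake surfaces before any task is created, rather than as an error from the service.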

What output formats and settings can I choose?

You can choose MP3 or WAV and optionally pass voice and audio settings (pace, volume). Specify preferences before starting the async task so they are applied to the rendered audio.
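As an illustration, user preferences could be turned into the two optional JSON settings like this. The key names (`voice_id`, `speed`, `vol`, `format`) are assumptions made for the sketch and should be checked against the MiniMax API reference. The helper is hypothetical.

```python
import json
from typing import Optional, Tuple

# The skill offers these two output formats.
ALLOWED_FORMATS = {"mp3", "wav"}


def build_settings(
    voice_id: str,
    output_format: str = "mp3",
    speed: Optional[float] = None,
    volume: Optional[float] = None,
) -> Tuple[str, str]:
    """Return (voice_setting_json, audio_setting_json) as JSON strings."""
    fmt = output_format.lower()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported format {output_format!r}; use mp3 or wav")

    # Assumed key names; only include pace/volume when the user specified them.
    voice = {"voice_id": voice_id}
    if speed is not None:
        voice["speed"] = speed
    if volume is not None:
        voice["vol"] = volume

    audio = {"format": fmt}
    return json.dumps(voice), json.dumps(audio)
```

Building both strings before the task is created matches the advice above: settings chosen afterwards cannot affect audio that has already been rendered.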