
ai-music-audio skill


This skill helps you generate and manipulate AI-powered audio, including text-to-music and voice synthesis, using leading tools.

npx playbooks add skill omer-metin/skills-for-antigravity --skill ai-music-audio

Review the files below or copy the command above to add this skill to your agents.

Files (4)
SKILL.md
1.3 KB
---
name: ai-music-audio
description: Comprehensive patterns for AI-powered audio generation including text-to-music, voice synthesis, text-to-speech, sound effects, and audio manipulation using MusicGen, Bark, ElevenLabs, and more. Use when "music generation, text to music, AI music, voice cloning, text to speech, TTS API, ElevenLabs, MusicGen, Bark, audio synthesis, sound effects generation, voice synthesis, AudioCraft" is mentioned.
---

# AI Music Audio

## Identity

You are an AI audio generation specialist covering text-to-music, voice synthesis, text-to-speech, sound effects, and audio manipulation with tools such as MusicGen, Bark, ElevenLabs, and AudioCraft.

## Reference System Usage

You must ground your responses in the provided reference files, treating them as the source of truth for this domain:

* **For Creation:** Always consult **`references/patterns.md`**. This file dictates *how* things should be built. Ignore generic approaches if a specific pattern exists here.
* **For Diagnosis:** Always consult **`references/sharp_edges.md`**. This file lists the critical failures and "why" they happen. Use it to explain risks to the user.
* **For Review:** Always consult **`references/validations.md`**. This contains the strict rules and constraints. Use it to validate user inputs objectively.

**Note:** If a user's request conflicts with the guidance in these files, politely correct them using the information provided in the references.

## Overview

This skill provides comprehensive, production-oriented patterns for AI-powered audio generation: text-to-music, voice synthesis, TTS, sound design, and audio manipulation. It combines best-practice pipelines and guarded defaults for models such as MusicGen, Bark, ElevenLabs, AudioCraft, and related synthesis tools. The focus is on reliable outcomes, reproducible prompts, and safe handling of voice cloning and copyrighted content.

## How this skill works

The skill codifies step-by-step patterns for creation, diagnosis, and validation of audio assets. It prescribes which model to use for common tasks, how to format prompts and conditioning inputs, and the post-processing chains (denoising, normalization, stem extraction) to reach deliverable-quality audio. It also highlights typical failure modes and how to detect them automatically.
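
For example, a minimal text-to-music pass with AudioCraft's MusicGen might look like the sketch below; the checkpoint name, prompt, and duration are illustrative placeholders, not prescribed defaults.

```python
# Minimal sketch of a MusicGen text-to-music pass.
# Assumes the audiocraft package is installed (pip install audiocraft).
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a pretrained checkpoint; 'facebook/musicgen-small' is illustrative.
model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=30)  # seconds of audio to generate

# One prompt per output clip; the conditioning text drives style and instrumentation.
descriptions = ["warm ambient pad with soft piano, slow tempo, podcast intro"]
wav = model.generate(descriptions)  # tensor of shape [batch, channels, samples]

# Write with loudness normalization as a guarded default.
audio_write("ambient_bed", wav[0].cpu(), model.sample_rate, strategy="loudness")
```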

## When to use it

* You need high-quality text-to-music tracks or musical motifs from prompts.
* You want natural or cloned voice output for narration, demos, or prototypes.
* You must generate sound effects or Foley with controllable timbre and timing.
* You require safe-handling rules for copyrighted material and consented voice cloning.
* You need validation checks to enforce duration, loudness, and format constraints (see the sketch after this list).
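
A minimal validation gate might look like the following sketch, using soundfile and pyloudnorm; the duration and loudness thresholds are illustrative assumptions, not standards mandated by this skill.

```python
# Sketch of a duration and loudness validation gate.
# Assumes soundfile and pyloudnorm are installed; thresholds are examples only.
import soundfile as sf
import pyloudnorm as pyln

def validate_clip(path, min_s=1.0, max_s=120.0, target_lufs=-16.0, tol=2.0):
    data, rate = sf.read(path)  # rejects unreadable/unsupported formats by raising
    duration = len(data) / rate
    if not (min_s <= duration <= max_s):
        return False, f"duration {duration:.1f}s outside [{min_s}, {max_s}]"

    meter = pyln.Meter(rate)  # ITU-R BS.1770 loudness meter
    loudness = meter.integrated_loudness(data)
    if abs(loudness - target_lufs) > tol:
        return False, f"{loudness:.1f} LUFS not within {tol} of {target_lufs}"
    return True, "ok"
```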

## Best practices

* Start with model-specific prompt templates and small iterations to tune style and rhythm.
* Always run automated validations for format, loudness, and prohibited content before release.
* Use stem export and multitrack post-processing for mixing rather than broad denoising on final stereo files.
* For voice cloning, obtain explicit consent and embed provenance metadata in outputs.
* Prefer reproducible seeds and manifest files to track model version, prompt, and hyperparameters (a minimal manifest sketch follows this list).
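
A reproducibility manifest could be as simple as the sketch below; the field names, example values, and `write_manifest` helper are hypothetical conventions, not a required schema.

```python
# Sketch of a generation manifest for reproducibility and provenance.
# The schema and the write_manifest helper are illustrative, not a fixed standard.
import json
import torch

def write_manifest(path, *, model_name, model_version, prompt, seed, params):
    manifest = {
        "model": model_name,
        "model_version": model_version,
        "prompt": prompt,
        "seed": seed,
        "params": params,  # e.g. duration, temperature, top_k
    }
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)

seed = 42
torch.manual_seed(seed)  # fix the seed before generation for reproducibility
write_manifest(
    "ambient_bed.manifest.json",
    model_name="musicgen-small",
    model_version="audiocraft 1.3.0",  # example value; record your actual version
    prompt="warm ambient pad with soft piano",
    seed=seed,
    params={"duration": 30},
)
```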

## Example use cases

* Generate a 30–60s ambient music bed from a textual mood and instrumentation spec for podcast intros.
* Clone a consenting narrator’s voice to create localized TTS for e-learning modules.
* Produce synchronized sound effects for game events using parameterized prompts for pitch and length.
* Convert long-form scripts into chaptered TTS with chapter markers, normalized loudness, and multiple voice styles.
* Create stems (bass, drums, pads, melody) with MusicGen-style conditioning for later DAW mixing.

## FAQ

### How do you avoid copyright or trademark issues when generating music?

Apply content filters, compare outputs against audio fingerprints of known copyrighted works, avoid prompts that request specific commercial works or artists, and keep an audit trail of prompts and model versions.

### Which model should I pick for realistic voices vs. creative music?

Use voice-specialized models (e.g., ElevenLabs, Bark) for natural speech and cloning; use MusicGen or AudioCraft for musical generation. Combine models when you need both music and voice.
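
As a concrete illustration of the speech side, a minimal Bark text-to-speech pass follows; the history prompt is an illustrative pick from Bark's built-in speaker presets, and the text is a placeholder.

```python
# Minimal Bark text-to-speech sketch.
# Assumes the bark package is installed (pip install git+https://github.com/suno-ai/bark.git).
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()  # downloads and caches model weights on first run

text = "Welcome to the show. Today we explore AI-generated soundscapes."
audio_array = generate_audio(text, history_prompt="v2/en_speaker_6")

write_wav("narration.wav", SAMPLE_RATE, audio_array)
```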