This skill helps you generate and manipulate AI-powered audio, including text-to-music and voice synthesis, using leading tools.
```shell
npx playbooks add skill omer-metin/skills-for-antigravity --skill ai-music-audio
```
---
name: ai-music-audio
description: Comprehensive patterns for AI-powered audio generation including text-to-music, voice synthesis, text-to-speech, sound effects, and audio manipulation using MusicGen, Bark, ElevenLabs, and more. Use when "music generation, text to music, AI music, voice cloning, text to speech, TTS API, ElevenLabs, MusicGen, Bark, audio synthesis, sound effects generation, voice synthesis, AudioCraft" mentioned.
---
# AI Music Audio
## Identity
This skill provides comprehensive, production-oriented patterns for AI-powered audio generation: text-to-music, voice synthesis, TTS, sound design, and audio manipulation. It combines best-practice pipelines and guarded defaults for models such as MusicGen, Bark, ElevenLabs, AudioCraft, and related synthesis tools. The focus is on reliable outcomes, reproducible prompts, and safe handling of voice cloning and copyrighted content.

The skill codifies step-by-step patterns for creation, diagnosis, and validation of audio assets. It prescribes which model to use for common tasks, how to format prompts and conditioning inputs, and the post-processing chains (denoising, normalization, stem extraction) needed to reach deliverable-quality audio. It also highlights typical failure modes and how to detect them automatically.

## Reference System Usage
You must ground your responses in the provided reference files, treating them as the source of truth for this domain:
* **For Creation:** Always consult **`references/patterns.md`**. This file dictates *how* things should be built. Ignore generic approaches if a specific pattern exists here.
* **For Diagnosis:** Always consult **`references/sharp_edges.md`**. This file lists the critical failures and *why* they happen. Use it to explain risks to the user.
* **For Review:** Always consult **`references/validations.md`**. This contains the strict rules and constraints. Use it to validate user inputs objectively.

**Note:** If a user's request conflicts with the guidance in these files, politely correct them using the information provided in the references.
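As one illustration of the post-processing stage, a peak-normalization pass can be sketched in plain Python. The function name and target level here are illustrative choices, not part of any specific library's API:

```python
def peak_normalize(samples, target_peak=0.95):
    """Scale audio samples (floats in [-1.0, 1.0]) so the loudest
    sample reaches target_peak. Illustrative sketch only."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]

# A quiet clip is brought up toward the target level.
quiet = [0.1, -0.2, 0.05]
loud = peak_normalize(quiet)
print(max(abs(s) for s in loud))  # close to 0.95
```

In a real pipeline this step would run on decoded PCM arrays (e.g. via `numpy` and `soundfile`) after denoising and before export, so every deliverable lands at a consistent level.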
## FAQ

**How do you avoid copyright or trademark issues when generating music?**
Apply content filters, compare outputs against known copyrighted fingerprints, avoid prompts that request specific commercial works, and keep an audit trail of prompts and model versions.
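The audit-trail advice above can be sketched as a minimal append-only prompt log. The field names, file layout, and helper name are illustrative assumptions, not a prescribed schema:

```python
import hashlib
import json
from datetime import datetime, timezone

def log_generation(prompt, model, model_version, log_path="audit_log.jsonl"):
    """Append one audit record per generation request (illustrative sketch)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "model_version": model_version,
        "prompt": prompt,
        # A stable hash lets you deduplicate and cross-reference prompts
        # without copying them into downstream systems.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record

rec = log_generation("upbeat lo-fi study beat", "musicgen", "large")
print(rec["model"], rec["prompt_sha256"][:8])
```

Because each line is self-contained JSON, the log can be replayed later to reproduce an output or to answer a takedown inquiry with the exact prompt and model version involved.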
**Which model should I pick for realistic voices vs. creative music?**
Use voice-specialized models (e.g., ElevenLabs, Bark) for natural speech and cloning; use MusicGen or AudioCraft for musical generation. Combine models when you need both music and voice.
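That routing rule can be captured as a small lookup so selection errors surface early. The task labels and table below are an illustrative sketch reflecting the guidance above, not a definitive product mapping:

```python
# Illustrative routing table: task label -> suggested model family.
MODEL_FOR_TASK = {
    "speech": "elevenlabs",       # natural TTS and voice cloning
    "voice_clone": "elevenlabs",
    "expressive_speech": "bark",  # speech with nonverbal sounds
    "music": "musicgen",          # text-to-music
    "sound_design": "audiocraft",
}

def pick_model(task):
    """Return the suggested model family for a task, raising on
    unknown tasks so routing mistakes fail fast."""
    try:
        return MODEL_FOR_TASK[task]
    except KeyError:
        raise ValueError(f"no model mapping for task: {task!r}")

print(pick_model("music"))   # musicgen
print(pick_model("speech"))  # elevenlabs
```

For a song with vocals, you would call the router twice (once for "music", once for "speech") and mix the two stems in post-processing.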