elevenlabs-acset skill

This skill exposes the ElevenLabs API as typed ACSet objects for voice synthesis, covering voice selection, TTS requests, and history persistence.

npx playbooks add skill plurigrid/asi --skill elevenlabs-acset

SKILL.md

---
name: elevenlabs-acset
description: "Skill elevenlabs-acset"
---

# ElevenLabs ACSet: Voice Synthesis as Typed Data Structure

**Status**: ✅ Production Ready  
**Trit**: -1 (MINUS - consumption/input to creative pipeline)  
**Principle**: OpenAPI → ACSet deterministic conversion for voice synthesis  
**Frame**: Voice as typed data structure in categorical database

---

## Overview

This skill bridges the ElevenLabs API to the ACSet ecosystem, enabling:
1. **OpenAPI → ACSet** schema generation from ElevenLabs API spec
2. **Voice selection** as typed morphisms
3. **Phone agent configuration** for the special agent at +14156960069
4. **DuckDB persistence** of voice generation history
5. **Gay.jl color integration** for voice identity

## API Endpoints → ACSet Objects

| ElevenLabs Endpoint | ACSet Object | Trit | Color |
|---------------------|--------------|------|-------|
| `/v1/text-to-speech/{voice_id}` | `TTSRequest` | -1 | #0243D2 |
| `/v1/voices` | `Voice` | 0 | #26D826 |
| `/v1/models` | `Model` | 0 | #FFD700 |
| `/v1/history` | `HistoryItem` | +1 | #FF6B6B |
| `/v1/sound-generation` | `SoundEffect` | -1 | #9B59B6 |
| `/v1/audio-isolation` | `AudioIsolation` | 0 | #3498DB |

## Schema

```julia
using Catlab  # assumed dependency: Catlab.jl provides @present and FreeSchema

@present SchElevenLabsACSet(FreeSchema) begin
    # Objects
    Voice::Ob
    Model::Ob
    TTSRequest::Ob
    HistoryItem::Ob
    
    # Morphisms
    voice_for_request::Hom(TTSRequest, Voice)
    model_for_request::Hom(TTSRequest, Model)
    history_of_request::Hom(HistoryItem, TTSRequest)
    
    # Attributes
    VoiceId::AttrType
    VoiceName::AttrType
    ModelId::AttrType
    Text::AttrType
    AudioBytes::AttrType
    Timestamp::AttrType
    CharacterCount::AttrType
    
    voice_id::Attr(Voice, VoiceId)
    voice_name::Attr(Voice, VoiceName)
    model_id::Attr(Model, ModelId)
    request_text::Attr(TTSRequest, Text)
    history_audio::Attr(HistoryItem, AudioBytes)
    history_timestamp::Attr(HistoryItem, Timestamp)
    history_chars::Attr(HistoryItem, CharacterCount)
end
```
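
To make the schema concrete, the following Python sketch mirrors part of it with a plain dataclass: each attribute becomes a column, and each morphism becomes an integer column pointing at a row of its target object. This only illustrates the data layout and is not the skill's Julia implementation. The class name `ElevenLabsTables` and its methods are hypothetical, and indices here are 0-based, unlike Julia's.

```python
from dataclasses import dataclass, field


@dataclass
class ElevenLabsTables:
    """Plain-Python stand-in for the ACSet: one list per attribute or morphism."""

    # Voice attributes
    voice_id: list[str] = field(default_factory=list)
    voice_name: list[str] = field(default_factory=list)
    # Model attribute
    model_id: list[str] = field(default_factory=list)
    # TTSRequest attribute plus its two morphisms, stored as row indices
    request_text: list[str] = field(default_factory=list)
    voice_for_request: list[int] = field(default_factory=list)
    model_for_request: list[int] = field(default_factory=list)

    def add_voice(self, vid: str, name: str) -> int:
        self.voice_id.append(vid)
        self.voice_name.append(name)
        return len(self.voice_id) - 1

    def add_model(self, mid: str) -> int:
        self.model_id.append(mid)
        return len(self.model_id) - 1

    def add_tts_request(self, voice: int, model: int, text: str) -> int:
        # A morphism must point at an existing row of its target object.
        if not 0 <= voice < len(self.voice_id):
            raise IndexError("voice_for_request targets a missing Voice row")
        if not 0 <= model < len(self.model_id):
            raise IndexError("model_for_request targets a missing Model row")
        self.request_text.append(text)
        self.voice_for_request.append(voice)
        self.model_for_request.append(model)
        return len(self.request_text) - 1

    def requests_for_voice(self, voice: int) -> list[int]:
        """Inverse lookup: TTSRequest rows whose voice_for_request is `voice`."""
        return [i for i, v in enumerate(self.voice_for_request) if v == voice]


tables = ElevenLabsTables()
v = tables.add_voice("cgSgspJ2msm6clMCkdW9", "Default Voice")
m = tables.add_model("eleven_multilingual_v2")
r = tables.add_tts_request(v, m, "Hello from ACSet!")
print(tables.requests_for_voice(v))
```

Storing morphisms as index columns is what makes "voice selection as a typed morphism" checkable: a request cannot reference a voice that is not already in the table.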

## Phone Agent Configuration

The special agent at **+14156960069** uses this ACSet with the following configuration:

```yaml
# ~/.topos/skills/elevenlabs-acset/phone_agent.yaml
agent_id: monaduck69-voice
phone: "+14156960069"
voice_settings:
  voice_id: "cgSgspJ2msm6clMCkdW9"  # Default ElevenLabs voice
  model_id: "eleven_multilingual_v2"
  stability: 0.5
  similarity_boost: 0.75
  style: 0.4
  speed: 1.0

media_sources:
  soundcloud: "rickroderick"
  bandcamp: "monaduck69"
  poe: "bmorphism"

capabilities:
  - text_to_speech
  - voice_cloning
  - audio_isolation
  - sound_effects
```
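
As a rough illustration of how these settings might reach the API, the Python sketch below builds (but does not send) a text-to-speech request. It assumes the public ElevenLabs REST endpoint `POST /v1/text-to-speech/{voice_id}` authenticated with an `xi-api-key` header, and it assumes that `speed` belongs inside `voice_settings` alongside the other tuning values; check both against the API reference. The `build_tts_request` helper is hypothetical and not part of this skill.

```python
import json
import os
import urllib.request

# Values copied from the voice_settings section of phone_agent.yaml.
VOICE_SETTINGS = {
    "voice_id": "cgSgspJ2msm6clMCkdW9",
    "model_id": "eleven_multilingual_v2",
    "stability": 0.5,
    "similarity_boost": 0.75,
    "style": 0.4,
    "speed": 1.0,
}


def build_tts_request(text: str, settings: dict, api_key: str) -> urllib.request.Request:
    """Build the HTTP request without sending it (hypothetical helper)."""
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{settings['voice_id']}"
    body = {
        "text": text,
        "model_id": settings["model_id"],
        # Everything except the two IDs is passed through as voice_settings.
        "voice_settings": {
            k: v for k, v in settings.items() if k not in ("voice_id", "model_id")
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_tts_request(
    "Hello from ACSet!",
    VOICE_SETTINGS,
    os.environ.get("ELEVENLABS_API_KEY", "missing-key"),
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the call; it is left out here so the sketch runs without network access or a valid key.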

## Usage

### Julia Integration

```julia
include("ElevenLabsACSet.jl")
using .ElevenLabsACSetModule

# Initialize with API key from environment
acset = create_elevenlabs_acset()

# Add voices
voice_id = add_voice!(acset, "cgSgspJ2msm6clMCkdW9", "Default Voice")

# Add model
model_id = add_model!(acset, "eleven_multilingual_v2")

# Generate TTS request
request_id = add_tts_request!(acset, voice_id, model_id, "Hello from ACSet!")

# Persist to DuckDB (expanduser is used because Julia does not expand "~" in string literals)
persist_to_duckdb!(acset, expanduser("~/.topos/elevenlabs.duckdb"))
```

### MCP Server Integration

The `bmorphism__elevenlabs-mcp-enhanced` server provides:

```bash
# Start the unified server
uvx elevenlabs-mcp

# Tools available:
# - text_to_speech (with V3 audio tags)
# - list_voices
# - get_voice
# - voice_design
# - sound_effects
# - audio_isolation
```

## GF(3) Integration

```
elevenlabs-acset (-1) ⊗ crossmodal-gf3 (0) ⊗ gesture-hypergestures (+1) = 0 ✓
```

| Trit | Role | Description |
|------|------|-------------|
| MINUS (-1) | Input | Voice synthesis consumes text |
| ERGODIC (0) | Transform | crossmodal-gf3 maps to modalities |
| PLUS (+1) | Output | gesture-hypergestures performs |
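
The balance check above can be reproduced in a few lines of Python. This sketch reads the ⊗ composition as trit addition modulo 3, which is an interpretation, since the source does not define the operator; `gf3_sum` is a hypothetical helper name.

```python
def gf3_sum(*trits: int) -> int:
    """Add trits in GF(3), returning the balanced representative in {-1, 0, 1}."""
    total = sum(trits) % 3  # Python's % yields 0, 1, or 2 for a positive modulus
    return total - 3 if total == 2 else total


triad = {
    "elevenlabs-acset": -1,
    "crossmodal-gf3": 0,
    "gesture-hypergestures": +1,
}
print(gf3_sum(*triad.values()))  # 0, so the triad is balanced
```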

## Related Skills

- **rick-roderick**: Philosophy lectures on SoundCloud
- **catsharp-sonification**: Color → Sound mapping
- **say-narration**: macOS TTS with mathematician personas
- **crossmodal-gf3**: GF(3) → {Tactile, Auditory, Haptic}

## Environment Variables

```bash
# Required
ELEVENLABS_API_KEY=xi-xxxxxxxx  # or sk-xxxxxxxx

# Optional
ELEVENLABS_DEBUG=false
ELEVENLABS_OUTPUT_DIR=~/Desktop
```
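
A minimal Python sketch of reading these variables follows. It treats the values shown above for the optional variables as defaults, which is an assumption; the `load_settings` helper is hypothetical and not part of the skill.

```python
import os


def load_settings(env=os.environ) -> dict:
    """Read the variables listed above; only the API key is mandatory."""
    api_key = env.get("ELEVENLABS_API_KEY")
    if not api_key:
        raise RuntimeError("ELEVENLABS_API_KEY is not set")
    return {
        "api_key": api_key,
        # Treat "1", "true", or "yes" (any case) as enabled; default is off.
        "debug": env.get("ELEVENLABS_DEBUG", "false").lower() in ("1", "true", "yes"),
        # Expand "~" explicitly: outside a shell it is just a literal character.
        "output_dir": os.path.expanduser(env.get("ELEVENLABS_OUTPUT_DIR", "~/Desktop")),
    }


# A plain dict stands in for the real environment in this example.
settings = load_settings({"ELEVENLABS_API_KEY": "xi-xxxxxxxx"})
print(settings["debug"], settings["output_dir"])
```

Failing fast on a missing key keeps the error at startup rather than at the first API call.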

## Commands

```bash
# Test connection
just elevenlabs-test

# List voices
just elevenlabs-voices

# Generate speech
just elevenlabs-tts "Hello world"

# Sync history to DuckDB
just elevenlabs-sync
```

---

**Skill Name**: elevenlabs-acset  
**Type**: Voice Synthesis / OpenAPI ACSet / Media Pipeline  
**Trit**: -1 (MINUS)  
**Source**: [ElevenLabs API](https://elevenlabs.io/docs/api-reference/introduction)  
**MCP**: `bmorphism__elevenlabs-mcp-enhanced`

Overview

This skill maps the ElevenLabs text and audio APIs onto an ACSet (categorical) data model for deterministic voice synthesis workflows. It exposes ElevenLabs endpoints as typed objects and morphisms, persists generation history to DuckDB, and supplies phone agent defaults and media integrations for production use.

How this skill works

The skill converts the ElevenLabs OpenAPI spec into an ACSet schema mapping endpoints (TTS, voices, models, history, sound generation) to objects and attributes. Voice selection, model binding, and history entries are represented as typed morphisms; generated audio and metadata are stored in DuckDB for auditing and replay. A preconfigured phone agent profile and command-line helpers provide ready-to-run integration.

When to use it

- You need deterministic, schema-driven voice synthesis integrated into a categorical data pipeline.
- You want to audit or replay TTS output with structured metadata persisted in DuckDB.
- You need a production-ready phone agent or telephony voice endpoint backed by ElevenLabs.
- You want to map voices and models into typed identities for downstream automation.
- You require a unified interface for ElevenLabs features: TTS, voice cloning, isolation, and effects.

Best practices

- Store `ELEVENLABS_API_KEY` in an environment variable; never embed keys in code.
- Persist TTS requests and audio blobs to DuckDB to enable reproducible workflows and analytics.
- Use voice and model IDs from the ACSet schema to ensure consistent selection across runs.
- Enable `ELEVENLABS_DEBUG` only for short sessions; in production, tune `stability` and `similarity_boost` per agent.
- Version the ACSet schema when adding attributes or morphisms to preserve compatibility.

Example use cases

- Create a phone agent that answers calls with a deterministic ElevenLabs voice and logs audio to DuckDB.
- Batch-generate narrated audio for articles using model and voice morphisms, then replay history items.
- Integrate sound effects and audio isolation into a media pipeline while tracking provenance in the ACSet.
- Design voice identities with color-coding and attributes for multi-voice applications or character casting.
- Run an MCP server to expose TTS, voice design, and isolation tools to other services.

FAQ

How do I initialize the ACSet with my API key?

Set `ELEVENLABS_API_KEY` in your environment, then call the provided `create_elevenlabs_acset` initializer, which reads the key and constructs the schema mappings.

Where is generated audio stored?

Audio bytes and metadata are persisted as `HistoryItem` entries in DuckDB by default; set `ELEVENLABS_OUTPUT_DIR` to choose an alternate export location.
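
To show the shape such a history table could take, the Python sketch below stores one `HistoryItem` row with its audio bytes and metadata. It uses the standard-library `sqlite3` module purely as a stand-in, because it ships with Python; the skill itself targets DuckDB. The table layout and the `record_history` helper are assumptions derived from the schema's attribute names, not the skill's actual storage format.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory SQLite database standing in for the DuckDB file.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE history_item (
        id                 INTEGER PRIMARY KEY,
        history_of_request INTEGER NOT NULL,  -- morphism HistoryItem -> TTSRequest
        history_audio      BLOB    NOT NULL,
        history_timestamp  TEXT    NOT NULL,
        history_chars      INTEGER NOT NULL
    )
    """
)


def record_history(conn, request_id: int, text: str, audio: bytes) -> int:
    """Insert one HistoryItem row and return its id (hypothetical helper)."""
    cur = conn.execute(
        "INSERT INTO history_item "
        "(history_of_request, history_audio, history_timestamp, history_chars) "
        "VALUES (?, ?, ?, ?)",
        (request_id, audio, datetime.now(timezone.utc).isoformat(), len(text)),
    )
    conn.commit()
    return cur.lastrowid


# Placeholder bytes are used instead of real synthesized audio.
item_id = record_history(conn, 1, "Hello from ACSet!", b"\x00\x01fake-audio")
row = conn.execute(
    "SELECT history_chars, length(history_audio) FROM history_item WHERE id = ?",
    (item_id,),
).fetchone()
print(row)
```

Keeping the request reference, timestamp, and character count next to the audio is what allows a history item to be audited or replayed later.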