
update-llm-model-list skill

/.claude/skills/update-llm-model-list

This skill audits and updates the supported LLM model list in assets.py against litellm's registry, keeping the list accurate and free of duplicates.

npx playbooks add skill agenta-ai/agenta --skill update-llm-model-list

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
6.2 KB
---
name: update-llm-model-list
description: Audit and update the supported LLM model list in assets.py against litellm's registry (models.litellm.ai). Use when adding new models, pruning outdated ones, or verifying the list is correct.
---

# Update LLM Model List

## Overview

The canonical model list lives in `sdk/agenta/sdk/assets.py` → `supported_llm_models`.
It drives the model dropdown in the playground, cost metadata, and the `model_to_provider_mapping`.

The authoritative external source is **`litellm.model_cost`** (2,600+ entries), which mirrors
<https://models.litellm.ai/>.
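
To eyeball a single registry entry (here `gpt-4o`; any key shown on <https://models.litellm.ai/> works):

```bash
uvx --with litellm python -c \
  "import json, litellm; print(json.dumps(litellm.model_cost['gpt-4o'], indent=2, default=str))"
```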

A pytest guard lives at:
`sdk/oss/tests/pytest/unit/test_supported_llm_models.py`

---

## Key rules

1. **Every model must exist in `litellm.model_cost`** (direct key, or with provider prefix stripped).
   - `anthropic/claude-*` → litellm stores as `claude-*` (prefix is intentional for routing, stripped for cost lookup)
   - `cohere/command-*` → litellm stores as `command-*`
   - All other providers keep their full prefix (e.g. `gemini/`, `groq/`, `together_ai/`)
2. **Provider key** (`"anthropic"`, `"gemini"`, …) must match the Secrets API enum in
   `api/oss/src/core/secrets/enums.py` (`StandardProviderKind`).
3. **No duplicates** within a provider list.
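
A quick way to check rule 3 (a minimal sketch; it mirrors the import pattern used in Step 1, so it assumes `agenta` is importable, otherwise paste the dict in):

```bash
uvx --with litellm python - << 'PY'
# Sketch: flag duplicate entries within each provider list.
# Assumes `agenta` is importable; otherwise paste supported_llm_models here.
from agenta.sdk.assets import supported_llm_models

clean = True
for provider, models in supported_llm_models.items():
    dupes = sorted({m for m in models if models.count(m) > 1})
    if dupes:
        clean = False
        print(f"DUPLICATES [{provider}]: {dupes}")
print("No duplicates ✓" if clean else "Fix the entries above.")
PY
```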

---

## Step 1 — Check which current models are outdated / wrong

Run this with `uvx` (no local install needed):

```bash
cat > /tmp/check_agenta_models.py << 'SCRIPT'
# /// script
# requires-python = ">=3.11"
# dependencies = ["litellm"]
# ///
import litellm, sys

# Paste the supported_llm_models dict here, or import it as below
# (the import assumes agenta is importable, e.g. installed or on sys.path).
from agenta.sdk.assets import supported_llm_models

mc = set(litellm.model_cost.keys())

def exists(m):
    if m in mc: return True
    if "/" in m and m.split("/", 1)[1] in mc: return True
    return False

fails = []
for provider, models in supported_llm_models.items():
    for model in models:
        if not exists(model):
            fails.append((provider, model))

total = sum(len(v) for v in supported_llm_models.values())
print(f"Total models checked: {total}")
if fails:
    for p, m in fails:
        print(f"  MISSING [{p}] {m}")
    sys.exit(1)
else:
    print("All models valid ✓")
SCRIPT
uvx --with litellm python /tmp/check_agenta_models.py 2>/dev/null
```

Alternatively, run the pytest unit test directly (requires agenta installed):

```bash
pytest sdk/oss/tests/pytest/unit/test_supported_llm_models.py -v
```
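
The guard is parametrized per model (see Related files). Its shape is roughly the sketch below; illustrative only, not the actual file contents:

```python
# Sketch of the guard's shape; the real test lives at
# sdk/oss/tests/pytest/unit/test_supported_llm_models.py.
import litellm
import pytest

from agenta.sdk.assets import supported_llm_models

ALL = [(p, m) for p, ms in supported_llm_models.items() for m in ms]

@pytest.mark.parametrize("provider,model", ALL)
def test_model_exists_in_litellm(provider, model):
    mc = litellm.model_cost
    bare = model.split("/", 1)[1] if "/" in model else model
    assert model in mc or bare in mc, f"[{provider}] {model} not in litellm.model_cost"
```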

---

## Step 2 — Find models missing from Agenta (big-3 audit)

This script finds models in litellm that Agenta doesn't list yet, filtered to remove
noise (audio, video, embeddings, codex, snapshots):

```bash
cat > /tmp/find_missing.py << 'SCRIPT'
# /// script
# requires-python = ">=3.11"
# dependencies = ["litellm"]
# ///
import litellm, re

AGENTA_ANTHROPIC = set()   # fill from assets.py (bare names, no prefix)
AGENTA_OPENAI    = set()   # fill from assets.py
AGENTA_GEMINI    = set()   # fill from assets.py (with gemini/ prefix)

mc = set(litellm.model_cost.keys())

NOISE = [
    "audio","tts","speech","whisper","transcri","realtime","diarize",
    "dall-e","image","video","veo","embed","moderat","search",
    "babbage","davinci","ada","instruct","codex","computer-use",
    "robotics","learnlm","gemma","live","v1:0",
]
KEEP = {"gpt-4o","gpt-4o-mini"}
DATED = re.compile(r"-\d{4}-\d{2}-\d{2}$")
EXP   = re.compile(r"exp-\d{4}|\d{2}-\d{2}$")

def noise(m):
    if m in KEEP: return False
    return any(kw in m.lower() for kw in NOISE)

def dated(m):
    return bool(DATED.search(m)) or bool(EXP.search(m))

def report(label, candidates, known, prefix=""):
    print(f"\n=== {label} ===")
    for m in sorted(candidates):
        bare = m[len(prefix):] if prefix else m
        if bare in known or m in known: continue
        tag = "[dated/exp]" if dated(m) else "[alias]" if m.endswith("-latest") else "*** MISSING ***"
        print(f"  {m}  {tag}")

# Anthropic
report("ANTHROPIC", [m for m in mc if m.startswith("claude-") and not noise(m)],
       AGENTA_ANTHROPIC)

# OpenAI (no slash, starts with gpt- / o1 / o3 / o4)
OAI = [m for m in mc if any(m.startswith(p) for p in ("gpt-","o1","o3","o4","chatgpt"))
       and "/" not in m and not noise(m)]
report("OPENAI", OAI, AGENTA_OPENAI)

# Gemini
report("GEMINI", [m for m in mc if m.startswith("gemini/") and not noise(m)],
       AGENTA_GEMINI, prefix="gemini/")
SCRIPT
uvx --with litellm python /tmp/find_missing.py 2>/dev/null
```

**Fill in the `AGENTA_*` sets from the current `assets.py`** before running.
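
If `agenta` is importable, you can print those sets instead of copying them by hand (a sketch; note `AGENTA_GEMINI` keeps the `gemini/` prefix, matching the script above):

```bash
uvx --with litellm python - << 'PY'
# Sketch: emit the AGENTA_* sets in the shape /tmp/find_missing.py expects.
from agenta.sdk.assets import supported_llm_models  # assumes agenta is importable

def bare(m):  # drop the routing prefix (anthropic/claude-* -> claude-*)
    return m.split("/", 1)[1] if "/" in m else m

print("AGENTA_ANTHROPIC =", {bare(m) for m in supported_llm_models.get("anthropic", [])})
print("AGENTA_OPENAI    =", set(supported_llm_models.get("openai", [])))
print("AGENTA_GEMINI    =", set(supported_llm_models.get("gemini", [])))  # keeps gemini/ prefix
PY
```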

---

## Step 3 — Edit `assets.py`

File: `sdk/agenta/sdk/assets.py`

- Add models inside the correct provider list, newest first.
- For **Gemini 1.5** models (still widely used): add under `"gemini"`.
- For **OpenAI o-series pro tiers** (`o1-pro`, `o3-pro`): add after their base model.
- For **Groq**: always cross-check `litellm.groq_models` — Groq rotates its model catalogue frequently.
- For **DeepInfra / Together AI**: check `litellm.deepinfra_models` / `litellm.together_ai_models` for current names (the snippet below prints all three at once).
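
```bash
uvx --with litellm python -c \
  "import litellm; print(sorted(litellm.groq_models)); print(sorted(litellm.deepinfra_models)); print(sorted(litellm.together_ai_models))"
```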

### Provider prefix conventions

| Provider key | Agenta prefix | litellm cost key prefix |
|---|---|---|
| `anthropic` | `anthropic/` | `claude-` (no prefix) |
| `cohere` | `cohere/` | `command-` (no prefix) |
| `gemini` | `gemini/` | `gemini/` |
| `groq` | `groq/` | `groq/` |
| `mistral` | `mistral/` | `mistral/` |
| `openai` | _(none)_ | _(none)_ |
| `openrouter` | `openrouter/` | `openrouter/` |
| `perplexityai` | `perplexity/` | `perplexity/` |
| `together_ai` | `together_ai/` | `together_ai/` |
| `deepinfra` | `deepinfra/` | `deepinfra/` |
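
Expressed as code, the cost-lookup rule in this table is simply "strip the prefix for anthropic and cohere, keep it everywhere else" (a hypothetical helper for illustration; not part of `assets.py`):

```python
# Hypothetical helper mirroring the table above (illustration only).
STRIP_PREFIX_PROVIDERS = {"anthropic", "cohere"}

def cost_key(provider: str, model: str) -> str:
    """Map an Agenta model name to its litellm.model_cost key."""
    if provider in STRIP_PREFIX_PROVIDERS and "/" in model:
        return model.split("/", 1)[1]  # anthropic/claude-3-opus -> claude-3-opus
    return model  # every other provider keeps its full prefix
```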

---

## Step 4 — Run ruff then the test

```bash
# Format + lint
uvx --from ruff==0.14.0 ruff format sdk/agenta/sdk/assets.py
uvx --from ruff==0.14.0 ruff check --fix sdk/agenta/sdk/assets.py

# Validate all models against litellm (no agenta install needed)
uvx --with litellm python /tmp/check_agenta_models.py 2>/dev/null
```

All checks must pass before committing.

---

## Related files

| File | Purpose |
|---|---|
| `sdk/agenta/sdk/assets.py` | Canonical model list + cost metadata builder |
| `sdk/oss/tests/pytest/unit/test_supported_llm_models.py` | Pytest guard (parametrized per model) |
| `api/oss/src/core/secrets/enums.py` | Provider keys — must stay in sync |
| `api/oss/src/resources/evaluators/evaluators.py` | Separate (shorter) model list for evaluator dropdown |

Overview

This skill audits and updates the supported LLM model list in assets.py against litellm's registry (models.litellm.ai). It ensures the playground dropdown, cost metadata, and model-to-provider mapping stay correct when adding new models or pruning outdated entries. Use it to validate model names, provider prefixes, and to run the automated guard test before committing.

How this skill works

The skill compares the canonical supported_llm_models map in sdk/agenta/sdk/assets.py to litellm.model_cost keys. It flags missing or mismatched entries, respects provider prefix conventions (e.g., anthropic/ -> claude- in litellm), and surfaces duplicates or invalid provider keys. It integrates with small Python helper scripts and a pytest guard to verify the list programmatically.

When to use it

  • Adding new LLMs to the playground or cost metadata
  • Removing deprecated or renamed models from the list
  • Verifying model/provider prefix correctness before a release
  • Auditing model coverage against litellm after a major provider update
  • Running CI checks to prevent broken model dropdowns or cost lookups

Best practices

  • Always run the provided check script (uvx + litellm) or the pytest guard after edits
  • Fill provider-specific sets before running the missing-models audit to avoid noise
  • Add new models newest-first within each provider list for clarity
  • Follow provider prefix conventions (anthropic/ -> claude- lookup, cohere/ -> command- lookup)
  • Cross-check Groq, DeepInfra, and Together AI lists using litellm’s provider-specific helpers

Example use cases

  • A maintainer adds several OpenAI o-series and needs to confirm cost keys match litellm
  • A release requires pruning old dated snapshots and verifying no dropdown regressions
  • Detecting a mismatched provider enum value that would break Secrets API lookups
  • Auditing new Gemini releases and placing Gemini 1.5 models under the gemini key
  • Ensuring Groq model names match the rotating groq catalogue before deployment

FAQ

What if a model name exists with and without a provider prefix in litellm?

The audit accepts either the direct litellm key or the provider-prefixed form; anthropic and cohere are special-cased where the prefix is stripped for cost lookup.

How do I handle frequently rotating catalogs like Groq?

Cross-check litellm.groq_models and prefer their current canonical names; re-run the check script after any Groq updates.

Which tests must pass before committing?

Run ruff format/check, then run the model validation script or the pytest guard; all checks must pass to avoid runtime dropdown or cost lookup failures.