
This skill provides a unified multi-provider LLM interface for comparing models, switching providers, and estimating tokens and costs, with support for streaming responses.

npx playbooks add skill openclaw/skills --skill llm

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
644 B
---
name: llm
description: Multi-provider LLM integration. Unified interface for OpenAI, Anthropic, Google, and local models.
metadata: {"clawdbot":{"emoji":"🔮","always":true,"requires":{"bins":["curl","jq"]}}}
---

# LLM 🔮

Multi-provider Large Language Model integration.

## Supported Providers

- OpenAI (GPT-4, GPT-4o)
- Anthropic (Claude)
- Google (Gemini)
- Local models (Ollama, LM Studio)

## Features

- Unified chat interface
- Model comparison
- Token counting
- Cost estimation
- Streaming responses

## Usage Examples

```
"Compare GPT-4 vs Claude on this task"
"Use local Llama model"
"Estimate tokens for this prompt"
```

Overview

This skill provides a unified interface to multiple large language model providers including OpenAI, Anthropic, Google, and local models. It simplifies switching between providers, comparing outputs, and managing model-specific details like token usage and costs. The integration focuses on consistent chat behavior and streaming responses for low-latency applications.

How this skill works

The skill routes prompts to the selected provider using provider-specific adapters while exposing a consistent chat API. It can query multiple models in parallel to produce model comparisons, measure token usage, and estimate cost based on provider pricing. Streaming support forwards incremental model outputs to your client, and local adapters let you run on-prem inference through runtimes such as Ollama or LM Studio.
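The adapter pattern described above can be sketched as a small registry that maps provider names to chat functions behind one entry point. This is an illustrative sketch, not the skill's actual code: the names `Message`, `ADAPTERS`, `register`, and `chat` are assumptions, and the `local-echo` adapter stands in for a real provider call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Message:
    role: str      # "system", "user", or "assistant"
    content: str

# Each provider registers one function: messages in, reply text out.
ADAPTERS: Dict[str, Callable[[List[Message]], str]] = {}

def register(provider: str):
    def wrap(fn):
        ADAPTERS[provider] = fn
        return fn
    return wrap

@register("local-echo")
def local_echo(messages: List[Message]) -> str:
    # Stand-in for a real adapter (e.g. an HTTP call to a local runtime).
    return f"echo: {messages[-1].content}"

def chat(provider: str, messages: List[Message]) -> str:
    """Unified entry point: route to the selected provider's adapter."""
    return ADAPTERS[provider](messages)

print(chat("local-echo", [Message("user", "hello")]))  # echo: hello
```

Adding a new provider then means registering one adapter function; callers never change.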

When to use it

  • You need to compare outputs from different LLM providers for accuracy or style.
  • You want a single API to switch between cloud and local models without rewriting logic.
  • You must estimate token counts or costs before sending large prompts.
  • You need streaming responses for chat UIs or real-time apps.
  • You want to run experiments across OpenAI, Anthropic, Google, and local models.

Best practices

  • Normalize prompts and system messages to reduce variability when comparing models.
  • Run small calibration tests to measure tokens and latency per model before production.
  • Prefer streaming for user-facing chat to improve perceived responsiveness.
  • Use local models for sensitive data or when minimizing cloud costs and latency.
  • Cache model comparisons and expensive prompts to avoid repeated costs.

Example use cases

  • Compare GPT-4 and Claude answers to decide which model fits a customer support workflow.
  • Estimate token usage and cost for a batch of prompts before running a large job.
  • Switch to a local Llama-based model for on-prem inference to meet privacy requirements.
  • Stream completions into a chat interface for a live assistant with low latency.
  • Run automated A/B tests across multiple providers to evaluate hallucination rates.
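The streaming use case above amounts to forwarding incremental chunks to the client as they arrive. A toy sketch with a generator standing in for a provider's event stream (the chunking shown here is illustrative, not any provider's wire format):

```python
from typing import Iterable, Iterator

def stream_completion(chunks: Iterable[str]) -> Iterator[str]:
    # Stand-in for a provider's server-sent-event stream: yield each
    # partial piece as soon as it is available instead of buffering.
    for chunk in chunks:
        yield chunk

collected = []
for piece in stream_completion(["Hel", "lo", "!"]):
    collected.append(piece)  # a chat UI would render each piece immediately
assert "".join(collected) == "Hello!"
```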

FAQ

Can I run local models and cloud providers together?

Yes. The skill supports hybrid workflows so you can route some requests to local models and others to cloud providers using the same API.
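One way to express such hybrid routing is a small policy function that inspects each request and picks a provider. The request fields and provider names below are hypothetical examples, not the skill's configuration schema:

```python
def route(request: dict) -> str:
    """Pick a provider per request: sensitive data stays on local models."""
    if request.get("sensitive"):
        return "ollama"      # local runtime; data never leaves the machine
    if request.get("needs_long_context"):
        return "anthropic"
    return "openai"          # default cloud provider

assert route({"sensitive": True}) == "ollama"
assert route({}) == "openai"
```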

How does cost estimation work?

Cost estimation uses model-specific token counting plus configurable pricing rates to provide approximate runtime costs before you send requests.
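In outline, that calculation multiplies estimated input and output token counts by per-token rates. A sketch with placeholder numbers: the prices below are illustrative only (real rates differ by provider and change over time), and the character-based token heuristic should be replaced by the provider's actual tokenizer:

```python
# Placeholder pricing table (USD per 1M tokens) — not real rates.
PRICING = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "claude": {"input": 3.00, "output": 15.00},
}

def rough_token_count(text: str) -> int:
    # Crude heuristic (~4 characters per token for English text).
    return max(1, len(text) // 4)

def estimate_cost(model: str, prompt: str, expected_output_tokens: int) -> float:
    """Approximate USD cost of one request before sending it."""
    rates = PRICING[model]
    input_tokens = rough_token_count(prompt)
    return (input_tokens * rates["input"]
            + expected_output_tokens * rates["output"]) / 1_000_000
```

With these placeholder rates, a 400-character prompt (~100 tokens) plus 100 expected output tokens on `gpt-4o` comes to (100·2.50 + 100·10.00)/1,000,000 = $0.00125.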