
clawrouter skill


This skill routes each request to the cheapest capable model across 41 models from five providers to cut inference costs and latency.

npx playbooks add skill openclaw/skills --skill clawrouter

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
---
name: clawrouter
description: Smart LLM router - save 67% on inference costs. Routes every request to the cheapest capable model across 41 models from OpenAI, Anthropic, Google, DeepSeek, and xAI.
homepage: https://github.com/BlockRunAI/ClawRouter
metadata: { "openclaw": { "emoji": "πŸ¦€", "requires": { "config": ["models.providers.blockrun"] } } }
---

# ClawRouter

Smart LLM router that saves 67% on inference costs by routing each request to the cheapest model that can handle it. 41 models across 5 providers, all through one wallet.

## Install

```bash
openclaw plugins install @blockrun/clawrouter
```

## Setup

```bash
# Enable smart routing (auto-picks cheapest model per request)
openclaw models set blockrun/auto

# Or pin a specific model
openclaw models set openai/gpt-4o
```

## How Routing Works

ClawRouter classifies each request into one of four tiers:

- **SIMPLE** (40% of traffic) - factual lookups, greetings, translations → Gemini Flash ($0.60/M, 99% savings)
- **MEDIUM** (30%) - summaries, explanations, data extraction → DeepSeek Chat ($0.42/M, 99% savings)
- **COMPLEX** (20%) - code generation, multi-step analysis → Claude Opus ($75/M, best quality)
- **REASONING** (10%) - proofs, formal logic, multi-step math → o3 ($8/M, 89% savings)

Rules handle ~80% of requests in <1ms. Only ambiguous queries hit the LLM classifier (~$0.00003 per classification).

## Available Models

41 models including: gpt-5.2, gpt-4o, gpt-4o-mini, o3, o1, claude-opus-4.6, claude-sonnet-4.6, claude-haiku-4.5, gemini-3.1-pro, gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite, deepseek-chat, deepseek-reasoner, grok-3, grok-3-mini.

## Example Output

```
[ClawRouter] google/gemini-2.5-flash (SIMPLE, rules, confidence=0.92)
             Cost: $0.0025 | Baseline: $0.308 | Saved: 99.2%
```
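The "Saved" figure in that log line is simply the actual cost relative to the baseline (what the pinned premium model would have charged), which you can verify directly:

```python
# Reproduce the savings figure from the example log line above.
cost = 0.0025     # actual cost on gemini-2.5-flash
baseline = 0.308  # estimated cost on the baseline premium model
saved_pct = (1 - cost / baseline) * 100
print(f"Saved: {saved_pct:.1f}%")  # Saved: 99.2%
```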

Overview

This skill is a smart LLM router that minimizes inference costs by routing each request to the cheapest capable model among 41 models from five major providers. It automatically classifies requests into tiers and selects the optimal model, delivering comparable output quality while cutting average spend by about 67%. The router can be pinned to a specific model or set to automatic per-request routing.

How this skill works

Each incoming request is classified into one of four tiers (SIMPLE, MEDIUM, COMPLEX, REASONING) using fast rules and an occasional lightweight LLM classifier for ambiguous cases. Rules resolve about 80% of traffic in under 1 ms; ambiguous requests incur a tiny classifier cost. Based on the tier, the system picks the lowest-cost model capable of handling the task from a unified pool spanning OpenAI, Anthropic, Google, DeepSeek, and xAI, then forwards the request through a single wallet.
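The selection step can be pictured as picking the lowest-priced model whose capability set covers the tier. The pool below is a toy illustration: the prices come from the tier list above, but the per-model capability sets are assumptions made for the sketch, not ClawRouter's real routing table.

```python
# Toy model pool: prices ($/M) from the tier list above; the
# tier-capability mapping is an assumption for this sketch.
MODELS = [
    {"id": "google/gemini-2.5-flash", "price": 0.60, "tiers": {"SIMPLE"}},
    {"id": "deepseek/deepseek-chat",  "price": 0.42, "tiers": {"MEDIUM"}},
    {"id": "openai/o3",               "price": 8.00, "tiers": {"REASONING"}},
    {"id": "anthropic/claude-opus",   "price": 75.0,
     "tiers": {"SIMPLE", "MEDIUM", "COMPLEX", "REASONING"}},
]

def cheapest_capable(tier: str) -> str:
    """Pick the lowest-priced model whose capabilities cover the tier."""
    capable = [m for m in MODELS if tier in m["tiers"]]
    return min(capable, key=lambda m: m["price"])["id"]

print(cheapest_capable("MEDIUM"))   # deepseek/deepseek-chat
print(cheapest_capable("COMPLEX"))  # anthropic/claude-opus
```

With this shape, a premium model capable of every tier only wins when no cheaper model covers the task, which is the behavior described above.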

When to use it

  • Production systems that need to reduce LLM inference spend without sacrificing capability.
  • Applications with mixed request types (queries, summaries, code, reasoning) where cost varies by model.
  • Environments that require a single billing wallet for multiple provider models.
  • Rapid prototyping where you want automatic model selection by task complexity.

Best practices

  • Enable auto-routing to let the router pick the cheapest capable model for each request.
  • Pin a specific high-quality model for latency-sensitive or brand-sensitive endpoints.
  • Monitor routing logs and cost metrics to tune rule thresholds for your workload.
  • Label or constrain inputs where possible so rules can resolve more requests deterministically.
  • Test sample workloads across all tiers to verify output quality trade-offs before full rollout.

Example use cases

  • Customer support: route simple FAQ lookups to low-cost flash models and complex disputes to high-quality models.
  • Data extraction: use medium-tier models for robust parsing while saving on high-volume runs.
  • Code generation pipelines: send multi-step or large code tasks to complex-tier models and small snippets to cheaper ones.
  • Research experiments: run large numbers of simple checks at minimal cost while reserving premium models for reasoning tasks.
  • SaaS platforms: consolidate billing and provider diversity into one wallet while optimizing per-request cost.

FAQ

How much does classification add to cost?

The LLM classifier is used only for ambiguous cases and costs a tiny fraction per classification (~$0.00003), keeping overhead minimal.

Can I force a specific model for some endpoints?

Yes. You can pin endpoints to a chosen model while leaving the rest of traffic on auto-routing.