
This skill helps implement OpenRouter response caching to reduce latency and costs by using LRU and semantic caching strategies.

npx playbooks add skill jeremylongshore/claude-code-plugins-plus-skills --skill openrouter-caching-strategy

Review the files below or copy the command above to add this skill to your agents.

Files (9)
SKILL.md
1.7 KB
---
name: openrouter-caching-strategy
description: |
  Implement response caching for OpenRouter efficiency. Use when optimizing costs or reducing latency for repeated queries. Trigger with phrases like 'openrouter cache', 'cache llm responses', 'openrouter redis', 'semantic caching'.
allowed-tools: Read, Write, Edit, Grep
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
---

# OpenRouter Caching Strategy

## Overview

This skill covers caching strategies from simple LRU caches to semantic similarity caching for intelligent response reuse.

## Prerequisites

- OpenRouter integration
- Caching infrastructure (Redis recommended for production)

## Instructions

Follow these steps to implement this skill:

1. **Verify Prerequisites**: Ensure all prerequisites listed above are met
2. **Review the Implementation**: Study the code examples and patterns below
3. **Adapt to Your Environment**: Modify configuration values for your setup
4. **Test the Integration**: Run the verification steps to confirm functionality
5. **Monitor in Production**: Set up appropriate logging and monitoring

## Output

Successful execution produces:
- Working OpenRouter integration
- Verified API connectivity
- Example responses demonstrating functionality

## Error Handling

See `{baseDir}/references/errors.md` for comprehensive error handling.

## Examples

See `{baseDir}/references/examples.md` for detailed examples.

## Resources

- [OpenRouter Documentation](https://openrouter.ai/docs)
- [OpenRouter Models](https://openrouter.ai/models)
- [OpenRouter API Reference](https://openrouter.ai/docs/api-reference)
- [OpenRouter Status](https://status.openrouter.ai)

Overview

This skill implements response caching strategies to reduce cost and latency when using OpenRouter. It covers simple in-memory LRU caches, Redis-backed caches for production, and semantic similarity caching to reuse relevant responses. Use it to avoid repeated calls for similar prompts and to improve throughput for high-volume workloads.
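The in-memory option described above can be sketched as a small LRU store keyed by prompt. This is an illustrative sketch for local development, not one of the skill's bundled files:

```python
from collections import OrderedDict


class LRUCache:
    """Simple in-memory LRU cache for model responses (development use only)."""

    def __init__(self, capacity: int = 256):
        self.capacity = capacity
        self._store: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key: str, value: str) -> None:
        self._store[key] = value
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used entry
```

In production this store would be replaced by Redis, which adds persistence and a cache shared across instances.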

How this skill works

The skill inspects outgoing prompts and incoming model responses, deciding whether to store or retrieve a cached entry based on exact-match or semantic similarity. For production, it integrates with Redis to persist cached responses, TTLs, and eviction policies; for local testing it supports an LRU in-memory store. A similarity layer computes embeddings and uses vector or nearest-neighbor lookups to return semantically matching responses when confidence thresholds are met.
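The similarity layer described above can be sketched as a linear nearest-neighbor scan over stored embeddings. The embeddings themselves would come from an embedding model (not shown here), and the 0.92 default threshold is illustrative, not prescribed by the skill:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Reuse a stored response only when embedding similarity clears a threshold."""

    def __init__(self, threshold: float = 0.92):
        self.threshold = threshold
        self._entries = []  # list of (embedding, response) pairs

    def get(self, embedding):
        best_score, best_response = 0.0, None
        for stored, response in self._entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, embedding, response):
        self._entries.append((embedding, response))
```

A production version would use a vector index (e.g. an approximate nearest-neighbor library) instead of the linear scan, but the threshold logic is the same.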

When to use it

  • When repeated or predictable prompts cause repeated OpenRouter calls and higher costs
  • To reduce response latency for frequently asked queries or UI actions
  • When you need a production-grade cache with persistence and eviction (use Redis)
  • During load spikes to prevent hitting rate limits or spiking compute costs
  • When implementing semantic reuse for paraphrased user inputs

Best practices

  • Start with an in-memory LRU for development, then move to Redis for production
  • Define sensible TTLs per prompt type to avoid serving stale information
  • Use embeddings and a confidence threshold to enable semantic caching safely
  • Log cache hits and misses and monitor hit rate, latency, and storage use
  • Store metadata (model, temperature, prompt hash) so cached responses remain valid
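The metadata practice above can be made concrete by folding model, temperature, and the prompt into one deterministic cache key, so a cached response is never served for a request made with different generation settings. The helper name and field set are illustrative:

```python
import hashlib
import json


def cache_key(model: str, prompt: str, temperature: float) -> str:
    """Deterministic cache key covering the parameters that affect the response.

    Serializing with sort_keys=True makes the key stable across runs, and
    hashing keeps keys short regardless of prompt length.
    """
    payload = json.dumps(
        {"model": model, "prompt": prompt, "temperature": temperature},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Any parameter that changes the output (top_p, system prompt, etc.) belongs in the payload as well; anything omitted risks serving a stale or mismatched response.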

Example use cases

  • Caching FAQ responses for a customer support chatbot to cut cost and latency
  • Storing model completions for deterministic prompts in an automated pipeline
  • Semantic caching for paraphrased user queries in an assistant or knowledge app
  • Protecting an app from rate limits during marketing spikes by serving cached answers
  • A/B testing model parameters while reusing baseline completions to save budget

FAQ

Do I need Redis for this skill?

Redis is recommended for production for persistence and shared cache across instances; an in-memory LRU is fine for local development and testing.
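A minimal get-or-call wrapper works the same against Redis or an in-memory stand-in, since it only needs Redis-style `get` and `setex`. The sketch below is an assumption about how such a wrapper might look, with a tiny TTL stub so it can run without a server; in production the `client` would be a `redis.Redis` instance:

```python
import json
import time


class TTLStub:
    """In-memory stand-in exposing the subset of the Redis API used below."""

    def __init__(self):
        self._data = {}

    def setex(self, key, ttl, value):
        self._data[key] = (time.time() + ttl, value)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if time.time() >= expires_at:
            del self._data[key]  # entry expired; behave like Redis and drop it
            return None
        return value


def cached_completion(client, key, ttl, call_model):
    """Return a cached response if present; otherwise call the model and store it.

    Returns (response, cache_hit). `client` may be any object with get/setex.
    """
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached), True  # cache hit: no model call
    response = call_model()
    client.setex(key, ttl, json.dumps(response))  # TTL enforces freshness
    return response, False  # cache miss: model was called
```

The same wrapper gives you shared, persistent caching when pointed at Redis, and the TTL keeps answers from being served past their freshness window.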

How does semantic caching avoid returning wrong answers?

Semantic caching combines embedding similarity with a configurable confidence threshold, and it stores model and temperature metadata alongside each entry, so a response is reused only when both the similarity score and the metadata satisfy your safety rules.