
This skill helps optimize OpenRouter performance by applying connection pooling, async processing, and caching strategies to reduce latency and increase throughput.

npx playbooks add skill jeremylongshore/claude-code-plugins-plus-skills --skill openrouter-performance-tuning

SKILL.md
---
name: openrouter-performance-tuning
description: |
  Optimize OpenRouter performance and latency. Use when reducing response times or improving throughput. Trigger with phrases like 'openrouter performance', 'openrouter latency', 'speed up openrouter', 'openrouter optimization'.
allowed-tools: Read, Write, Edit, Grep
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
---

# OpenRouter Performance Tuning

## Overview

This skill covers performance optimization techniques including connection pooling, async processing, and caching strategies.

## Prerequisites

- OpenRouter integration
- Performance baseline measurements

## Instructions

Follow these steps to implement this skill:

1. **Verify Prerequisites**: Ensure all prerequisites listed above are met
2. **Review the Implementation**: Study the code examples and patterns in `{baseDir}/references/examples.md`
3. **Adapt to Your Environment**: Modify configuration values for your setup
4. **Test the Integration**: Re-run your baseline measurements to confirm that latency and throughput improved
5. **Monitor in Production**: Set up appropriate logging and monitoring

## Output

Successful execution produces:
- Working OpenRouter integration
- Verified API connectivity
- Example responses demonstrating functionality

## Error Handling

See `{baseDir}/references/errors.md` for comprehensive error handling.

## Examples

See `{baseDir}/references/examples.md` for detailed examples.

## Resources

- [OpenRouter Documentation](https://openrouter.ai/docs)
- [OpenRouter Models](https://openrouter.ai/models)
- [OpenRouter API Reference](https://openrouter.ai/docs/api-reference)
- [OpenRouter Status](https://status.openrouter.ai)

Overview

This skill helps optimize OpenRouter deployments for lower latency and higher throughput. It documents practical techniques like connection pooling, asynchronous request handling, and caching to reduce response times and improve resource utilization. The guidance is implementation-focused so you can apply changes, measure impact, and iterate quickly.

How this skill works

The skill inspects request/response patterns and suggests code and configuration changes to reduce latency and increase concurrency. It guides you through adding connection pooling, converting blocking calls to async, introducing result caching, and validating improvements with baseline measurements and monitoring. It also recommends production safeguards such as timeouts, retries, and observability hooks.
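The timeout and retry safeguards mentioned above can be sketched with the standard library alone. This is a minimal illustration, not the skill's own implementation: `make_call` is a hypothetical zero-argument coroutine function standing in for one OpenRouter request, and the attempt count, timeout, and delay values are placeholder assumptions to tune against your own baseline.

```python
import asyncio
import random


async def call_with_retry(make_call, *, attempts=3, timeout_s=10.0, base_delay_s=0.5):
    """Run make_call() with a per-attempt timeout and exponential backoff.

    make_call is a hypothetical coroutine function that performs one request.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            # Bound each attempt so a stalled request cannot hold a worker forever.
            return await asyncio.wait_for(make_call(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt == attempts - 1:
                break
            # Exponential backoff (base, 2x base, 4x base, ...) plus up to 10% jitter
            # so that many clients do not retry in lockstep.
            delay = base_delay_s * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
    raise last_error
```

Only timeouts and connection errors are retried here. Whether other failures (for example rate-limit responses) should be retried depends on your client and is left out of this sketch.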

When to use it

  • When average or tail latency to OpenRouter exceeds your SLA
  • When throughput is limited under expected concurrency
  • Before scaling infrastructure to identify software bottlenecks
  • When integrating OpenRouter into latency-sensitive user flows
  • During performance regression testing after upgrades

Best practices

  • Measure a clear baseline (p95, p99, throughput) before changes
  • Use connection pooling and keepalive to avoid repeated TCP and TLS handshake overhead
  • Prefer async/non-blocking calls in high-concurrency paths
  • Cache deterministic or repeated responses with TTL and invalidation
  • Instrument latency, error rates, and queue lengths for feedback
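As one way to apply the caching practice above, the following sketch shows an in-memory cache with a TTL and explicit invalidation. The class name `TTLCache`, the key layout, and the 300-second default are illustrative assumptions. A production deployment might use a shared store instead, and should only cache requests whose outputs are safe to reuse.

```python
import hashlib
import json
import time


class TTLCache:
    """In-memory cache whose entries expire after ttl_s seconds (hypothetical helper)."""

    def __init__(self, ttl_s=300.0, clock=time.monotonic):
        self.ttl_s = ttl_s
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    @staticmethod
    def key_for(model, messages, **params):
        # Hash the full request so any change in model, prompt, or parameters
        # yields a different key.
        payload = json.dumps(
            {"model": model, "messages": messages, "params": params},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # lazy expiry on read
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self.ttl_s, value)

    def invalidate(self, key):
        self._entries.pop(key, None)
```

A caller would compute `key_for(...)` from the request, return the cached value on a hit, and otherwise make the request and `set` the result. Including sampling parameters in the key keeps a response generated at one temperature from being served for another.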

Example use cases

  • Reduce API latency for a chat frontend by switching to async HTTP clients and pooling
  • Increase throughput for batch inference jobs by parallelizing requests with controlled concurrency
  • Lower costs by caching repeated model outputs that are business-safe to reuse
  • Prevent service disruption by adding circuit breakers, short timeouts, and retry backoff
  • Validate upgrade impact by running A/B performance tests against the baseline
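The batch use case above, parallel requests with controlled concurrency, might look like the sketch below. `run_bounded` and its `worker` argument are hypothetical names: `worker` stands in for whatever async function sends one request, and the default limit of 8 is an arbitrary starting point rather than a recommended value.

```python
import asyncio


async def run_bounded(items, worker, max_concurrency=8):
    """Apply an async worker to every item with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        # The semaphore admits at most max_concurrency workers at a time.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))
```

Raising `max_concurrency` increases throughput only until a rate limit or downstream bottleneck is reached, so it is worth measuring a few values against the baseline instead of assuming that more parallelism is better.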

FAQ

Do I need special OpenRouter features to apply these optimizations?

No. Most optimizations use standard HTTP client features (pooling, keepalive), async patterns, and caching layers. Validate compatibility with your OpenRouter client and API version.
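To illustrate what pooling with keepalive means at the HTTP client level, here is a minimal sketch built on Python's standard `http.client`. The class `HTTPSConnectionPool` is a hypothetical helper written for this example. Many HTTP client libraries provide pooling built in, and using theirs is usually preferable to writing your own.

```python
import http.client
import queue
from contextlib import contextmanager


class HTTPSConnectionPool:
    """Fixed-size pool of persistent HTTPS connections to one host (illustrative)."""

    def __init__(self, host, size=4, timeout_s=10.0):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # The socket is opened lazily on the first request and then reused
            # for as long as the server keeps the connection alive.
            self._pool.put(http.client.HTTPSConnection(host, timeout=timeout_s))

    @contextmanager
    def connection(self):
        conn = self._pool.get()  # blocks when every connection is in use
        try:
            yield conn
        except Exception:
            conn.close()  # drop a possibly broken socket; it reopens on next use
            raise
        finally:
            self._pool.put(conn)
```

Inside `with pool.connection() as conn:` a caller would issue `conn.request(...)` and fully read the response before leaving the block, since `http.client` cannot send a new request on a connection until the previous response has been read. Reusing the connection avoids a new TCP and TLS handshake on each request.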

How do I verify that changes improved performance?

Compare pre/post baselines for p50/p95/p99 latency and throughput under representative load. Use metrics, distributed traces, and synthetic tests to verify real-world impact.
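The pre/post comparison can be computed from raw latency samples with the standard `statistics` module. The function names below are hypothetical, and the sketch assumes you have already collected per-request latencies in milliseconds under comparable load for both runs.

```python
import statistics


def latency_summary(samples_ms):
    """Return p50/p95/p99 for a list of latency samples in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def compare(baseline_ms, candidate_ms):
    """Percent change per percentile; negative means the candidate is faster."""
    before = latency_summary(baseline_ms)
    after = latency_summary(candidate_ms)
    return {k: 100.0 * (after[k] - before[k]) / before[k] for k in before}
```

Tail percentiles such as p99 need a large sample to be stable, so a small improvement measured over a few dozen requests may be noise.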