---
name: perplexity-performance-tuning
description: |
  Optimize Perplexity API performance with caching, batching, and connection pooling.
  Use when experiencing slow API responses, implementing caching strategies,
  or optimizing request throughput for Perplexity integrations.
  Trigger with phrases like "perplexity performance", "optimize perplexity",
  "perplexity latency", "perplexity caching", "perplexity slow", "perplexity batch".
allowed-tools: Read, Write, Edit
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
---
# Perplexity Performance Tuning
## Overview
Optimize Perplexity API performance with caching, batching, and connection pooling.
## Prerequisites
- Perplexity SDK installed
- Understanding of async patterns
- Redis or in-memory cache available (optional)
- Performance monitoring in place
## Latency Benchmarks
| Operation | P50 | P95 | P99 |
|-----------|-----|-----|-----|
| Read | 50ms | 150ms | 300ms |
| Write | 100ms | 250ms | 500ms |
| List | 75ms | 200ms | 400ms |
## Caching Strategy
### Response Caching
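```typescript
import { LRUCache } from 'lru-cache';

// In-memory cache shared by all requests in this process.
const cache = new LRUCache<string, any>({
  max: 1000,
  ttl: 60000, // 1 minute
  updateAgeOnGet: true, // each read resets the entry's age, so hot keys stay cached
});

async function cachedPerplexityRequest<T>(
  key: string,
  fetcher: () => Promise<T>,
  ttl?: number
): Promise<T> {
  const cached = cache.get(key);
  // Compare against undefined so falsy results (0, '', false) still count as hits.
  if (cached !== undefined) return cached as T;
  const result = await fetcher();
  cache.set(key, result, { ttl }); // falls back to the cache-wide ttl when omitted
  return result;
}
```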
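The wrapper above takes a caller-supplied `key`, so the hit rate depends on equivalent requests mapping to the same key. A minimal sketch of one way to derive such a key, assuming a hypothetical `QueryParams` shape (the field names are illustrative and not taken from any Perplexity SDK):

```typescript
import { createHash } from 'node:crypto';

// Hypothetical request shape for illustration; adapt it to the parameters you actually send.
interface QueryParams {
  model: string;
  query: string;
  options?: Record<string, string | number | boolean>;
}

// Build a deterministic cache key: the same logical request always yields the same key.
function buildCacheKey(params: QueryParams): string {
  // Sort option names so that property order does not change the key.
  const sortedOptions = Object.entries(params.options ?? {}).sort(([a], [b]) =>
    a < b ? -1 : a > b ? 1 : 0
  );
  const canonical = JSON.stringify([
    params.model,
    params.query.trim(), // ignore leading/trailing whitespace
    sortedOptions,
  ]);
  // Hash the canonical form so long queries still produce short, fixed-length keys.
  const digest = createHash('sha256').update(canonical).digest('hex');
  return `perplexity:${params.model}:${digest}`;
}
```

Hashing keeps keys short regardless of query length; the `perplexity:` prefix is an arbitrary namespace that makes the entries easy to identify in a shared store.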
### Redis Caching (Distributed)
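```typescript
import Redis from 'ioredis';

// Same default as ioredis itself (local instance on port 6379) when REDIS_URL is unset.
const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

async function cachedWithRedis<T>(
  key: string,
  fetcher: () => Promise<T>,
  ttlSeconds = 60
): Promise<T> {
  const cached = await redis.get(key); // null when the key is missing or expired
  if (cached !== null) return JSON.parse(cached) as T;
  const result = await fetcher();
  // Values are stored as JSON, so results must be JSON-serializable.
  await redis.setex(key, ttlSeconds, JSON.stringify(result));
  return result;
}
```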
## Request Batching
```typescript
import DataLoader from 'dataloader';

// `perplexityClient.batchGet` is a placeholder for whatever bulk call your client exposes.
const perplexityLoader = new DataLoader<string, any>(
  async (ids) => {
    // Batch fetch from Perplexity
    const results = await perplexityClient.batchGet(ids);
    // DataLoader expects one result per key, in the same order as `ids`.
    return ids.map(id => results.find(r => r.id === id) || null);
  },
  {
    maxBatchSize: 100,
    // Wait 10 ms so that loads issued close together join the same batch.
    batchScheduleFn: callback => setTimeout(callback, 10),
  }
);

// Usage - automatically batched
const [item1, item2, item3] = await Promise.all([
  perplexityLoader.load('id-1'),
  perplexityLoader.load('id-2'),
  perplexityLoader.load('id-3'),
]);
```
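Where a loader is not a good fit, the batch size can be made an explicit, tunable parameter. The following is a minimal sketch under stated assumptions: `fetchBatch` is a hypothetical callback standing in for a bulk call, and `onBatch` is an illustrative hook for recording per-batch timing; neither comes from a Perplexity SDK.

```typescript
// Split items into consecutive chunks of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Run batches one after another and report the size and duration of each,
// so the batch size can be tuned against observed latency.
async function fetchInBatches<K, V>(
  keys: readonly K[],
  batchSize: number,
  fetchBatch: (batch: K[]) => Promise<V[]>, // hypothetical bulk call
  onBatch?: (info: { size: number; durationMs: number }) => void
): Promise<V[]> {
  const results: V[] = [];
  for (const batch of chunk(keys, batchSize)) {
    const start = performance.now();
    const values = await fetchBatch(batch);
    onBatch?.({ size: batch.length, durationMs: performance.now() - start });
    results.push(...values);
  }
  return results;
}
```

Batches run sequentially here to keep the sketch simple; if per-batch durations approach the request timeout, lowering `batchSize` is the first thing to try.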
## Connection Optimization
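```typescript
import { Agent } from 'https';

// Keep-alive connection pooling
const agent = new Agent({
  keepAlive: true,
  maxSockets: 10, // cap on concurrent sockets per host
  maxFreeSockets: 5, // idle sockets kept open for reuse
  timeout: 30000, // socket timeout in milliseconds
});

// `PerplexityClient` and its `httpAgent` option are placeholders: pass the agent
// through whichever option your HTTP client or SDK accepts.
const client = new PerplexityClient({
  apiKey: process.env.PERPLEXITY_API_KEY!,
  httpAgent: agent,
});
```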
## Pagination Optimization
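```typescript
// Yield items one page at a time so that only the current page is held in memory.
async function* paginatedPerplexityList<T>(
  fetcher: (cursor?: string) => Promise<{ data: T[]; nextCursor?: string }>
): AsyncGenerator<T> {
  let cursor: string | undefined;
  do {
    const { data, nextCursor } = await fetcher(cursor);
    for (const item of data) {
      yield item;
    }
    cursor = nextCursor;
  } while (cursor);
}

// Usage (`perplexityClient.list` and `handleItem` are placeholders).
// The handler is not named `process` because that would shadow Node's global `process` object.
for await (const item of paginatedPerplexityList(cursor =>
  perplexityClient.list({ cursor, limit: 100 })
)) {
  await handleItem(item);
}
```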
## Performance Monitoring
```typescript
async function measuredPerplexityCall<T>(
  operation: string,
  fn: () => Promise<T>
): Promise<T> {
  const start = performance.now();
  try {
    const result = await fn();
    const duration = performance.now() - start;
    console.log({ operation, duration, status: 'success' });
    return result;
  } catch (error) {
    const duration = performance.now() - start;
    console.error({ operation, duration, status: 'error', error });
    throw error;
  }
}
```
## Instructions
### Step 1: Establish Baseline
Measure current latency for critical Perplexity operations.
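
One way to turn raw timings into a baseline comparable with the P50/P95/P99 columns is to collect durations and compute nearest-rank percentiles. A minimal sketch; `percentile`, `summarizeLatency`, and `measureBaseline` are illustrative names, and `fn` stands in for whatever Perplexity call is being measured:

```typescript
// Nearest-rank percentile over recorded durations (milliseconds).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples recorded');
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank computation in integer arithmetic.
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

interface LatencySummary {
  count: number;
  p50: number;
  p95: number;
  p99: number;
}

function summarizeLatency(samples: number[]): LatencySummary {
  return {
    count: samples.length,
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}

// Time `runs` sequential calls of `fn` and summarize the durations.
async function measureBaseline(
  fn: () => Promise<unknown>,
  runs = 20
): Promise<LatencySummary> {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await fn();
    samples.push(performance.now() - start);
  }
  return summarizeLatency(samples);
}
```

With only a handful of runs, P95 and P99 collapse to the slowest sample, so the tail percentiles become meaningful only with a larger sample count.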
### Step 2: Implement Caching
Add response caching for frequently accessed data.
### Step 3: Enable Batching
Use DataLoader or similar for automatic request batching.
### Step 4: Optimize Connections
Configure connection pooling with keep-alive.
## Output
- Reduced API latency
- Caching layer implemented
- Request batching enabled
- Connection pooling configured
## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Cache miss storm | TTL expired | Use stale-while-revalidate |
| Batch timeout | Too many items | Reduce batch size |
| Connection exhausted | No pooling | Configure max sockets |
| Memory pressure | Cache too large | Set max cache entries |
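
The stale-while-revalidate fix for cache miss storms can be sketched as follows. This is an illustrative in-memory version, not part of any library: the class name, the `freshMs`/`staleMs` parameters, and the injectable `now` clock are all assumptions, and the map is unbounded, so a real deployment would also need a size limit.

```typescript
interface SwrEntry<T> {
  value: T;
  fetchedAt: number;
}

class StaleWhileRevalidateCache<T> {
  private readonly entries = new Map<string, SwrEntry<T>>();
  private readonly inFlight = new Map<string, Promise<T>>();
  private readonly freshMs: number;
  private readonly staleMs: number;
  private readonly now: () => number;

  constructor(freshMs: number, staleMs: number, now: () => number = Date.now) {
    this.freshMs = freshMs; // serve without refreshing while younger than this
    this.staleMs = staleMs; // extra window in which stale values are still served
    this.now = now; // injectable clock, useful for tests
  }

  async get(key: string, fetcher: () => Promise<T>): Promise<T> {
    const entry = this.entries.get(key);
    if (entry) {
      const age = this.now() - entry.fetchedAt;
      if (age < this.freshMs) return entry.value;
      if (age < this.freshMs + this.staleMs) {
        // Serve the stale value now and refresh in the background.
        // A failed refresh is ignored; the stale value remains until it ages out.
        this.refresh(key, fetcher).catch(() => {});
        return entry.value;
      }
    }
    // Missing or too old: the caller waits for a fresh value.
    return this.refresh(key, fetcher);
  }

  // Concurrent refreshes of one key share a single fetch, so an expiry
  // triggers one upstream request instead of a burst.
  private refresh(key: string, fetcher: () => Promise<T>): Promise<T> {
    const pending = this.inFlight.get(key);
    if (pending) return pending;
    const promise = fetcher()
      .then(value => {
        this.entries.set(key, { value, fetchedAt: this.now() });
        return value;
      })
      .finally(() => {
        this.inFlight.delete(key);
      });
    this.inFlight.set(key, promise);
    return promise;
  }
}
```

The `inFlight` map is what prevents the storm: however many callers arrive while a key is being refreshed, only one fetch is issued.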
## Examples
### Quick Performance Wrapper
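```typescript
// Combines timing and caching. All calls that share a `name` share one cache entry,
// so include the request parameters in `name` whenever results differ per request.
const withPerformance = <T>(name: string, fn: () => Promise<T>) =>
  measuredPerplexityCall(name, () =>
    cachedPerplexityRequest(`cache:${name}`, fn)
  );
```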
## Resources
- [Perplexity Performance Guide](https://docs.perplexity.com/performance)
- [DataLoader Documentation](https://github.com/graphql/dataloader)
- [LRU Cache Documentation](https://github.com/isaacs/node-lru-cache)
## Next Steps
For cost optimization, see `perplexity-cost-tuning`.
## Summary
This skill optimizes Perplexity API performance by applying caching, request batching, and connection pooling patterns. It provides pragmatic code patterns and monitoring hooks to reduce latency and increase throughput for Perplexity integrations. Use it to establish baselines, add caches, enable batching, and tune HTTP connections.

The skill inspects common Perplexity operations and recommends where to insert response caches (in-memory or Redis), automatic batchers (DataLoader-style), and keep-alive connection pools. It includes small, reusable wrappers that measure call latency and record durations for both successful and failed calls. It also provides pagination helpers that stream results efficiently and reduce memory pressure.
## FAQ
### How long should cache TTLs be?
Choose short TTLs (tens of seconds to a few minutes) for dynamic data. Use stale-while-revalidate to keep serving cached values while they refresh in the background, which avoids bursts of misses at expiry.
### What batch size is safe?
Start with a conservative `maxBatchSize` (50–100), monitor latency and error rates, and reduce it if batching causes timeouts.
### When should I use Redis vs in-memory cache?
Use an in-memory cache for single-process, low-latency needs. Use Redis when the cache must be shared across multiple instances, survive process restarts, or hold more data than fits in process memory.