perplexity-performance-tuning skill

/plugins/saas-packs/perplexity-pack/skills/perplexity-performance-tuning

This skill helps optimize Perplexity API performance by applying caching, batching, and connection pooling to reduce latency and boost throughput.

npx playbooks add skill jeremylongshore/claude-code-plugins-plus-skills --skill perplexity-performance-tuning

---
name: perplexity-performance-tuning
description: |
  Optimize Perplexity API performance with caching, batching, and connection pooling.
  Use when experiencing slow API responses, implementing caching strategies,
  or optimizing request throughput for Perplexity integrations.
  Trigger with phrases like "perplexity performance", "optimize perplexity",
  "perplexity latency", "perplexity caching", "perplexity slow", "perplexity batch".
allowed-tools: Read, Write, Edit
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
---

# Perplexity Performance Tuning

## Overview
Optimize Perplexity API performance with caching, batching, and connection pooling.

## Prerequisites
- Perplexity SDK installed
- Understanding of async patterns
- Redis or in-memory cache available (optional)
- Performance monitoring in place

## Latency Benchmarks

| Operation | P50 | P95 | P99 |
|-----------|-----|-----|-----|
| Read | 50ms | 150ms | 300ms |
| Write | 100ms | 250ms | 500ms |
| List | 75ms | 200ms | 400ms |

## Caching Strategy

### Response Caching
```typescript
import { LRUCache } from 'lru-cache';

const cache = new LRUCache<string, any>({
  max: 1000,
  ttl: 60000, // 1 minute
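  // Each read resets an entry's TTL, so hot keys can be served well past 1 minute; set to false if freshness matters.
  updateAgeOnGet: true,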
});

async function cachedPerplexityRequest<T>(
  key: string,
  fetcher: () => Promise<T>,
  ttl?: number
): Promise<T> {
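  const cached = cache.get(key);
  // Compare against undefined so cached falsy values (0, '', false) still count as hits.
  if (cached !== undefined) return cached as T;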

  const result = await fetcher();
  cache.set(key, result, { ttl });
  return result;
}
```

### Redis Caching (Distributed)
```typescript
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!); // non-null assertion: REDIS_URL must be set

async function cachedWithRedis<T>(
  key: string,
  fetcher: () => Promise<T>,
  ttlSeconds = 60
): Promise<T> {
  const cached = await redis.get(key);
  // redis.get resolves to null on a miss.
  if (cached !== null) return JSON.parse(cached) as T;

  const result = await fetcher();
  await redis.setex(key, ttlSeconds, JSON.stringify(result));
  return result;
}
```
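
Both caching helpers above take a caller-supplied `key`, and a key that ignores the request parameters will return one query's answer for another. The following is a minimal sketch of deriving a stable key by hashing the request. The `ChatRequest` shape (`model`, `messages`) and the `perplexity:` key prefix are assumptions made for illustration, not types or conventions from the Perplexity SDK.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical request shape, for illustration only.
interface ChatRequest {
  model: string;
  messages: { role: string; content: string }[];
}

// Serialize with sorted object keys so property order does not change the hash.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map(k => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

// Build a cache key that changes whenever the model or any message changes.
function cacheKeyFor(request: ChatRequest): string {
  const digest = createHash('sha256').update(stableStringify(request)).digest('hex');
  return `perplexity:${request.model}:${digest}`;
}
```

The resulting string can be passed as the `key` argument to `cachedPerplexityRequest` or `cachedWithRedis`.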

## Request Batching

```typescript
import DataLoader from 'dataloader';

const perplexityLoader = new DataLoader<string, any>(
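  async (ids) => {
    // `perplexityClient.batchGet` is a placeholder for whatever bulk lookup your integration exposes.
    const results = await perplexityClient.batchGet(ids);
    // DataLoader expects one result per key, in key order; index by id for O(1) lookup.
    const byId = new Map(results.map((r: { id: string }) => [r.id, r] as const));
    return ids.map(id => byId.get(id) ?? null);
  },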
  {
    maxBatchSize: 100,
    batchScheduleFn: callback => setTimeout(callback, 10),
  }
);

// Usage - automatically batched
const [item1, item2, item3] = await Promise.all([
  perplexityLoader.load('id-1'),
  perplexityLoader.load('id-2'),
  perplexityLoader.load('id-3'),
]);
```

## Connection Optimization

```typescript
import { Agent } from 'https';

// Keep-alive connection pooling
const agent = new Agent({
  keepAlive: true,
  maxSockets: 10,
  maxFreeSockets: 5,
  timeout: 30000,
});

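// `PerplexityClient` and its `httpAgent` option stand in for your SDK's client; check its docs for the actual agent option.
const client = new PerplexityClient({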
  apiKey: process.env.PERPLEXITY_API_KEY!,
  httpAgent: agent,
});
```

## Pagination Optimization

```typescript
async function* paginatedPerplexityList<T>(
  fetcher: (cursor?: string) => Promise<{ data: T[]; nextCursor?: string }>
): AsyncGenerator<T> {
  let cursor: string | undefined;

  do {
    const { data, nextCursor } = await fetcher(cursor);
    for (const item of data) {
      yield item;
    }
    cursor = nextCursor;
  } while (cursor);
}

// Usage
for await (const item of paginatedPerplexityList(cursor =>
  perplexityClient.list({ cursor, limit: 100 })
)) {
  await handleItem(item); // your per-item handler; naming it `process` would collide with Node's global
}
```

## Performance Monitoring

```typescript
async function measuredPerplexityCall<T>(
  operation: string,
  fn: () => Promise<T>
): Promise<T> {
  const start = performance.now();
  try {
    const result = await fn();
    const duration = performance.now() - start;
    console.log({ operation, duration, status: 'success' });
    return result;
  } catch (error) {
    const duration = performance.now() - start;
    console.error({ operation, duration, status: 'error', error });
    throw error;
  }
}
```

## Instructions

### Step 1: Establish Baseline
Measure current latency for critical Perplexity operations.
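
A baseline is easier to compare against the benchmark table when it is expressed as the same percentiles. The sketch below computes P50/P95/P99 with the nearest-rank method from an array of durations in milliseconds. The function names `percentile` and `summarize` are illustrative, and it assumes you collect the `duration` values yourself, for example from `measuredPerplexityCall`.

```typescript
// Nearest-rank percentile over a list of recorded durations (milliseconds).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples recorded');
  if (p <= 0 || p > 100) throw new RangeError('p must be in (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank exact for integer percentiles.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Summarize a set of samples in the same shape as the benchmark table.
function summarize(samples: number[]) {
  return {
    count: samples.length,
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}
```

Record a summary before and after each change so that improvements are measured against the same workload.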

### Step 2: Implement Caching
Add response caching for frequently accessed data.

### Step 3: Enable Batching
Use DataLoader or similar for automatic request batching.

### Step 4: Optimize Connections
Configure connection pooling with keep-alive.
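
When more requests are issued than the agent has sockets, Node queues the excess inside the agent, where the wait is hard to observe. One option, sketched here as an assumption rather than as part of the skill, is to cap in-flight calls in application code at or below `maxSockets`. The helper name `mapWithConcurrency` is hypothetical.

```typescript
// Run `fn` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as `items`.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  if (limit < 1) throw new RangeError('limit must be >= 1');
  const results = new Array<R>(items.length);
  let next = 0;

  // Each worker claims the next unprocessed index until the list is exhausted.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

With the agent configured above, a `limit` of 10 or less would keep every request on a pooled socket instead of in the agent's queue.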

## Output
- Reduced API latency
- Caching layer implemented
- Request batching enabled
- Connection pooling configured

## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Cache miss storm | TTL expired | Use stale-while-revalidate |
| Batch timeout | Too many items | Reduce batch size |
| Connection exhausted | No pooling | Configure max sockets |
| Memory pressure | Cache too large | Set max cache entries |
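
A cache miss storm happens when many callers miss on the same key at once and each one calls the API. Besides stale-while-revalidate, a common mitigation is to share a single in-flight request per key. The sketch below shows that idea; the name `singleFlight` is illustrative and the helper is not part of any Perplexity SDK.

```typescript
// In-flight promises by cache key; an entry is removed once its request settles.
const inFlight = new Map<string, Promise<unknown>>();

async function singleFlight<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
  // If a request for this key is already running, wait on it instead of starting another.
  const pending = inFlight.get(key);
  if (pending) return pending as Promise<T>;

  const promise = fetcher().finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}
```

It composes with the caching helper by wrapping the fetcher, for example `cachedPerplexityRequest(key, () => singleFlight(key, fetcher))`, so concurrent misses on one key trigger a single upstream call.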

## Examples

### Quick Performance Wrapper
```typescript
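// Note: the cache key comes from `name` alone, so encode the request parameters
// in `name`; otherwise different requests share one cache entry.
const withPerformance = <T>(name: string, fn: () => Promise<T>) =>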
  measuredPerplexityCall(name, () =>
    cachedPerplexityRequest(`cache:${name}`, fn)
  );
```

## Resources
- [Perplexity Performance Guide](https://docs.perplexity.com/performance)
- [DataLoader Documentation](https://github.com/graphql/dataloader)
- [LRU Cache Documentation](https://github.com/isaacs/node-lru-cache)

## Next Steps
For cost optimization, see `perplexity-cost-tuning`.

Overview

This skill optimizes Perplexity API performance by applying caching, request batching, and connection pooling patterns. It provides pragmatic code patterns and monitoring hooks to reduce latency and increase throughput for Perplexity integrations. Use it to establish baselines, add caches, enable batching, and tune HTTP connections.

How this skill works

The skill inspects common Perplexity operations and recommends where to insert response caches (in-memory or Redis), automatic batchers (DataLoader-style), and keep-alive connection pools. It includes small, reusable wrappers to measure call latency and record success/error durations. It also provides pagination helpers to stream results efficiently and reduce memory pressure.

When to use it

- When API responses are consistently slow or show high P95/P99 latency
- When your integration issues many similar requests that can be cached
- When you can benefit from batching many small requests into fewer API calls
- When connection limits or socket churn are causing failures or slowdowns
- When listing large collections and you need memory-efficient pagination

Best practices

- Establish a baseline with measured latency metrics before changing behavior
- Prefer a small in-memory LRU cache for low-latency hot reads; use Redis for distributed services
- Use a short TTL with stale-while-revalidate to avoid cache stampedes; enable updateAgeOnGet only where serving older data for hot keys is acceptable, since each read resets the TTL
- Batch requests with a DataLoader pattern and cap maxBatchSize and scheduling delay
- Enable HTTP keep-alive with a tuned agent (maxSockets, maxFreeSockets, timeouts)
- Instrument every critical call with timing and error logging to validate improvements

Example use cases

- API gateway for a product that frequently reads the same Perplexity responses: add LRU or Redis caching
- Microservice receiving many concurrent item lookups: use DataLoader batching to reduce outgoing calls
- Backend job processing large lists: use an async generator pagination loop to stream results
- High-traffic server experiencing socket exhaustion: configure keep-alive pooling and maxSockets
- Feature rollout where latency impact must be measured: wrap calls with a measuredPerplexityCall helper

FAQ

How long should cache TTLs be?

Choose short TTLs (tens of seconds to a few minutes) for dynamic data; use stale-while-revalidate for slightly longer apparent freshness without bursts of misses.
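
Stale-while-revalidate can be sketched as follows: a fresh entry is returned directly, a stale one is returned immediately while a single background refresh runs, and only a true miss makes the caller wait. This is a minimal in-memory illustration using an unbounded `Map`. The name `staleWhileRevalidate`, the default windows (60 s fresh, 5 min stale), and the injectable `now` clock are assumptions chosen for the example.

```typescript
interface Entry<T> {
  value: T;
  freshUntil: number; // epoch ms; after this the value is stale but still served
  staleUntil: number; // epoch ms; after this the value is discarded
}

const swrStore = new Map<string, Entry<unknown>>();
const refreshing = new Set<string>();

async function staleWhileRevalidate<T>(
  key: string,
  fetcher: () => Promise<T>,
  ttlMs = 60_000,
  staleMs = 300_000,
  now: () => number = Date.now
): Promise<T> {
  // Save a freshly fetched value together with its fresh and stale deadlines.
  const store = (value: T): T => {
    const t = now();
    swrStore.set(key, { value, freshUntil: t + ttlMs, staleUntil: t + ttlMs + staleMs });
    return value;
  };

  const entry = swrStore.get(key) as Entry<T> | undefined;
  const t = now();

  if (entry && t < entry.freshUntil) return entry.value; // fresh hit

  if (entry && t < entry.staleUntil) {
    // Stale hit: answer immediately and start at most one background refresh.
    if (!refreshing.has(key)) {
      refreshing.add(key);
      fetcher()
        .then(store)
        .catch(() => { /* keep serving the stale value if the refresh fails */ })
        .finally(() => refreshing.delete(key));
    }
    return entry.value;
  }

  // Miss or fully expired: the caller has to wait for a fetch.
  return store(await fetcher());
}
```

In production the `Map` would need a size bound, for example by storing the entries in the LRU cache shown earlier.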

What batch size is safe?

Start with a conservative maxBatchSize (50–100) and monitor latency and error rates, then adjust downward if batching causes timeouts.

When should I use Redis vs in-memory cache?

Use in-memory for single-process low-latency needs; use Redis for multiple instances, process restarts, or larger cache capacity.