This skill helps you speed up Exa API calls by applying caching, batching, and connection pooling to reduce latency and increase throughput.

npx playbooks add skill jeremylongshore/claude-code-plugins-plus-skills --skill exa-performance-tuning

---
name: exa-performance-tuning
description: |
  Optimize Exa API performance with caching, batching, and connection pooling.
  Use when experiencing slow API responses, implementing caching strategies,
  or optimizing request throughput for Exa integrations.
  Trigger with phrases like "exa performance", "optimize exa",
  "exa latency", "exa caching", "exa slow", "exa batch".
allowed-tools: Read, Write, Edit
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
---

# Exa Performance Tuning

## Overview
Optimize Exa API performance with caching, batching, and connection pooling.

## Prerequisites
- Exa SDK installed
- Understanding of async patterns
- Redis or in-memory cache available (optional)
- Performance monitoring in place

## Latency Benchmarks

| Operation | P50 | P95 | P99 |
|-----------|-----|-----|-----|
| Read | 50ms | 150ms | 300ms |
| Write | 100ms | 250ms | 500ms |
| List | 75ms | 200ms | 400ms |

## Caching Strategy

### Response Caching
```typescript
import { LRUCache } from 'lru-cache';

const cache = new LRUCache<string, any>({
  max: 1000,
  ttl: 60000, // 1 minute
  updateAgeOnGet: true,
});

async function cachedExaRequest<T>(
  key: string,
  fetcher: () => Promise<T>,
  ttl?: number
): Promise<T> {
  const cached = cache.get(key);
  // Compare against undefined so falsy values (0, '', false) still count as hits.
  if (cached !== undefined) return cached as T;

  const result = await fetcher();
  cache.set(key, result, { ttl });
  return result;
}
```
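
The helper above leaves key construction to the caller. The following is a minimal sketch of one way to derive a stable key from an operation name and its parameters, assuming the parameters are plain JSON-serializable values. `buildCacheKey` and `canonicalize` are hypothetical helpers, not part of the Exa SDK or `lru-cache`.

```typescript
// Hypothetical helper: builds a deterministic cache key from an operation
// name and a plain, JSON-serializable parameter value.
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// Recursively sort object keys so that property order does not change the key.
function canonicalize(value: JsonValue): JsonValue {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === 'object') {
    const sorted: { [key: string]: JsonValue } = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = canonicalize(value[key]);
    }
    return sorted;
  }
  return value;
}

function buildCacheKey(operation: string, params: JsonValue): string {
  return `${operation}:${JSON.stringify(canonicalize(params))}`;
}
```

Two calls with the same parameters in a different property order then map to the same cache entry, which avoids duplicate fetches for logically identical requests.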

### Redis Caching (Distributed)
```typescript
import Redis from 'ioredis';

// The non-null assertion mirrors the EXA_API_KEY handling; set REDIS_URL before startup.
const redis = new Redis(process.env.REDIS_URL!);

async function cachedWithRedis<T>(
  key: string,
  fetcher: () => Promise<T>,
  ttlSeconds = 60
): Promise<T> {
  const cached = await redis.get(key);
  // ioredis returns null when the key is missing.
  if (cached !== null) return JSON.parse(cached) as T;

  const result = await fetcher();
  await redis.setex(key, ttlSeconds, JSON.stringify(result));
  return result;
}
```

## Request Batching

```typescript
import DataLoader from 'dataloader';

const exaLoader = new DataLoader<string, any>(
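  async (ids) => {
    // `exaClient.batchGet` is a placeholder for whatever bulk call your
    // client exposes; substitute the real method.
    const results = await exaClient.batchGet(ids);
    // Index by id once, then return one entry per key in key order,
    // as DataLoader requires.
    const byId = new Map<string, any>(results.map((r: any) => [r.id, r]));
    return ids.map(id => byId.get(id) ?? null);
  },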
  {
    maxBatchSize: 100,
    batchScheduleFn: callback => setTimeout(callback, 10),
  }
);

// Usage - automatically batched
const [item1, item2, item3] = await Promise.all([
  exaLoader.load('id-1'),
  exaLoader.load('id-2'),
  exaLoader.load('id-3'),
]);
```

## Connection Optimization

```typescript
import { Agent } from 'https';

// Keep-alive connection pooling
const agent = new Agent({
  keepAlive: true,
  maxSockets: 10,
  maxFreeSockets: 5,
  timeout: 30000,
});

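// `ExaClient` and its `httpAgent` option are placeholders: pass the agent
// through whichever option your HTTP client or SDK wrapper accepts.
const client = new ExaClient({
  apiKey: process.env.EXA_API_KEY!,
  httpAgent: agent,
});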
```

## Pagination Optimization

```typescript
async function* paginatedExaList<T>(
  fetcher: (cursor?: string) => Promise<{ data: T[]; nextCursor?: string }>
): AsyncGenerator<T> {
  let cursor: string | undefined;

  do {
    const { data, nextCursor } = await fetcher(cursor);
    for (const item of data) {
      yield item;
    }
    cursor = nextCursor;
  } while (cursor);
}

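// Usage (`exaClient.list` and `handleItem` are placeholders).
// Do not name the handler `process`: in Node that refers to the global
// `process` object, which is not callable.
for await (const item of paginatedExaList(cursor =>
  exaClient.list({ cursor, limit: 100 })
)) {
  await handleItem(item);
}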
```

## Performance Monitoring

```typescript
async function measuredExaCall<T>(
  operation: string,
  fn: () => Promise<T>
): Promise<T> {
  const start = performance.now();
  try {
    const result = await fn();
    const duration = performance.now() - start;
    console.log({ operation, duration, status: 'success' });
    return result;
  } catch (error) {
    const duration = performance.now() - start;
    console.error({ operation, duration, status: 'error', error });
    throw error;
  }
}
```

## Instructions

### Step 1: Establish Baseline
Measure current latency for critical Exa operations.
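
One way to turn raw timings into a baseline is to collect durations per operation and compute percentiles. The following is a minimal sketch using the nearest-rank method. `percentile` and `summarize` are hypothetical helpers, and the sample durations are made up for illustration rather than measured against Exa.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples recorded');
  if (p <= 0 || p > 100) throw new RangeError('p must be in (0, 100]');
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

function summarize(samples: number[]) {
  return {
    count: samples.length,
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}

// Illustrative durations in milliseconds (not real Exa measurements).
const durationsMs = [42, 51, 48, 300, 55, 47, 60, 49, 52, 150];
console.log(summarize(durationsMs));
```

With small sample counts the high percentiles collapse onto the slowest request, so collect enough samples per operation before comparing runs.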

### Step 2: Implement Caching
Add response caching for frequently accessed data.

### Step 3: Enable Batching
Use DataLoader or similar for automatic request batching.

### Step 4: Optimize Connections
Configure connection pooling with keep-alive.

## Output
- Reduced API latency
- Caching layer implemented
- Request batching enabled
- Connection pooling configured

## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Cache miss storm | TTL expired | Use stale-while-revalidate |
| Batch timeout | Too many items | Reduce batch size |
| Connection exhausted | No pooling | Configure max sockets |
| Memory pressure | Cache too large | Set max cache entries |
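
The stale-while-revalidate mitigation named in the table can be sketched without any library. The version below is an illustration under assumptions, not the behavior of `lru-cache` or the Exa SDK: `SwrCache` is a hypothetical class built on a plain `Map`, with an injectable clock so it can be tested. Stale entries are served immediately while one background refresh runs, and concurrent cold misses share a single fetch.

```typescript
// Hypothetical stale-while-revalidate cache built on a plain Map.
interface SwrEntry<T> {
  value: T;
  storedAt: number;
}

class SwrCache<T> {
  private entries = new Map<string, SwrEntry<T>>();
  private inFlight = new Map<string, Promise<T>>();
  private freshMs: number;
  private now: () => number;

  constructor(freshMs: number, now: () => number = () => Date.now()) {
    this.freshMs = freshMs;
    this.now = now;
  }

  async get(key: string, fetcher: () => Promise<T>): Promise<T> {
    const entry = this.entries.get(key);
    if (entry === undefined) {
      // Cold miss: concurrent callers await the same in-flight fetch.
      return this.refresh(key, fetcher);
    }
    if (this.now() - entry.storedAt > this.freshMs) {
      // Stale hit: serve the old value and refresh in the background.
      this.refresh(key, fetcher).catch(() => {
        // Keep serving the stale value if the refresh fails.
      });
    }
    return entry.value;
  }

  private refresh(key: string, fetcher: () => Promise<T>): Promise<T> {
    const pending = this.inFlight.get(key);
    if (pending !== undefined) return pending;

    const promise = fetcher()
      .then((value) => {
        this.entries.set(key, { value, storedAt: this.now() });
        return value;
      })
      .finally(() => {
        this.inFlight.delete(key);
      });
    this.inFlight.set(key, promise);
    return promise;
  }
}
```

This sketch never evicts entries, so in practice it would need a size bound (for example by pairing it with an LRU policy) to avoid the memory pressure issue listed in the same table.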

## Examples

### Quick Performance Wrapper
```typescript
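// `name` doubles as the cache key, so include the request parameters in it
// (e.g. `search:${query}`); otherwise different requests share one entry.
// Cache hits are timed too, so logged durations mix hits and real API calls.
const withPerformance = <T>(name: string, fn: () => Promise<T>) =>
  measuredExaCall(name, () =>
    cachedExaRequest(`cache:${name}`, fn)
  );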
```

## Resources
- [Exa Performance Guide](https://docs.exa.com/performance)
- [DataLoader Documentation](https://github.com/graphql/dataloader)
- [LRU Cache Documentation](https://github.com/isaacs/node-lru-cache)

## Next Steps
For cost optimization, see `exa-cost-tuning`.

Overview

This skill optimizes Exa API performance using caching, batching, and connection pooling to reduce latency and increase throughput. It provides pragmatic patterns for response caching (in-memory and Redis), request batching with DataLoader, connection keep-alive, and paginated streaming. Use it to establish a performance baseline and apply targeted optimizations for Exa integrations.

How this skill works

The skill measures baseline latencies for common Exa operations and applies three core techniques: response caching to avoid repeated fetches, request batching to combine many small requests into fewer calls, and HTTP connection pooling to reuse sockets. It also provides pagination generators and simple instrumentation to track operation durations and errors. Configuration examples cover in-memory LRU caches, Redis-backed caching, DataLoader batching, and keep-alive agents.

When to use it

  • When Exa API responses are consistently slow or exhibit high p95/p99 latencies
  • When many clients request the same data and caching can reduce load
  • When your integration issues many small requests that can be batched
  • When connections are frequently opened/closed and socket reuse would help
  • When you need to stream large lists efficiently with pagination

Best practices

  • Establish a latency baseline before changing code to quantify impact
  • Cache only responses to idempotent, read-heavy requests, and set conservative TTLs
  • Use stale-while-revalidate or short TTLs to avoid cache-miss storms
  • Limit batch sizes and schedule a short micro-batching delay (e.g., 5–20ms)
  • Configure keep-alive agents with sensible maxSockets and timeouts
  • Instrument calls to capture duration and error metadata for regressions
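
The micro-batching delay and batch size limit from the list above can be illustrated without DataLoader. The following is a minimal sketch, assuming the batch function returns one result per key in key order. `MicroBatcher` is a hypothetical class, and the defaults (10 ms, 100 keys) simply mirror the DataLoader configuration shown earlier.

```typescript
// Hypothetical micro-batcher: queues keys for `delayMs`, then issues one
// batch call, flushing early once `maxBatchSize` keys are queued.
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>; // results in key order

interface Pending<K, V> {
  key: K;
  resolve: (value: V) => void;
  reject: (error: unknown) => void;
}

class MicroBatcher<K, V> {
  private queue: Pending<K, V>[] = [];
  private timer: ReturnType<typeof setTimeout> | undefined;
  private batchFn: BatchFn<K, V>;
  private delayMs: number;
  private maxBatchSize: number;

  constructor(batchFn: BatchFn<K, V>, delayMs = 10, maxBatchSize = 100) {
    this.batchFn = batchFn;
    this.delayMs = delayMs;
    this.maxBatchSize = maxBatchSize;
  }

  load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      if (this.queue.length >= this.maxBatchSize) {
        this.flush(); // batch is full: send now
      } else if (this.timer === undefined) {
        this.timer = setTimeout(() => this.flush(), this.delayMs);
      }
    });
  }

  private flush(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    const batch = this.queue;
    this.queue = [];
    if (batch.length === 0) return;

    this.batchFn(batch.map((item) => item.key)).then(
      (values) => {
        if (values.length !== batch.length) {
          const error = new Error('batch function returned the wrong number of results');
          batch.forEach((item) => item.reject(error));
          return;
        }
        batch.forEach((item, i) => item.resolve(values[i]));
      },
      (error) => batch.forEach((item) => item.reject(error)),
    );
  }
}
```

A longer delay collects more keys per call but adds that delay to every request's latency, which is why the suggested window stays in the low tens of milliseconds.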

Example use cases

  • Reduce read latency for frequently requested Exa objects by adding an LRU or Redis cache
  • Aggregate user-driven fetches into batched requests via DataLoader to lower request count
  • Improve throughput under load by enabling HTTP keep-alive and tuning max sockets
  • Stream large result sets with an async pagination generator to reduce memory pressure
  • Wrap critical calls with a measuredExaCall helper to log latencies and errors for monitoring

FAQ

Will caching cause stale results?

Yes. Caching trades freshness for speed. Use short TTLs, invalidate cache entries on writes, or apply a stale-while-revalidate pattern to keep staleness within an acceptable window.
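
Invalidation on writes can be sketched as a TTL cache with an explicit `invalidate` method plus a wrapper that drops the cached copy after a successful write. This is an illustration under assumptions: `TtlCache` and `writeAndInvalidate` are hypothetical names, and the write callback stands in for whatever mutating call your integration makes.

```typescript
// Hypothetical TTL cache with explicit invalidation, built on a plain Map.
class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns true if an entry was removed.
  invalidate(key: string): boolean {
    return this.entries.delete(key);
  }
}

// Run a write, then drop the cached copy so the next read refetches.
async function writeAndInvalidate<T, R>(
  cache: TtlCache<T>,
  key: string,
  write: () => Promise<R>,
): Promise<R> {
  const result = await write();
  cache.invalidate(key); // only reached if the write succeeded
  return result;
}
```

Invalidating after the write rather than before avoids a window in which a concurrent read could repopulate the cache with the old value while the write is still in flight.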

How do I choose between in-memory and Redis caching?

Use in-memory LRU for single-process apps with low memory needs. Use Redis for multi-process or multi-region deployments that require a shared distributed cache.

What batch size should I use?

Start with conservative limits (e.g., 50–100) and measure. Reduce batch size if latency or timeouts increase, and use a small batching delay to collect requests.
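
To make batch size an explicit, tunable parameter, a large key list can be split into bounded chunks that are sent one after another. The following is a minimal sketch: `chunk` and `fetchInChunks` are hypothetical helpers, and `batchFn` stands in for whichever bulk call your client provides.

```typescript
// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Send chunks sequentially so that no single request exceeds `batchSize` keys.
// `batchFn` is a placeholder for the real bulk call.
async function fetchInChunks<K, V>(
  keys: readonly K[],
  batchSize: number,
  batchFn: (keys: K[]) => Promise<V[]>,
): Promise<V[]> {
  const results: V[] = [];
  for (const part of chunk(keys, batchSize)) {
    const values = await batchFn(part);
    results.push(...values);
  }
  return results;
}
```

Wrapping each `batchFn` call in a timing helper such as `measuredExaCall` makes it possible to compare per-batch latency across sizes before settling on a value.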