This skill improves Vast.ai API performance by applying caching, batching, and connection pooling to reduce latency and boost throughput.

npx playbooks add skill jeremylongshore/claude-code-plugins-plus-skills --skill vastai-performance-tuning

---
name: vastai-performance-tuning
description: |
  Optimize Vast.ai API performance with caching, batching, and connection pooling.
  Use when experiencing slow API responses, implementing caching strategies,
  or optimizing request throughput for Vast.ai integrations.
  Trigger with phrases like "vastai performance", "optimize vastai",
  "vastai latency", "vastai caching", "vastai slow", "vastai batch".
allowed-tools: Read, Write, Edit
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
---

# Vast.ai Performance Tuning

## Overview
Optimize Vast.ai API performance with caching, batching, and connection pooling.

## Prerequisites
- Vast.ai SDK installed
- Understanding of async patterns
- Redis or in-memory cache available (optional)
- Performance monitoring in place

## Latency Benchmarks

| Operation | P50 | P95 | P99 |
|-----------|-----|-----|-----|
| Read | 50ms | 150ms | 300ms |
| Write | 100ms | 250ms | 500ms |
| List | 75ms | 200ms | 400ms |

## Caching Strategy

### Response Caching
```typescript
import { LRUCache } from 'lru-cache';

const cache = new LRUCache<string, any>({
  max: 1000,
  ttl: 60000, // 1 minute
  updateAgeOnGet: true,
});

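async function cachedVastaiRequest<T>(
  key: string,
  fetcher: () => Promise<T>,
  ttl?: number
): Promise<T> {
  // Compare against undefined so cached falsy values (0, '', false) still count as hits
  const cached = cache.get(key);
  if (cached !== undefined) return cached as T;

  // Miss: call the API and store the result (falls back to the cache-wide TTL when ttl is omitted)
  const result = await fetcher();
  cache.set(key, result, { ttl });
  return result;
}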
```

### Redis Caching (Distributed)
```typescript
import Redis from 'ioredis';

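// REDIS_URL must be set; the non-null assertion only satisfies the type checker
const redis = new Redis(process.env.REDIS_URL!);

async function cachedWithRedis<T>(
  key: string,
  fetcher: () => Promise<T>,
  ttlSeconds = 60
): Promise<T> {
  // redis.get resolves to null on a miss
  const cached = await redis.get(key);
  if (cached !== null) return JSON.parse(cached) as T;

  // Miss: call the API and store the JSON-serialized result with an expiry
  const result = await fetcher();
  await redis.setex(key, ttlSeconds, JSON.stringify(result));
  return result;
}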
```

## Request Batching

```typescript
import DataLoader from 'dataloader';

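// `vastaiClient.batchGet` is a placeholder for whatever bulk-lookup call your client exposes
const vastaiLoader = new DataLoader<string, any>(
  async (ids) => {
    // Fetch every requested ID in a single call
    const results = await vastaiClient.batchGet(ids);
    // DataLoader expects one result per key, in the same order as `ids`
    const byId = new Map(results.map(r => [r.id, r]));
    return ids.map(id => byId.get(id) ?? null);
  },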
  {
    maxBatchSize: 100,
    batchScheduleFn: callback => setTimeout(callback, 10),
  }
);

// Usage - automatically batched
const [item1, item2, item3] = await Promise.all([
  vastaiLoader.load('id-1'),
  vastaiLoader.load('id-2'),
  vastaiLoader.load('id-3'),
]);
```

## Connection Optimization

```typescript
import { Agent } from 'https';

// Keep-alive connection pooling
const agent = new Agent({
  keepAlive: true,
  maxSockets: 10,
  maxFreeSockets: 5,
  timeout: 30000,
});

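// `VastaiClient` is a placeholder for your client class; the name of the
// agent option (`httpAgent` here) depends on the HTTP library it wraps
const client = new VastaiClient({
  apiKey: process.env.VASTAI_API_KEY!,
  httpAgent: agent,
});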
```

## Pagination Optimization

```typescript
async function* paginatedVastaiList<T>(
  fetcher: (cursor?: string) => Promise<{ data: T[]; nextCursor?: string }>
): AsyncGenerator<T> {
  let cursor: string | undefined;

  do {
    // Fetch one page, then hand items to the consumer one at a time
    const { data, nextCursor } = await fetcher(cursor);
    for (const item of data) {
      yield item;
    }
    cursor = nextCursor;
  } while (cursor);
}

// Usage (`vastaiClient.list` and `handleItem` stand in for your own client call and per-item logic)
for await (const item of paginatedVastaiList(cursor =>
  vastaiClient.list({ cursor, limit: 100 })
)) {
  await handleItem(item);
}
```

## Performance Monitoring

```typescript
async function measuredVastaiCall<T>(
  operation: string,
  fn: () => Promise<T>
): Promise<T> {
  const start = performance.now();
  try {
    const result = await fn();
    const duration = performance.now() - start;
    console.log({ operation, duration, status: 'success' });
    return result;
  } catch (error) {
    const duration = performance.now() - start;
    console.error({ operation, duration, status: 'error', error });
    throw error;
  }
}
```

## Instructions

### Step 1: Establish Baseline
Measure current latency for critical Vast.ai operations.
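
One way to turn raw timings into the P50/P95/P99 figures used in the benchmark table is a nearest-rank percentile over the recorded durations. The sketch below is only illustrative: `percentile` and `summarize` are hypothetical helper names, not part of any Vast.ai SDK, and the durations are assumed to be collected in milliseconds by your own instrumentation.

```typescript
// Nearest-rank percentile over latency samples (milliseconds).
// Hypothetical helper for illustration; not part of any SDK.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples recorded');
  if (p <= 0 || p > 100) throw new Error('p must be in (0, 100]');
  // Copy before sorting so the caller's array is left untouched
  const sorted = [...samples].sort((a, b) => a - b);
  // 1-based rank of the smallest sample that covers p percent of the data
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Summarize one operation's samples into the three percentiles tracked above
function summarize(samples: number[]): { p50: number; p95: number; p99: number } {
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}
```

Collect samples per operation type (read, write, list) and keep the summaries from before and after each change so regressions are easy to spot.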

### Step 2: Implement Caching
Add response caching for frequently accessed data.

### Step 3: Enable Batching
Use DataLoader or similar for automatic request batching.
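
If pulling in DataLoader is not an option, the "or similar" can be a small hand-rolled batcher. The following is a minimal sketch under stated assumptions: `createBatcher` is a hypothetical helper, and `fetchMany` stands in for a bulk lookup that you would have to supply yourself. It flushes when either the batch fills up or a short window elapses.

```typescript
type Pending<V> = {
  key: string;
  resolve: (value: V | null) => void;
  reject: (error: unknown) => void;
};

// Hypothetical micro-batcher: collects keys, then resolves them with one bulk call.
// `fetchMany` is an assumed caller-supplied function returning a key -> value map.
function createBatcher<V>(
  fetchMany: (keys: string[]) => Promise<Map<string, V>>,
  maxBatchSize = 100,
  windowMs = 10,
) {
  let queue: Pending<V>[] = [];
  let timer: ReturnType<typeof setTimeout> | undefined;

  async function flush(): Promise<void> {
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
    // Take ownership of the current queue so new loads start a fresh batch
    const batch = queue;
    queue = [];
    if (batch.length === 0) return;
    try {
      const uniqueKeys = Array.from(new Set(batch.map(p => p.key)));
      const found = await fetchMany(uniqueKeys);
      // Keys missing from the response resolve to null
      for (const p of batch) p.resolve(found.get(p.key) ?? null);
    } catch (error) {
      // A failed bulk call rejects every caller waiting on this batch
      for (const p of batch) p.reject(error);
    }
  }

  function load(key: string): Promise<V | null> {
    return new Promise((resolve, reject) => {
      queue.push({ key, resolve, reject });
      if (queue.length >= maxBatchSize) {
        void flush(); // batch is full: send immediately
      } else if (timer === undefined) {
        timer = setTimeout(() => void flush(), windowMs); // otherwise wait out the window
      }
    });
  }

  return { load };
}
```

A smaller `windowMs` keeps added latency low at the cost of smaller batches; a larger `maxBatchSize` favors throughput.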

### Step 4: Optimize Connections
Configure connection pooling with keep-alive.

## Output
- Reduced API latency
- Caching layer implemented
- Request batching enabled
- Connection pooling configured

## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Cache miss storm | TTL expired | Use stale-while-revalidate |
| Batch timeout | Too many items | Reduce batch size |
| Connection exhausted | No pooling | Configure max sockets |
| Memory pressure | Cache too large | Set max cache entries |
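
The stale-while-revalidate fix in the first row can be sketched as follows. This is a simplified in-memory illustration, not a drop-in replacement for the caches above: `createSwrCache` and its parameters are hypothetical names, and the injectable `now` clock exists only to make the behavior easy to test.

```typescript
type SwrEntry<T> = { value: T; freshUntil: number; staleUntil: number };

// Hypothetical stale-while-revalidate cache: entries are served directly while
// fresh, served stale with a background refresh for `staleMs` after that, and
// refetched synchronously once they are older than both windows.
function createSwrCache<T>(
  freshMs: number,
  staleMs: number,
  now: () => number = () => Date.now(),
) {
  const entries = new Map<string, SwrEntry<T>>();
  const refreshing = new Set<string>();

  function store(key: string, value: T): void {
    const t = now();
    entries.set(key, { value, freshUntil: t + freshMs, staleUntil: t + freshMs + staleMs });
  }

  async function refresh(key: string, fetcher: () => Promise<T>): Promise<void> {
    if (refreshing.has(key)) return; // a refresh for this key is already running
    refreshing.add(key);
    try {
      store(key, await fetcher());
    } catch {
      // Refresh failed: keep serving the stale value; a later stale hit retries
    } finally {
      refreshing.delete(key);
    }
  }

  async function get(key: string, fetcher: () => Promise<T>): Promise<T> {
    const entry = entries.get(key);
    const t = now();
    if (entry !== undefined && t < entry.freshUntil) return entry.value; // fresh hit
    if (entry !== undefined && t < entry.staleUntil) {
      void refresh(key, fetcher); // stale hit: answer now, refresh in the background
      return entry.value;
    }
    // Miss, or too old to serve: the caller has to wait for the fetch
    const value = await fetcher();
    store(key, value);
    return value;
  }

  return { get };
}
```

Because only the first stale hit triggers a refresh, a popular key that expires produces one upstream call instead of one per waiting request.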

## Examples

### Quick Performance Wrapper
```typescript
const withPerformance = <T>(name: string, fn: () => Promise<T>) =>
  measuredVastaiCall(name, () =>
    cachedVastaiRequest(`cache:${name}`, fn)
  );
```

## Resources
- [Vast.ai Performance Guide](https://docs.vastai.com/performance)
- [DataLoader Documentation](https://github.com/graphql/dataloader)
- [LRU Cache Documentation](https://github.com/isaacs/node-lru-cache)

## Next Steps
For cost optimization, see `vastai-cost-tuning`.

Overview

This skill helps optimize Vast.ai API integrations by applying caching, request batching, and connection pooling patterns. It provides practical strategies and code patterns to reduce latency, increase throughput, and stabilize production workloads when calling Vast.ai services.

How this skill works

The skill inspects common Vast.ai call patterns and introduces response caching (in-memory or Redis), automatic request batching using DataLoader-style loaders, and HTTP connection pooling with keep-alive agents. It includes pagination helpers and simple performance wrappers to measure and log call durations and errors so you can prioritize optimizations.

When to use it

  • When Vast.ai API responses are consistently slow or variable
  • When you need to reduce API call volume or throttle bursts
  • When you want to aggregate many small requests into fewer batch requests
  • When running distributed services that share cached results
  • When monitoring shows high connection churn or socket exhaustion

Best practices

  • Start by measuring baseline latency for read/write/list operations before changing code
  • Cache only idempotent or easily invalidated responses and set sensible TTLs
  • Use staggered TTLs or stale-while-revalidate to avoid cache stampedes
  • Limit batch size and keep schedule windows short to balance latency against throughput
  • Enable HTTP keep-alive and tune max sockets to match expected concurrency
  • Instrument every change with timing and error logs to validate improvements

Example use cases

  • Cache frequent listing results (offers, images) with an LRU cache or Redis to cut P95 latency
  • Wrap many single-item lookups in a DataLoader to convert them into bulk batchGet calls
  • Use a keep-alive agent with tuned maxSockets to prevent connection exhaustion under load
  • Stream large result sets with an async generator that paginates efficiently and processes items incrementally
  • Layer a measured wrapper around critical calls to track regressions after deployments

FAQ

Should I use in-memory cache or Redis?

Use in-memory LRU for single-process services and Redis for multi-instance deployments that need a shared cache and persistence across restarts.

How do I avoid cache stampedes on TTL expiry?

Use stale-while-revalidate, jittered TTLs, or a single-flight pattern so only one request refreshes a key while others serve stale data.
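
Two of these ideas, jittered TTLs and single-flight, fit in a few lines. The sketch below is illustrative only: `jitteredTtl` and `createSingleFlight` are hypothetical helper names, and the 20% spread is an arbitrary example value rather than a recommendation from the skill.

```typescript
// Spread expiries so keys written at the same moment do not all expire together.
// With spread = 0.2 the result lies within 20% either side of baseMs.
function jitteredTtl(
  baseMs: number,
  spread = 0.2,
  random: () => number = Math.random,
): number {
  const factor = 1 + spread * (2 * random() - 1);
  return Math.round(baseMs * factor);
}

// Single-flight: concurrent callers for the same key share one in-flight fetch.
function createSingleFlight<T>() {
  const inFlight = new Map<string, Promise<T>>();

  return function run(key: string, fetcher: () => Promise<T>): Promise<T> {
    const existing = inFlight.get(key);
    if (existing !== undefined) return existing;

    // Start the fetch on the next microtask so the map entry is registered
    // before the fetcher can settle, then release the key on either outcome.
    const promise = Promise.resolve()
      .then(fetcher)
      .then(
        value => {
          inFlight.delete(key);
          return value;
        },
        error => {
          inFlight.delete(key);
          throw error;
        },
      );
    inFlight.set(key, promise);
    return promise;
  };
}
```

Single-flight deduplicates work within one process only; multi-instance deployments would need a shared lock or similar coordination on top.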