
screenpipe-search skill

/crates/screenpipe-core/assets/skills/screenpipe-search

This skill lets you search your locally recorded screen, audio, and UI events to quickly recall meetings, apps used, and moments.

npx playbooks add skill screenpipe/screenpipe --skill screenpipe-search

Review the files below or copy the command above to add this skill to your agents.

Files (1): SKILL.md (6.8 KB)
---
name: screenpipe-search
description: Search the user's screen recordings, audio transcriptions, and UI interactions via the local Screenpipe API at localhost:3030. Use when the user asks about their screen activity, meetings, apps they used, what they saw/heard, or anything about their computer usage history.
---

# Screenpipe Search

Search the user's locally-recorded screen and audio data. Screenpipe continuously captures screen text (OCR), audio transcriptions, and UI events (clicks, keystrokes, app switches).

The API runs at `http://localhost:3030`.

## Search API

```bash
curl "http://localhost:3030/search?q=QUERY&content_type=all&limit=10&start_time=ISO8601&end_time=ISO8601"
```
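
If a query value contains spaces or other characters that need escaping, curl can URL-encode it for you. A small sketch using curl's `-G`/`--data-urlencode` flags (a convenience on the client side, not a separate Screenpipe feature):

```bash
# Let curl build and URL-encode the query string; equivalent to the template above.
# The query text and "Google Chrome" app name here are illustrative values.
curl -G "http://localhost:3030/search" \
  --data-urlencode "q=quarterly report" \
  --data-urlencode "app_name=Google Chrome" \
  --data-urlencode "content_type=ocr" \
  --data-urlencode "limit=10" \
  --data-urlencode "start_time=$(date -u -v-2H +%Y-%m-%dT%H:%M:%SZ)"
```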

### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `q` | string | No | Search keywords. Be specific. |
| `content_type` | string | No | `all` (default), `ocr`, `audio`, `vision`, `input` |
| `limit` | integer | No | Max results 1-20. Default: 10 |
| `offset` | integer | No | Pagination offset. Default: 0 |
| `start_time` | ISO 8601 | **Yes** | Start of time range. ALWAYS include this. |
| `end_time` | ISO 8601 | No | End of time range. Defaults to now. |
| `app_name` | string | No | Filter by app (e.g. "Google Chrome", "Slack", "zoom.us", "Code") |
| `window_name` | string | No | Filter by window title substring |
| `speaker_name` | string | No | Filter audio by speaker name (case-insensitive partial match) |
| `focused` | boolean | No | Only return results from focused windows |

### Content Types

- `vision` or `ocr` — Screen text captured via OCR
- `audio` — Audio transcriptions (meetings, voice)
- `input` — UI events: clicks, keystrokes, clipboard, app switches
- `all` — Everything (default)

### CRITICAL RULES

1. **ALWAYS include `start_time`** — the database has hundreds of thousands of entries. Queries without time bounds WILL time out.
2. **Start with short time ranges** — default to last 1-2 hours. Only expand if no results found.
3. **Use `app_name` filter** whenever the user mentions a specific app.
4. **Keep `limit` low** (5-10) initially. Only increase if the user needs more.
5. **"recent"** = last 30 minutes. **"today"** = since midnight. **"yesterday"** = yesterday's date range.
6. If a search times out, retry with a narrower time range (e.g. 30 mins instead of 2 hours).

### Example Searches

```bash
# What happened in the last hour
curl "http://localhost:3030/search?content_type=all&limit=10&start_time=$(date -u -v-1H +%Y-%m-%dT%H:%M:%SZ)"

# Slack messages today
curl "http://localhost:3030/search?app_name=Slack&content_type=ocr&limit=10&start_time=$(date -u +%Y-%m-%dT00:00:00Z)"

# Audio transcriptions from meetings
curl "http://localhost:3030/search?content_type=audio&limit=5&start_time=$(date -u -v-4H +%Y-%m-%dT%H:%M:%SZ)"

# What a specific person said
curl "http://localhost:3030/search?content_type=audio&speaker_name=John&limit=10&start_time=$(date -u -v-24H +%Y-%m-%dT%H:%M:%SZ)"

# Browser activity
curl "http://localhost:3030/search?app_name=Google%20Chrome&content_type=ocr&limit=10&start_time=$(date -u -v-2H +%Y-%m-%dT%H:%M:%SZ)"
```
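
The `-v-1H` style offsets above are BSD/macOS `date` syntax. On Linux (GNU coreutils), the equivalent uses `-d`; a rough sketch, adjust for your platform:

```bash
# Same "last hour" search using GNU date (Linux) instead of BSD date (macOS)
curl "http://localhost:3030/search?content_type=all&limit=10&start_time=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)"

# "Today" since local midnight (converted to UTC for the API) with GNU date
curl "http://localhost:3030/search?content_type=all&limit=10&start_time=$(date -u -d 'today 00:00' +%Y-%m-%dT%H:%M:%SZ)"
```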

### Response Format

```json
{
  "data": [
    {
      "type": "OCR",
      "content": {
        "frame_id": 12345,
        "text": "screen text captured...",
        "timestamp": "2024-01-15T10:30:00Z",
        "file_path": "/path/to/video.mp4",
        "offset_index": 42,
        "app_name": "Google Chrome",
        "window_name": "GitHub - screenpipe",
        "tags": [],
        "frame": null
      }
    },
    {
      "type": "Audio",
      "content": {
        "chunk_id": 678,
        "transcription": "what they said...",
        "timestamp": "2024-01-15T10:31:00Z",
        "file_path": "/path/to/audio.mp4",
        "offset_index": 5,
        "tags": [],
        "speaker": {
          "id": 1,
          "name": "John",
          "metadata": ""
        }
      }
    },
    {
      "type": "UI",
      "content": {
        "id": 999,
        "text": "Clicked button 'Submit'",
        "timestamp": "2024-01-15T10:32:00Z",
        "app_name": "Safari",
        "window_name": "Forms",
        "initial_traversal_at": null
      }
    }
  ],
  "pagination": {
    "limit": 10,
    "offset": 0,
    "total": 42
  }
}
```
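
The `data` array mixes result types, so the interesting fields sit under `content` and differ per type. A minimal parsing sketch, assuming `jq` is installed (it is not required by the API):

```bash
# One line per hit: type, timestamp, app (if any), and the first 80 chars of text/transcription
curl -s "http://localhost:3030/search?content_type=all&limit=10&start_time=$(date -u -v-1H +%Y-%m-%dT%H:%M:%SZ)" \
  | jq -r '.data[] | [.type, .content.timestamp, (.content.app_name // "-"),
      ((.content.text // .content.transcription // "") | .[0:80])] | @tsv'
```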

## Fetching Frames (Screenshots)

You can fetch actual screenshot frames from search results. Each OCR result has a `frame_id`.

```bash
# Get a specific frame as an image
curl -o /tmp/frame.png "http://localhost:3030/frames/{frame_id}"
```

This returns the raw PNG image. Use the `read` tool to view it (pi supports images).

### When to fetch frames
- When the user asks "show me what I was looking at" or "what was on screen"
- When you need visual context to answer a question (e.g. UI layout, charts, design)
- When OCR text is ambiguous and you need to see the actual screen

### CRITICAL: Token budget for frames
- Each frame is ~1000-2000 tokens when sent to the LLM
- **NEVER fetch more than 2-3 frames per query** — it's expensive and slow
- Prefer using OCR text from search results first. Only fetch frames when text isn't enough.
- If the user asks about many moments, summarize from OCR text and only fetch 1-2 key frames.

### Example workflow
```bash
# 1. Search for relevant content
curl "http://localhost:3030/search?q=dashboard&app_name=Chrome&content_type=ocr&limit=5&start_time=2024-01-15T10:00:00Z"

# 2. Pick the most relevant frame_id from results
# 3. Fetch that specific frame
curl -o /tmp/frame_12345.png "http://localhost:3030/frames/12345"

# 4. Read/view the image
```
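
Steps 2 and 3 can be scripted. A hedged sketch, assuming `jq` is installed and that the first OCR hit is the frame you want:

```bash
# Grab the frame_id of the first OCR result, then fetch that frame as a PNG
FRAME_ID=$(curl -s "http://localhost:3030/search?q=dashboard&app_name=Chrome&content_type=ocr&limit=5&start_time=2024-01-15T10:00:00Z" \
  | jq -r '.data[0].content.frame_id')
curl -o "/tmp/frame_${FRAME_ID}.png" "http://localhost:3030/frames/${FRAME_ID}"
```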

## Other Useful Endpoints

### Health Check
```bash
curl http://localhost:3030/health
```

### List Audio Devices
```bash
curl http://localhost:3030/audio/list
```

### List Monitors
```bash
curl http://localhost:3030/vision/list
```

### Raw SQL (advanced)
```bash
curl -X POST http://localhost:3030/raw_sql -H "Content-Type: application/json" -d '{"query": "SELECT COUNT(*) FROM ocr_text"}'
```

### Speakers
```bash
# Search speakers
curl "http://localhost:3030/speakers/search?name=John"

# List unnamed speakers
curl http://localhost:3030/speakers/unnamed
```

## Showing Videos

When referencing video files from search results, show the `file_path` to the user in an inline code block so it renders as a playable video:

```
`/Users/name/.screenpipe/data/monitor_1_2024-01-15_10-30-00.mp4`
```

Do NOT use markdown links or multi-line code blocks for videos.

## Tips

- The user's data is 100% local. You are querying their local machine.
- Timestamps in results are UTC. Convert to the user's local timezone when displaying.
- If asked "what did I work on today?", search with broad terms and short time ranges, then summarize by app/activity.
- If asked about meetings, use `content_type=audio`.
- If asked about a specific app, always use the `app_name` filter.
- Combine multiple searches to build a complete picture (e.g., screen + audio for a meeting).
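
For instance, a meeting can be reconstructed by pairing a screen search with an audio search over the same window. A rough sketch (the times and the `zoom.us` app name are illustrative):

```bash
# 1. What was on screen during the meeting window
curl "http://localhost:3030/search?app_name=zoom.us&content_type=ocr&limit=10&start_time=2024-01-15T17:00:00Z&end_time=2024-01-15T18:00:00Z"

# 2. What was said during the same window
curl "http://localhost:3030/search?content_type=audio&limit=10&start_time=2024-01-15T17:00:00Z&end_time=2024-01-15T18:00:00Z"

# Merge the two result sets by timestamp when summarizing back to the user.
```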

Overview

This skill lets you search your locally recorded screen text, audio transcriptions, and UI events through the Screenpipe API running at http://localhost:3030. It helps you find what you saw, heard, or clicked on by querying OCR, audio, vision frames, and input events. All searches run locally against your personal recordings and respect strict time-range requirements to avoid timeouts.

How this skill works

The skill issues HTTP requests to the Screenpipe search endpoint, which returns OCR text, audio transcripts, and UI event records. Queries must always include a start_time and should start with a narrow window (the last 30–120 minutes). When needed, it fetches specific screenshot frames via frame_id, but prefers OCR/text first to conserve tokens.

When to use it

  • When you want to find text or images that appeared on your screen within a time range.
  • To locate what was said in a meeting or who spoke using audio transcriptions.
  • To audit recent UI interactions: clicks, keystrokes, app switches, or clipboard events.
  • When you need to reconstruct activity across apps (e.g., browser + Slack + code editor).
  • When you want a screenshot for visual confirmation after OCR results are ambiguous.

Best practices

  • Always include start_time in ISO 8601 format; default to the last 1–2 hours and expand only if needed.
  • Use app_name or window_name filters when the user mentions a specific app to narrow results and improve speed.
  • Keep limit low (5–10) on initial queries to avoid large responses and timeouts.
  • Treat “recent” as the last 30 minutes, “today” as since midnight, and “yesterday” as the previous day’s date range.
  • Fetch frames only when OCR or transcripts are insufficient; never fetch more than 2–3 frames per query due to token costs.

Example use cases

  • Find what you discussed in this morning’s 1-hour meeting by searching audio with start_time set to the meeting start.
  • Locate a browser tab’s text from two hours ago by searching OCR with app_name=Google Chrome and a 2-hour window.
  • Show what you clicked in a form by querying input events for a focused window and the relevant time range.
  • Confirm the content of a dashboard chart by searching OCR for keywords, then fetching one key frame for visual context.
  • List everything you worked on today by running short-range searches per app (Chrome, Slack, Code) and summarizing results.

FAQ

What happens if a search times out?

If a search times out, retry with a narrower time range (e.g., 30 minutes instead of 2 hours) or add app_name/window_name filters to reduce results.

Can I fetch video files from results?

Yes. Search results include file_path entries; present the file path inline so the user can open/play the recorded video.