
session-inspector skill



/.claude/skills/session-inspector

This skill helps you inspect Claude Code session logs, analyze context usage, and extract plans and issues from sessions for debugging and transparency.

npx playbooks add skill dagster-io/erk --skill session-inspector



---
name: session-inspector
description: >
  This skill should be used when inspecting, analyzing, or querying Claude Code session logs.
  Use when users ask about session history, want to find sessions, analyze context usage,
  extract tool call patterns, debug agent execution, or understand what happened in previous
  sessions. Essential for understanding Claude Code's ~/.claude/projects/ structure, JSONL
  session format, and the erk extraction pipeline.
---

# Session Inspector

## Overview

Session Inspector provides comprehensive tools for inspecting Claude Code session logs stored
in `~/.claude/projects/`. The skill enables:

- Discovering and listing sessions for any project/worktree
- Preprocessing sessions to readable XML format
- Analyzing context window consumption
- Extracting plans from sessions
- Creating GitHub issues from session content
- Debugging agent subprocess execution
- Understanding the two-stage extraction pipeline

## When to Use

Invoke this skill when users:

- Ask what sessions exist for a project or worktree
- Want to find a specific session by ID or content
- Need to analyze context window consumption
- Ask about tool call patterns or frequencies
- Need to debug agent subprocess failures
- Want to extract plans from sessions
- Ask about session history or previous conversations
- Need to understand session preprocessing or extraction

## Quick Reference: CLI Commands

All commands are invoked via `erk exec <command>`:

| Command                      | Purpose                                          |
| ---------------------------- | ------------------------------------------------ |
| `list-sessions`              | List sessions with metadata for current worktree |
| `preprocess-session`         | Convert JSONL to compressed XML                  |
| `extract-latest-plan`        | Extract most recent plan from session            |
| `create-issue-from-session`  | Create GitHub issue from session plan            |
| `render-session-content`     | Render session XML as GitHub comment blocks      |
| `extract-session-from-issue` | Extract session content from GitHub issue        |

### Slash Commands

| Command                | Purpose                                      |
| ---------------------- | -------------------------------------------- |
| `/erk:sessions-list`   | Display formatted session list table         |
| `/erk:analyze-context` | Analyze context window usage across sessions |

## Core Capabilities

### 1. List Sessions

```bash
erk exec list-sessions [--limit N] [--min-size BYTES]
```

**Options:**

- `--limit`: Maximum sessions to return (default: 10)
- `--min-size`: Minimum file size in bytes to filter tiny sessions (default: 0)

**Output includes:**

- Branch context (current_branch, trunk_branch, is_on_trunk)
- Current session ID from SESSION_CONTEXT env var
- Sessions array with: session_id, mtime_display, mtime_relative, size_bytes, summary, is_current
- Project directory path and filtered count
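
The `--limit` and `--min-size` behaviour can be approximated in plain Python when `erk` is not available. This is a minimal sketch, not the erk implementation: `list_session_files` is a hypothetical helper, and it assumes main session logs are the `*.jsonl` files whose names do not start with `agent-`.

```python
from pathlib import Path


def list_session_files(project_dir: Path, limit: int = 10, min_size: int = 0) -> list[dict]:
    """Return the newest session logs in project_dir, newest first.

    Hypothetical helper: mirrors the --limit / --min-size options described
    above, but is not the erk implementation.
    """
    candidates = []
    for path in project_dir.glob("*.jsonl"):
        if path.name.startswith("agent-"):
            continue  # agent subprocess logs are not main sessions
        stat = path.stat()
        if stat.st_size < min_size:
            continue  # skip sessions below the size threshold
        candidates.append(
            {"session_id": path.stem, "size_bytes": stat.st_size, "mtime": stat.st_mtime}
        )
    # Sort by modification time, most recent first, then apply the limit.
    candidates.sort(key=lambda s: s["mtime"], reverse=True)
    return candidates[:limit]
```

Only `session_id` and `size_bytes` reuse field names from the output above. The other fields (`summary`, `is_current`, branch context) are left out because their derivation is not described here.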

### 2. Preprocess Session to XML

```bash
erk exec preprocess-session <log-path> [OPTIONS]
```

**Options:**

- `--session-id`: Filter entries to specific session ID
- `--include-agents/--no-include-agents`: Include agent logs (default: True)
- `--no-filtering`: Disable filtering optimizations
- `--stdout`: Output to stdout instead of temp file

**Optimizations applied:**

- Empty/warmup session filtering
- Documentation deduplication (hash markers)
- Tool parameter truncation (>200 chars)
- Tool result pruning (first 30 lines, preserves errors)
- Log discovery operation filtering
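
The two size-based optimizations can be illustrated with a short sketch. The thresholds (200 characters, 30 lines) come from the list above. The function names, the truncation marker text, and the keyword-based error detection are assumptions made for illustration, not the actual preprocessing code.

```python
MAX_PARAM_CHARS = 200  # threshold from the list above
MAX_RESULT_LINES = 30  # threshold from the list above


def truncate_param(value: str, max_chars: int = MAX_PARAM_CHARS) -> str:
    """Shorten a long tool parameter and note how much was dropped."""
    if len(value) <= max_chars:
        return value
    dropped = len(value) - max_chars
    return f"{value[:max_chars]}... [{dropped} chars truncated]"


def prune_result(text: str, max_lines: int = MAX_RESULT_LINES) -> str:
    """Keep the first max_lines lines, plus later lines that look like errors."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    head = lines[:max_lines]
    # Assumption: an "error" line is detected by a simple keyword match.
    errors = [
        ln for ln in lines[max_lines:]
        if "error" in ln.lower() or "traceback" in ln.lower()
    ]
    omitted = len(lines) - max_lines - len(errors)
    return "\n".join(head + [f"[{omitted} lines omitted]"] + errors)
```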

### 3. Extract Plan from Session

```bash
erk exec extract-latest-plan [--session-id SESSION_ID]
```

Extracts the most recent plan from a session. The lookup is session-scoped via the slug field
and falls back to an mtime-based lookup if no session-specific plan is found.

### 4. Create GitHub Issue from Session

```bash
erk exec create-issue-from-session [--session-id SESSION_ID]
```

Extracts the plan and creates a GitHub issue with the session content. Returns JSON containing
`issue_number` and `issue_url`.

### 5. Render Session for GitHub

```bash
erk exec render-session-content --session-file <path> [--session-label LABEL] [--extraction-hints HINTS]
```

Renders session XML as GitHub comment blocks with automatic chunking for large content.
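
The chunk format used by `render-session-content` is not specified here. As a rough illustration of the idea, the sketch below splits text on line boundaries so that each chunk stays under a size ceiling. `chunk_text` and the 60,000-character default are hypothetical. The default is chosen only to leave headroom under GitHub's comment body limit (commonly cited as 65,536 characters).

```python
def chunk_text(text: str, max_chars: int = 60_000) -> list[str]:
    """Split text into chunks of at most max_chars, breaking on line boundaries.

    A single line longer than max_chars is split mid-line as a last resort.
    """
    chunks: list[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # Hard-split any line that cannot fit in a chunk on its own.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when adding this line would exceed the ceiling.
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Joining the chunks in order reproduces the original text, which is the property a matching extraction step would rely on.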

### 6. Extract Session from GitHub Issue

```bash
erk exec extract-session-from-issue <issue-number> [--output PATH] [--session-id ID]
```

Extracts and combines chunked session content from GitHub issue comments.

## Directory Structure

```
~/.claude/projects/
├── -Users-foo-code-myapp/           ← Encoded project path
│   ├── abc123-def456.jsonl          ← Main session log
│   ├── xyz789-ghi012.jsonl          ← Another session
│   ├── agent-17cfd3f4.jsonl         ← Agent subprocess log
│   └── agent-2a3b4c5d.jsonl         ← Another agent log
```

**Path encoding:** Replace every `/` and `.` with `-` (the leading `/` of an absolute path becomes the leading `-`)

Example: `/Users/foo/code/myapp` → `-Users-foo-code-myapp`
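
A minimal sketch that reproduces the example above. `encode_project_path` is a hypothetical name, and the sketch assumes an absolute path as input, so the leading `/` is what produces the leading `-`.

```python
def encode_project_path(path: str) -> str:
    """Encode an absolute project path as a ~/.claude/projects/ directory name."""
    # Both separators and dots map to "-", matching the example above.
    return path.replace("/", "-").replace(".", "-")


print(encode_project_path("/Users/foo/code/myapp"))  # -Users-foo-code-myapp
```

Because both `/` and `.` map to `-`, the encoding is lossy: distinct paths such as `/a/b.c` and `/a/b/c` produce the same directory name.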

## Session ID

Session IDs are passed explicitly to CLI commands via `--session-id` options. The typical flow:

1. Hook receives session context via stdin JSON from Claude Code
2. Hook outputs `📌 session: <id>` reminder to conversation
3. Agent extracts session ID from reminder text
4. Agent passes session ID as explicit CLI parameter

**Example:**

```bash
erk exec list-sessions --session-id abc123-def456
```
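
Step 3 of the flow above (pulling the ID out of the reminder text) can be sketched with a regular expression. The pattern assumes the reminder has the form `session: <id>` and that IDs contain only letters, digits, and hyphens. `extract_session_id` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumption: the reminder looks like "📌 session: <id>" and the ID contains
# only letters, digits, and hyphens.
SESSION_REMINDER = re.compile(r"session:\s*([A-Za-z0-9-]+)")


def extract_session_id(text: str) -> Optional[str]:
    """Return the first session ID found in text, or None if there is none."""
    match = SESSION_REMINDER.search(text)
    return match.group(1) if match else None
```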

## Two-Stage Extraction Pipeline

The extraction system uses a two-stage pipeline:

### Stage 1: Mechanical Reduction (Deterministic)

- Drop file-history-snapshot entries
- Strip usage metadata
- Remove empty text blocks
- Compact whitespace (3+ newlines → 1)
- Deduplicate assistant messages with tool_use
- Output: Compressed XML
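
Several of the Stage 1 steps are simple enough to sketch directly. The field names used below (`type`, `message.usage`, `message.content`, and text blocks shaped like `{"type": "text", "text": ...}`) are assumptions about the JSONL layout, and `reduce_entries` is a hypothetical function, not the erk_shared implementation. Deduplication and XML output are omitted.

```python
import json
import re


def compact_whitespace(text: str) -> str:
    """Collapse runs of three or more newlines into one, as listed above."""
    return re.sub(r"\n{3,}", "\n", text)


def reduce_entries(jsonl_text: str) -> list[dict]:
    """Apply a few of the Stage 1 steps to raw JSONL text.

    Field names (type, message.usage, message.content) are assumptions
    about the log format, not a documented schema.
    """
    reduced = []
    for raw in jsonl_text.splitlines():
        if not raw.strip():
            continue
        entry = json.loads(raw)
        if entry.get("type") == "file-history-snapshot":
            continue  # drop snapshot entries
        message = entry.get("message")
        if isinstance(message, dict):
            message.pop("usage", None)  # strip usage metadata
            content = message.get("content")
            if isinstance(content, list):
                kept = []
                for block in content:
                    if isinstance(block, dict) and block.get("type") == "text":
                        text = compact_whitespace(block.get("text", ""))
                        if not text.strip():
                            continue  # remove empty text blocks
                        block = {**block, "text": text}
                    kept.append(block)
                message["content"] = kept
        reduced.append(entry)
    return reduced
```

Every step is a pure function of its input, which is what makes this stage deterministic: the same log always reduces to the same output.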

### Stage 2: Haiku Distillation (Optional, Semantic)

- Remove noise (log discovery, warmup content)
- Deduplicate semantically similar blocks
- Prune verbose outputs
- Preserve errors, stack traces, and warnings
- Output: Semantically refined XML

## Session Selection Logic

The `auto_select_sessions()` function applies the following rules:

- **On trunk:** Use current session only
- **Current session trivial (<1KB) + substantial sessions exist:** Auto-select substantial
- **Current session substantial (>=1KB):** Use it alone
- **No substantial sessions:** Return current even if trivial
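
The rules above amount to a short decision procedure. The sketch below is an illustration only: the real `auto_select_sessions()` signature is not shown in this document, so `select_sessions`, the `Session` dataclass, and the reading of 1KB as 1024 bytes are all assumptions.

```python
from dataclasses import dataclass

TRIVIAL_BYTES = 1024  # assumption: "1KB" in the rules above means 1024 bytes


@dataclass
class Session:
    session_id: str
    size_bytes: int


def select_sessions(
    current: Session, others: list[Session], is_on_trunk: bool
) -> list[Session]:
    """Pick which sessions to use, following the four rules above."""
    if is_on_trunk:
        return [current]  # on trunk: current session only
    if current.size_bytes >= TRIVIAL_BYTES:
        return [current]  # substantial current session: use it alone
    substantial = [s for s in others if s.size_bytes >= TRIVIAL_BYTES]
    # Trivial current session: prefer substantial ones, else fall back to current.
    return substantial if substantial else [current]
```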

## Scratch Storage

Session-scoped files are stored in `.erk/scratch/sessions/<session-id>/`:

```python
from erk_shared.scratch import get_scratch_dir, write_scratch_file

scratch_dir = get_scratch_dir(session_id)
file_path = write_scratch_file(content, session_id=session_id, suffix=".xml")
```

## Common Tasks

### Find What Happened in a Session

1. List sessions: `erk exec list-sessions`
2. Find by summary or time
3. Preprocess: `erk exec preprocess-session <file> --stdout | head -500`

### Debug Context Blowout

1. Run `/erk:analyze-context`
2. Check token breakdown by category
3. Look for duplicate reads or large tool results

### Extract Plan for Implementation

```bash
erk exec extract-latest-plan --session-id <id>
```

### Create Issue from Session Plan

```bash
erk exec create-issue-from-session --session-id <id>
```

### Find Agent Subprocess Logs

```bash
PROJECT_DIR=$(erk find-project-dir | jq -r '.project_dir')
ls -lt "$PROJECT_DIR"/agent-*.jsonl | head -10
```

### Check for Errors in Agent

```bash
jq 'select(.message.is_error == true)' agent-<id>.jsonl
```
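
When `jq` is not installed, the same filter can be written in Python. This sketch applies the same `message.is_error` test as the command above and makes no further assumptions about the entry schema. `find_error_entries` is a hypothetical helper.

```python
import json


def find_error_entries(jsonl_text: str) -> list[dict]:
    """Return entries whose message.is_error is true (same test as the jq filter)."""
    errors = []
    for raw in jsonl_text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        entry = json.loads(raw)
        message = entry.get("message")
        if isinstance(message, dict) and message.get("is_error") is True:
            errors.append(entry)
    return errors
```

Pass it the contents of an agent log, for example `find_error_entries(Path(log_path).read_text())` with `pathlib.Path` imported.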

## Resources

### references/

- `tools.md` - Complete CLI commands and jq analysis recipes
- `format.md` - JSONL format specification and entry types
- `extraction.md` - erk_shared extraction module API reference

Load references when users need detailed command syntax, format documentation, or
programmatic access to extraction capabilities.

## Code Dependencies

This skill documents capabilities that primarily live in:

- **CLI commands:** `packages/erk-cli/src/erk_cli/commands/`
- **Shared library:** `packages/erk-shared/src/erk_shared/extraction/`
- **GitHub metadata:** `packages/erk-shared/src/erk_shared/github/metadata.py`
- **Scratch storage:** `packages/erk-shared/src/erk_shared/scratch/`

Overview

This skill inspects, analyzes, and queries Claude Code session logs stored under ~/.claude/projects/. It helps you discover sessions, preprocess logs into readable XML, extract plans, analyze context-window usage, and debug agent subprocesses. Use it to turn raw JSONL session data into actionable insights and reproducible artifacts like GitHub issues.

How this skill works

The skill drives the erk CLI to list session metadata, preprocess JSONL logs into compressed XML, and run a two-stage extraction pipeline (mechanical reduction then optional semantic distillation). It reads encoded project directories, locates session and agent subprocess logs, computes context-token breakdowns, and extracts the most recent plan or the full rendered session for issue creation. Outputs include XML, summarized JSON, and GitHub issue metadata.

When to use it

You need to find what happened in a past Claude Code session or locate a session by ID
You want to analyze context-window consumption or identify token-heavy inputs
You need to extract the most recent plan from a session for implementation or reporting
You must debug agent subprocess failures or inspect agent-* logs
You want to publish or reconstruct session content as a GitHub issue or comment

Best practices

Start with erk exec list-sessions to narrow candidates before preprocessing large logs
Preprocess to compressed XML (--stdout for piping) to reduce noise before deep analysis
Use --session-id explicitly to avoid ambiguous session selection rules
Run the two-stage extraction: deterministic mechanical reduction first, then semantic distillation if you need human-readable summaries
Keep session-scoped scratch files under .erk/scratch/sessions/<session-id> for reproducibility

Example use cases

Locate a session by recent edit time and extract its plan for a PR or issue
Analyze context token breakdowns to diagnose context blowouts and trim redundant reads
Preprocess a large JSONL to XML and render it as chunked GitHub comments for long-session sharing
List agent subprocess logs to find and inspect errors or stack traces from helper agents
Create a GitHub issue automatically from an extracted session plan to track follow-up work

FAQ

How does session auto-selection work when no session-id is given?

auto_select_sessions uses trunk status and session sizes: on trunk it uses only the current session; off trunk, if the current session is trivial (<1KB) and substantial sessions exist, it selects those instead; otherwise it returns the current session.

What optimizations are applied during preprocessing?

Preprocessing filters empty and warmup sessions, deduplicates documentation with hash markers, truncates tool parameters longer than 200 characters, prunes tool results to their first 30 lines while preserving errors, and filters log discovery operations, which keeps output readable and compact.