
headless-adapters skill


This skill wraps headless CLI agents with JSON schemas that describe how to invoke the CLI and parse its output, so adapters can be created and compliance-tested without custom integration code.

npx playbooks add skill plaited/agent-eval-harness --skill headless-adapters


---
name: headless-adapters
description: Discover, create, and validate headless adapters for agent integration. Includes scaffolding tools and schema-driven compliance testing.
compatibility: Bun >= 1.2.9
---

# Headless Adapters

## Purpose

Schema-driven adapter for headless CLI agents. **No code required** - just define a JSON schema describing how to interact with the CLI.

| Use Case | Tool |
|----------|------|
| Wrap headless CLI agent | `headless` command |
| Create new schemas | [Schema Creation Guide](references/schema-creation-guide.md) |

## Quick Start

1. **Check if a schema exists** in [schemas/](schemas/)
2. **Run the adapter:**
   ```bash
   ANTHROPIC_API_KEY=... bunx @plaited/agent-eval-harness headless --schema .claude/skills/headless-adapters/schemas/claude-headless.json
   ```

## CLI Commands

### headless

Schema-driven adapter for ANY headless CLI agent.

```bash
bunx @plaited/agent-eval-harness headless --schema <path>
```

**Options:**

| Flag | Description | Required |
|------|-------------|----------|
| `-s, --schema` | Path to adapter schema (JSON) | Yes |

**Schema Format:**

```json
{
  "version": 1,
  "name": "my-agent",
  "command": ["my-agent-cli"],
  "sessionMode": "stream",
  "prompt": { "flag": "-p" },
  "output": { "flag": "--output-format", "value": "stream-json" },
  "autoApprove": ["--allow-all"],
  "outputEvents": [
    {
      "match": { "path": "$.type", "value": "message" },
      "emitAs": "message",
      "extract": { "content": "$.text" }
    }
  ],
  "result": {
    "matchPath": "$.type",
    "matchValue": "result",
    "contentPath": "$.content"
  }
}
```
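
For editor support while writing a schema, the example above can be mirrored as a TypeScript type. This is a minimal sketch covering only the fields shown in the example; `AdapterSchema`, `OutputEvent`, and `checkSchemaShape` are hypothetical names, treating `autoApprove` as optional is an assumption, and the harness's real schema may accept more fields or enforce different rules.

```typescript
// Hypothetical types mirroring only the fields shown in the example schema.
type OutputEvent = {
  match: { path: string; value: string };
  emitAs: string;
  extract: Record<string, string>;
};

type AdapterSchema = {
  version: number;
  name: string;
  command: string[];
  sessionMode: "stream" | "iterative";
  prompt: { flag: string };
  output: { flag: string; value: string };
  autoApprove?: string[]; // assumed optional; not confirmed by the source
  outputEvents: OutputEvent[];
  result: { matchPath: string; matchValue: string; contentPath: string };
};

// Returns human-readable problems; an empty list means the basic shape looks right.
function checkSchemaShape(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) return ["schema must be a JSON object"];
  const s = raw as Record<string, unknown>;
  const problems: string[] = [];
  if (typeof s.name !== "string" || s.name.length === 0) {
    problems.push("name must be a non-empty string");
  }
  if (!Array.isArray(s.command) || s.command.length === 0) {
    problems.push("command must be a non-empty array");
  }
  if (s.sessionMode !== "stream" && s.sessionMode !== "iterative") {
    problems.push('sessionMode must be "stream" or "iterative"');
  }
  if (!Array.isArray(s.outputEvents)) problems.push("outputEvents must be an array");
  if (typeof s.result !== "object" || s.result === null) {
    problems.push("result must be an object");
  }
  return problems;
}

// The example schema from this section, checked against the type at compile time.
const exampleSchema: AdapterSchema = {
  version: 1,
  name: "my-agent",
  command: ["my-agent-cli"],
  sessionMode: "stream",
  prompt: { flag: "-p" },
  output: { flag: "--output-format", value: "stream-json" },
  autoApprove: ["--allow-all"],
  outputEvents: [
    { match: { path: "$.type", value: "message" }, emitAs: "message", extract: { content: "$.text" } },
  ],
  result: { matchPath: "$.type", matchValue: "result", contentPath: "$.content" },
};

console.log(checkSchemaShape(exampleSchema));
```

A check like this only catches structural slips before a run; it does not replace testing the schema with the `headless` command.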

**Session Modes:**

| Mode | Description | Use When |
|------|-------------|----------|
| `stream` | Keep process alive, multi-turn via stdin | CLI supports session resume |
| `iterative` | New process per turn, accumulate history | CLI is stateless |
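
The difference between the two modes can be pictured as a plan of process actions. The sketch below is an illustration only: `planTurns` and the `Action` type are hypothetical, the way the harness actually assembles arguments or formats history is not documented here, and a real iterative history would also carry the agent's replies.

```typescript
type SessionMode = "stream" | "iterative";

type Action =
  | { kind: "spawn"; argv: string[] }
  | { kind: "stdin"; text: string };

// Lists what a runner would do for each user turn under the given mode.
function planTurns(
  mode: SessionMode,
  command: string[],
  promptFlag: string,
  turns: string[],
): Action[] {
  const actions: Action[] = [];
  if (mode === "stream") {
    // One long-lived process; every turn is written to its stdin.
    actions.push({ kind: "spawn", argv: [...command] });
    for (const turn of turns) actions.push({ kind: "stdin", text: turn });
    return actions;
  }
  // Iterative: a fresh process per turn, with earlier turns replayed in the prompt.
  const history: string[] = [];
  for (const turn of turns) {
    history.push(turn);
    actions.push({ kind: "spawn", argv: [...command, promptFlag, history.join("\n")] });
  }
  return actions;
}

console.log(planTurns("stream", ["my-agent-cli"], "-p", ["hello", "and then?"]));
console.log(planTurns("iterative", ["my-agent-cli"], "-p", ["hello", "and then?"]));
```

Under these assumptions, stream mode spawns once and writes twice, while iterative mode spawns twice and the second prompt carries both turns.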

## Pre-built Schemas

Tested schemas are available in [schemas/](schemas/):

| Schema | Agent | Mode | Auth Env Var | Status |
|--------|-------|------|--------------|--------|
| `claude-headless.json` | Claude Code | stream | `ANTHROPIC_API_KEY` | Tested |
| `gemini-headless.json` | Gemini CLI | iterative | `GEMINI_API_KEY` | Tested |

**Usage:**
```bash
# Claude Code
ANTHROPIC_API_KEY=... bunx @plaited/agent-eval-harness headless --schema .claude/skills/headless-adapters/schemas/claude-headless.json

# Gemini CLI
GEMINI_API_KEY=... bunx @plaited/agent-eval-harness headless --schema .claude/skills/headless-adapters/schemas/gemini-headless.json
```

## Creating a Schema

1. Explore the CLI's `--help` to identify prompt, output, and auto-approve flags
2. Capture sample JSON output from the CLI
3. Map JSONPath patterns to output events
4. Create schema file based on an existing template
5. Test with `headless` command

See [Schema Creation Guide](references/schema-creation-guide.md) for the complete workflow.
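
For steps 2 and 3, it can help to tally which event types appear in the captured output before writing any match rules. The sketch below assumes the CLI emits newline-delimited JSON with a top-level `type` field, as in the example schema's `$.type` path; `countEventTypes` and the sample lines are hypothetical.

```typescript
// Counts captured output lines by their top-level "type" field.
function countEventTypes(rawOutput: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of rawOutput.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.length === 0) continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      // Banners and warnings are not JSON; tally them separately.
      counts["<non-json>"] = (counts["<non-json>"] ?? 0) + 1;
      continue;
    }
    const key =
      typeof parsed === "object" && parsed !== null && "type" in parsed
        ? String((parsed as Record<string, unknown>).type)
        : "<no type>";
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

// Hypothetical captured output, not taken from any real CLI.
const sampleOutput = [
  '{"type":"message","text":"Hello"}',
  "warning: something on stderr leaked into the capture",
  '{"type":"result","content":"done"}',
  '{"type":"message","text":"again"}',
].join("\n");

console.log(countEventTypes(sampleOutput));
```

Each distinct type in the tally is a candidate for an `outputEvents` entry or for the `result` matcher.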

## Troubleshooting

### Common Issues

| Issue | Likely Cause | Solution |
|-------|--------------|----------|
| Tool calls not captured | JSONPath not iterating arrays | Use `[*]` wildcard syntax - [see guide](references/troubleshooting-guide.md#tool-calls-not-appearing) |
| "unexpected argument" error | Stdin mode misconfigured | Use `stdin: true` - [see guide](references/troubleshooting-guide.md#stdin-mode-issues) |
| 401 Authentication errors | API key not properly configured | Set the correct API key environment variable (see Pre-built Schemas table) |
| Timeout on prompt | JSONPath not matching | Capture raw CLI output, verify paths - [see guide](references/troubleshooting-guide.md#jsonpath-debugging) |
| Empty responses | Content extraction failing | Check extract paths - [see guide](references/troubleshooting-guide.md#output-event-matching) |

**Complete troubleshooting documentation:** [Troubleshooting Guide](references/troubleshooting-guide.md)
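
The first row of the table is easiest to see with a concrete event. The sketch below implements a deliberately restricted JSONPath subset (`$`, `.key`, and `[*]` only) to show why a path that stops at an array finds nothing. `resolvePath` and the event shape are hypothetical; the harness's own JSONPath engine may behave differently in other cases.

```typescript
// Resolves a restricted JSONPath subset: "$", ".key", and "[*]" only.
function resolvePath(path: string, root: unknown): unknown[] {
  // "$.message.content[*].name" becomes ["message", "content", "[*]", "name"].
  const tokens: string[] = path.replace(/^\$/, "").match(/\[\*\]|[^.\[\]]+/g) ?? [];
  let current: unknown[] = [root];
  for (const token of tokens) {
    const next: unknown[] = [];
    for (const node of current) {
      if (token === "[*]") {
        // The wildcard fans out over every array element.
        if (Array.isArray(node)) next.push(...node);
      } else if (typeof node === "object" && node !== null && !Array.isArray(node)) {
        const value = (node as Record<string, unknown>)[token];
        if (value !== undefined) next.push(value);
      }
    }
    current = next;
  }
  return current;
}

// Hypothetical event with tool calls nested inside an array.
const toolEvent = {
  type: "assistant",
  message: {
    content: [
      { type: "text", text: "Reading the file" },
      { type: "tool_use", name: "Read" },
      { type: "tool_use", name: "Bash" },
    ],
  },
};

console.log(resolvePath("$.message.content.name", toolEvent)); // no wildcard: nothing found
console.log(resolvePath("$.message.content[*].name", toolEvent)); // wildcard: both tool names
```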

## External Resources

- **AgentSkills Spec**: [agentskills.io](https://agentskills.io)

## Related

- **[agent-eval-harness skill](../agent-eval-harness/SKILL.md)** - Running evaluations against adapters

Overview

This skill helps you discover, create, and validate headless adapters that connect CLI-based agents to an evaluation harness. It provides schema-driven scaffolding so you can wrap any headless CLI agent without writing glue code. Use the included pre-built schemas and validation tools to run repeatable, instrumented agent evaluations.

How this skill works

You define a JSON schema that describes how to invoke the CLI, where to send prompts, how to detect output events, and how to extract results. The headless adapter runs the CLI in either a persistent stream session or as iterative invocations, captures trajectory events using JSONPath rules, and emits standardized messages for evaluation and comparison. Built-in validators exercise the schema against sample CLI output and run pass@k and multi-run comparisons.
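
The matching step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the harness's implementation: `getPath` handles only plain dotted paths such as `$.type`, and `mapLine`, `Rule`, and `Emitted` are hypothetical names modelled on the `outputEvents` entries in the schema format.

```typescript
type Rule = {
  match: { path: string; value: string };
  emitAs: string;
  extract: Record<string, string>;
};

type Emitted = { type: string; fields: Record<string, unknown> };

// Follows dotted paths such as "$.type"; no wildcards or filters.
function getPath(path: string, root: unknown): unknown {
  let node: unknown = root;
  for (const key of path.split(".").slice(1)) {
    if (typeof node !== "object" || node === null) return undefined;
    node = (node as Record<string, unknown>)[key];
  }
  return node;
}

// Applies the first matching rule to one line of CLI output.
function mapLine(line: string, rules: Rule[]): Emitted | undefined {
  const parsed: unknown = JSON.parse(line);
  for (const rule of rules) {
    if (getPath(rule.match.path, parsed) !== rule.match.value) continue;
    const fields: Record<string, unknown> = {};
    for (const fieldName of Object.keys(rule.extract)) {
      const fieldPath = rule.extract[fieldName];
      if (fieldPath === undefined) continue;
      fields[fieldName] = getPath(fieldPath, parsed);
    }
    return { type: rule.emitAs, fields };
  }
  return undefined; // no rule matched this line
}

const messageRules: Rule[] = [
  { match: { path: "$.type", value: "message" }, emitAs: "message", extract: { content: "$.text" } },
];

console.log(mapLine('{"type":"message","text":"Hello"}', messageRules));
console.log(mapLine('{"type":"ping"}', messageRules));
```

A line that matches no rule yields nothing, which is consistent with the "Timeout on prompt" and "Empty responses" rows in the troubleshooting table: if the paths are wrong, the adapter has nothing to emit.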

When to use it

- You need to evaluate or compare CLI-first agents in a standardized harness.
- You want to wrap a headless LLM or tool without writing custom integration code.
- You need trajectory capture, pass@k metrics, or multi-run result comparison.
- You have a CLI that outputs JSON and you want schema-driven event extraction.
- You want to reuse tested schemas for agents like Claude or Gemini.

Best practices

- Start from a tested schema template and adapt flags for prompt, output, and auth.
- Capture raw CLI JSON output first, then map JSONPath patterns to events before validating.
- Use stream mode for session-capable CLIs and iterative mode for stateless tools.
- Include autoApprove flags and stdin settings explicitly to avoid interactive prompts.
- Run schema validation against sample outputs and small multi-run tests before full evaluation.

Example use cases

- Wrap Claude Code or Gemini CLI using provided schemas to run standardized evaluations.
- Create a schema for a new headless agent by mapping prompt flag, output format, and event paths.
- Capture full interaction trajectories for grading and calculating pass@k metrics.
- Compare multiple agent runs across seeds or configs to measure consistency.
- Diagnose integration issues by validating JSONPath extraction and session mode settings.
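
For readers unfamiliar with pass@k, the sketch below shows the commonly used unbiased estimator, pass@k = 1 - C(n - c, k) / C(n, k), where n is the number of runs and c the number that passed. Whether the harness computes the metric this way is not stated here, so treat `passAtK` as a hypothetical reference helper rather than the harness's formula.

```typescript
// pass@k = 1 - C(n - c, k) / C(n, k), computed as a running product
// so that no large factorials are needed.
function passAtK(n: number, c: number, k: number): number {
  if (k < 1 || k > n || c < 0 || c > n) {
    throw new RangeError("require 1 <= k <= n and 0 <= c <= n");
  }
  if (n - c < k) return 1; // every size-k sample must contain a passing run
  let allFail = 1;
  for (let i = n - c + 1; i <= n; i++) {
    allFail *= 1 - k / i;
  }
  return 1 - allFail;
}

// 5 runs with 2 passing: the chance that one randomly drawn run passes.
console.log(passAtK(5, 2, 1));
// 4 runs with 1 passing: the chance that a sample of 2 contains the passing run.
console.log(passAtK(4, 1, 2));
```

With k = 1 the estimator reduces to the plain pass rate c / n, which is a quick sanity check on any reported number.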

FAQ

What sessionMode should I use?

Use stream when the CLI supports a persistent process and multi-turn sessions; use iterative when the CLI is stateless and must be launched per turn.

How do I handle authentication?

Before running the adapter, set the environment variable the agent expects, for example ANTHROPIC_API_KEY for Claude Code or GEMINI_API_KEY for Gemini CLI.
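
A small preflight check can surface a missing key before the CLI returns a 401. The sketch below is an assumption-laden helper, not part of the harness: `missingAuthVar` and `AUTH_VARS` are hypothetical, and the mapping covers only the two pre-built schemas listed in this document.

```typescript
// Auth variables for the two pre-built schemas named in this document.
const AUTH_VARS: Record<string, string> = {
  "claude-headless.json": "ANTHROPIC_API_KEY",
  "gemini-headless.json": "GEMINI_API_KEY",
};

// Returns the name of the missing variable, or undefined when nothing is missing
// (including when the schema file is not one of the known pre-built schemas).
function missingAuthVar(
  schemaPath: string,
  env: Record<string, string | undefined>,
): string | undefined {
  const fileName = schemaPath.split("/").pop() ?? "";
  const required = AUTH_VARS[fileName];
  if (required === undefined) return undefined; // unknown schema: nothing to check
  const value = env[required];
  return value === undefined || value.trim() === "" ? required : undefined;
}

const claudeSchemaPath = ".claude/skills/headless-adapters/schemas/claude-headless.json";
console.log(missingAuthVar(claudeSchemaPath, {})); // names the missing variable
console.log(missingAuthVar(claudeSchemaPath, { ANTHROPIC_API_KEY: "example-key" }));
```

In a Bun or Node script the second argument would typically be `process.env`; it is passed in explicitly here so the function stays easy to test.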