lang-rust-benchmarking-eng skill

This skill runs and analyzes cargo xtask bench benchmarks, generates Markdown reports, and compares results against the serde_json baseline.

npx playbooks add skill arustydev/ai --skill lang-rust-benchmarking-eng

---
name: benchmarking
description: Run and manage performance benchmarks with cargo xtask bench for facet-json, analyzing results with Markdown reports and comparing against serde_json baseline
---

# Benchmarking with cargo xtask bench

The facet project's benchmarking system runs each benchmark against several targets and generates Markdown reports comparing their performance.

## Quick Reference - Running Specific Benchmarks

```bash
# Run specific benchmark by name
cargo bench --bench unified_benchmarks_divan -- flatten_2enums

# Run with Tier-2 diagnostics
FACET_TIER2_DIAG=1 cargo bench --bench unified_benchmarks_divan -- flatten_2enums 2>&1 | grep TIER_DIAG

# Check tier2 statistics (attempts/successes/fallbacks)
cargo bench --bench unified_benchmarks_divan -- flatten_2enums 2>&1 | grep TIER_STATS

# Run all benchmarks matching a pattern
cargo bench --bench unified_benchmarks_divan -- flatten

# Run Tier-2 JIT benchmarks only
cargo bench --bench unified_benchmarks_divan -- "tier2"

# List available benchmarks
cargo bench --bench unified_benchmarks_divan -- --list | grep -v "    " | head -20
```

**⚠️  IMPORTANT:** Benchmark `.rs` files are GENERATED from `facet-json/benches/benchmarks.kdl`.
**DO NOT** edit `unified_benchmarks_*.rs` directly - edit `benchmarks.kdl` instead.

## Quick Usage

```bash
# Run all benchmarks, generate the HTML + Markdown reports, and serve them locally
cargo xtask bench --index --serve

# Run benchmarks without the full perf index (faster)
cargo xtask bench

# Re-analyze existing benchmark data without re-running
cargo xtask bench --no-run

# Run only specific benchmarks (filter passed to cargo bench)
cargo xtask bench --index booleans

# Just generate reports from latest data
cargo xtask bench --no-run --index --serve
```

## How It Works

The benchmarking system has three main components:

### 1. Benchmark Definition (`benchmarks.kdl`)

Benchmarks are defined in `facet-json/benches/benchmarks.kdl` using KDL syntax:

```kdl
benchmark name="simple_struct" type="SimpleRecord" category="micro" {
    json "{\"id\": 42, \"name\": \"test\", \"active\": true}"
}

benchmark name="booleans" type="Vec<bool>" category="synthetic" {
    generated "booleans"
}

type_def name="SimpleRecord" {
    code """
#[derive(Debug, PartialEq, Facet, serde::Serialize, serde::Deserialize, Clone)]
struct SimpleRecord {
    id: u64,
    name: String,
    active: bool,
}
"""
}
```

**Categories**: `micro`, `synthetic`, `realistic`, `other`
**Data sources**: `json` (inline), `json_file`, `json_brotli`, `generated`

### 2. Benchmark Generation (`cargo xtask gen-benchmarks`)

Run this after editing `benchmarks.kdl`:

```bash
cargo xtask gen-benchmarks
```

This generates three files in `facet-json/`:
- `benches/unified_benchmarks_divan.rs` - Wall-clock timing benchmarks
- `benches/unified_benchmarks_gungraun.rs` - Instruction count benchmarks
- `tests/generated_benchmark_tests.rs` - Test versions for valgrind debugging

**Every benchmark gets all 4 targets automatically:**
- `serde_json` - Baseline (serde_json crate)
- `facet_format_json` - facet-format-json without JIT (reflection only)
- `facet_format_jit_t1` - Tier-1 JIT (shape-based, ParseEvent stream)
- `facet_format_jit_t2` - Tier-2 JIT (format-specific, direct byte parsing)

### 3. Benchmark Execution and Analysis

`cargo xtask bench` performs these steps:
1. Runs `unified_benchmarks_divan` (wall-clock times via divan)
2. Runs `unified_benchmarks_gungraun` (instruction counts via gungraun + valgrind)
3. Parses output and combines results
4. Generates multiple report formats:
   - `bench-reports/run.json` - Full structured data (schema: run-v1)
   - `bench-reports/perf/RESULTS.md` - **Markdown report for LLMs and humans**
   - `bench-reports/perf-data.json` - Legacy format for perf tracking

## The Markdown Report (`perf/RESULTS.md`)

Located at `bench-reports/perf/RESULTS.md`, this is the **authoritative source** for performance analysis:

**Structure:**
- **Targets table** - Definitions of all benchmark targets
- **Benchmark sections** - Grouped by category (Micro, Synthetic, Realistic)
- **Per-benchmark tables** - Deserialize and Serialize results
  - Columns: Target, Time (median), Instructions, vs serde_json ratio (based on instruction counts when available)
  - Ratios: `**0.84×** ✓` (wins), `1.03×` (close), `3.12× ⚠` (needs work)
- **Summary** - Auto-categorized by performance:
  - Wins: ≤1.0× vs serde_json
  - Close: ≤1.5× vs serde_json
  - Needs Work: >1.5× vs serde_json

**Example:**
```markdown
### booleans

**Deserialize:**

| Target | Time (median) | Instructions | vs serde_json |
|--------|---------------|--------------|---------------|
| serde_json | 56.21µs | 1,157,922 | 1.00× |
| format+jit2 | 53.46µs | 972,221 | **0.84×** ✓ |
| format+jit1 | 809.30µs | 7,031,459 | 6.07× ⚠ |
| format | 2.94ms | 23,169,951 | 20.01× ⚠ |
```
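
The ratios in the example follow from the instruction counts (972,221 / 1,157,922 ≈ 0.84), and the Summary thresholds can be applied to them directly. The sketch below illustrates that arithmetic; it is not the analyzer's actual code. The names `Bucket`, `ratio_vs_baseline`, `classify`, and `format_ratio` are hypothetical, and it assumes the ratio is always taken over instruction counts.

```rust
/// Summary bucket, using the thresholds listed above.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Bucket {
    Wins,      // ratio <= 1.0
    Close,     // ratio <= 1.5
    NeedsWork, // ratio > 1.5
}

/// Ratio of a target's instruction count to the serde_json baseline.
/// Assumption: the baseline count is non-zero.
fn ratio_vs_baseline(target_instructions: u64, baseline_instructions: u64) -> f64 {
    target_instructions as f64 / baseline_instructions as f64
}

/// Maps a ratio onto the Wins / Close / Needs Work buckets.
fn classify(ratio: f64) -> Bucket {
    if ratio <= 1.0 {
        Bucket::Wins
    } else if ratio <= 1.5 {
        Bucket::Close
    } else {
        Bucket::NeedsWork
    }
}

/// Formats a ratio with two decimals, e.g. "0.84×" (without the ✓ / ⚠ markers).
fn format_ratio(ratio: f64) -> String {
    format!("{ratio:.2}×")
}
```

For the `format+jit2` row above, `classify(ratio_vs_baseline(972_221, 1_157_922))` yields `Bucket::Wins`, matching the `✓` marker in the table.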

## Adding New Benchmarks

1. **Edit `facet-json/benches/benchmarks.kdl`**
   ```kdl
   benchmark name="my_bench" type="MyType" category="synthetic" {
       generated "my_generator"
   }

   type_def name="MyType" {
       code """
   #[derive(Debug, Facet, serde::Serialize, serde::Deserialize, Clone)]
   struct MyType {
       field: String,
   }
   """
   }
   ```

2. **If using `generated`, add generator to `tools/benchmark-generator/src/main.rs`**
   - Edit `generate_json_data()` function
   - Add case for your generator name

3. **Regenerate benchmarks**
   ```bash
   cargo xtask gen-benchmarks
   ```

4. **Run benchmarks**
   ```bash
   cargo xtask bench --index --serve
   ```

## Important Flags

### `--no-run`
Skips running benchmarks and reuses the most recent data. Useful for:
- Regenerating reports after fixing parser bugs
- Testing report generation changes
- Quick iterations on report formatting

### `--index`
Generates the full perf.facet.rs index:
- Clones the `facet-rs/perf.facet.rs` repo (gh-pages branch)
- Copies benchmark reports to `bench-reports/perf/`
- Generates index.html and supporting files
- **Required for viewing the interactive SPA**

### `--serve`
Starts a local server at `http://localhost:1999` to view reports.
Requires `--index`.

### `--push`
Pushes generated reports to the perf.facet.rs repo.
**Use with caution** - only for publishing official results.

## Debugging Benchmarks with Valgrind

The generated tests in `tests/generated_benchmark_tests.rs` mirror the benchmarks and can be run under valgrind:

```bash
# Run specific benchmark as test under valgrind
cargo nextest run --profile valgrind -p facet-json generated_benchmark_tests::test_booleans --features jit

# Or use the generated test filters
cargo nextest run --profile valgrind -p facet-json test_simple_struct --features jit
```

This is essential for debugging crashes or memory issues in benchmarks.

## Files and Directories

```
bench-reports/
├── divan-{timestamp}.txt          # Raw divan output
├── gungraun-{timestamp}.txt       # Raw gungraun output
├── run.json                       # Structured results (run-v1 schema)
├── perf-data.json                 # Legacy perf tracking format
└── perf/
    ├── RESULTS.md                 # **MAIN REPORT - READ THIS**
    ├── index.html                 # SPA (generated with --index)
    ├── app.js                     # SPA logic (copied from scripts/)
    └── shared-styles.css          # SPA styles (copied from scripts/)

facet-json/benches/
├── benchmarks.kdl                 # **EDIT THIS to add benchmarks**
├── unified_benchmarks_divan.rs    # Generated (divan)
└── unified_benchmarks_gungraun.rs # Generated (gungraun)

facet-json/tests/
└── generated_benchmark_tests.rs   # Generated (for valgrind)

tools/
├── benchmark-generator/           # KDL → Rust codegen
└── benchmark-analyzer/            # Output parsing + report generation
```

## Don't Edit Generated Files

❌ **NEVER edit these files** (they're regenerated):
- `unified_benchmarks_divan.rs`
- `unified_benchmarks_gungraun.rs`
- `generated_benchmark_tests.rs`
- `bench-reports/perf/index.html`, `app.js`, `shared-styles.css`

✅ **Edit these instead**:
- `facet-json/benches/benchmarks.kdl` - Benchmark definitions
- `tools/benchmark-generator/src/main.rs` - Generator logic (for `generated` benchmarks)
- `scripts/app.js`, `scripts/shared-styles.css` - SPA source (not the copies in perf/)

## Common Workflows

### Quick local benchmark run
```bash
cargo xtask bench
# Check bench-reports/perf/RESULTS.md
```

### Full interactive report
```bash
cargo xtask bench --index --serve
# View the report at http://localhost:1999
```

### After editing benchmarks.kdl
```bash
cargo xtask gen-benchmarks
cargo xtask bench
```

### Re-analyze existing data
```bash
cargo xtask bench --no-run --index
```

### Run a specific benchmark
```bash
cargo xtask bench integers
# Only runs benchmarks matching "integers"
```

## Performance Analysis Tips

1. **Focus on the Markdown report first** (`perf/RESULTS.md`)
   - Easy to grep, parse, and read
   - Shows all critical metrics in one place
   - Auto-categorized by performance tier

2. **Use instruction counts, not just time**
   - More stable than wall-clock time
   - Unaffected by machine load and clock speed (counts can still differ between instruction sets and compiler versions)
   - Appears in "vs serde_json" column when available

3. **Look for patterns in the Summary section**
   - "Needs Work" items are optimization targets
   - "Wins" validate current approach
   - "Close" items are low-hanging fruit

4. **Compare Tier-1 vs Tier-2 JIT**
   - Large gaps = Tier-2 not implemented or buggy
   - Similar performance = Tier-2 working but not optimized
   - Tier-2 wins = format-specific optimizations paying off
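
Because the report is plain Markdown, its tables can be scanned with a few lines of code. The following is a rough sketch, assuming rows look exactly like those in the example table earlier in this document (four cells, comma-grouped instruction counts, a ratio ending in `×`). `Row`, `parse_row`, and `needs_work` are hypothetical names, not part of the facet tooling, and rows without an instruction count are skipped.

```rust
/// One data row from a per-benchmark table in RESULTS.md.
#[derive(Debug, PartialEq)]
struct Row {
    target: String,
    time: String,
    instructions: u64,
    ratio: f64,
}

/// Parses a row such as `| format+jit2 | 53.46µs | 972,221 | **0.84×** ✓ |`.
/// Returns None for header rows, separator rows, and any line that does not
/// have four cells with a numeric instruction count and ratio.
fn parse_row(line: &str) -> Option<Row> {
    let cells: Vec<&str> = line
        .trim()
        .trim_matches('|')
        .split('|')
        .map(str::trim)
        .collect();
    if cells.len() != 4 {
        return None;
    }
    // "1,157,922" -> 1157922
    let instructions: u64 = cells[2].replace(',', "").parse().ok()?;
    // "**0.84×** ✓" -> 0.84: drop leading bold markers, keep the text before '×'.
    let ratio: f64 = cells[3]
        .trim_start_matches('*')
        .split('×')
        .next()?
        .trim()
        .parse()
        .ok()?;
    Some(Row {
        target: cells[0].to_string(),
        time: cells[1].to_string(),
        instructions,
        ratio,
    })
}

/// Collects every row whose ratio falls in the "Needs Work" range (> 1.5×).
fn needs_work(report: &str) -> Vec<Row> {
    report
        .lines()
        .filter_map(parse_row)
        .filter(|row| row.ratio > 1.5)
        .collect()
}
```

Passing the contents of `bench-reports/perf/RESULTS.md` to `needs_work` would list candidate optimization targets across all benchmarks, although the result does not record which benchmark section each row came from.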

## Troubleshooting

### Benchmarks fail to compile
```bash
# Regenerate from KDL
cargo xtask gen-benchmarks
```

### Parser errors in output
- Check `bench-reports/divan-*.txt` or `gungraun-*.txt` for malformed output
- Fix the benchmark code, not the parser (usually)

### Missing benchmarks in report
- Ensure benchmark has `category` in `benchmarks.kdl`
- Check that `cargo xtask gen-benchmarks` ran successfully
- Verify benchmark functions are generated (check `unified_benchmarks_*.rs`)
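
The first check can be automated with a simple text scan. This is a minimal sketch, not a real KDL parser: it assumes each `benchmark` node keeps its `name` and `category` properties on one line, as in the examples in this document. The helper names `attr` and `benchmarks_missing_category` are hypothetical.

```rust
/// Extracts the value of `key="..."` from a single line, if present.
fn attr<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let pattern = format!("{key}=\"");
    let start = line.find(&pattern)? + pattern.len();
    let end = start + line[start..].find('"')?;
    Some(&line[start..end])
}

/// Returns the names of `benchmark` nodes that have no `category` property.
/// Assumption: the node name and its properties sit on the same line.
fn benchmarks_missing_category(kdl: &str) -> Vec<String> {
    kdl.lines()
        .map(str::trim_start)
        .filter(|line| line.starts_with("benchmark "))
        .filter(|line| attr(line, "category").is_none())
        .filter_map(|line| attr(line, "name").map(str::to_owned))
        .collect()
}
```

Running `benchmarks_missing_category` over the text of `facet-json/benches/benchmarks.kdl` returns an empty list when every benchmark declares a category.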

### `--index` fails
- Ensure `gh` CLI is installed and authenticated
- Check network connection (clones from GitHub)
- Try `--index` without `--push` first

## See Also

- divan docs: https://docs.rs/divan/
- gungraun: Custom fork with valgrind integration
- Nextest valgrind profile: `.config/nextest.toml`
- Benchmark generator: `tools/benchmark-generator/`
- Report analyzer: `tools/benchmark-analyzer/`

Overview

This skill runs and manages performance benchmarks for facet-json using cargo xtask bench and related tools. It generates detailed Markdown reports that compare wall-clock times and instruction counts against a serde_json baseline. It also supports generating, serving, and publishing an interactive perf index for deeper analysis.

How this skill works

The workflow defines benchmarks in facet-json/benches/benchmarks.kdl, then generates Rust benchmark files via cargo xtask gen-benchmarks. cargo xtask bench runs divan (wall-clock) and gungraun (instruction counts), parses outputs, and emits structured run.json plus a human- and LLM-friendly perf/RESULTS.md. Optional flags control re-analysis, index generation, local serving, and publishing.

When to use it

  • Run routine performance regression checks after code changes
  • Add new benchmark cases to exercise specific formats or data shapes
  • Compare facet-json performance vs serde_json baseline
  • Generate reports for PR review or perf triage
  • Publish or preview the interactive perf index for stakeholders

Best practices

  • Edit benchmarks only in facet-json/benches/benchmarks.kdl; never modify generated Rust files
  • Regenerate benchmarks after KDL changes with cargo xtask gen-benchmarks
  • Prefer instruction counts for stable comparisons that are unaffected by machine load
  • Use --no-run to iterate on report generation without re-running heavy benchmarks
  • Run --index and --serve locally to inspect the interactive SPA before publishing

Example use cases

  • Add a synthetic generator, update tools/benchmark-generator, run gen-benchmarks, then cargo xtask bench to measure impact
  • Quick local run: cargo xtask bench and inspect bench-reports/perf/RESULTS.md for summary and per-benchmark tables
  • Re-analyze existing data: cargo xtask bench --no-run --index to regenerate reports and the perf index without executing benchmarks
  • Debug a benchmark crash under valgrind using generated tests: cargo nextest run --profile valgrind -p facet-json generated_benchmark_tests::test_name --features jit
  • Publish official results: cargo xtask bench --index --push after verifying reports locally

FAQ

How do I add a new benchmark?

Add a benchmark and any new type definitions to facet-json/benches/benchmarks.kdl, add a generator case if needed in tools/benchmark-generator, run cargo xtask gen-benchmarks, then cargo xtask bench.

Why aren’t my changes reflected in unified_benchmarks_*.rs?

Those files are generated. After editing benchmarks.kdl you must run cargo xtask gen-benchmarks to regenerate the Rust benchmark sources.