
This skill enables or disables automatic Minecraft training data capture, recording partial observation grids and optional ground truth for model training.

npx playbooks add skill bdambrosio/cognitive_workbench --skill mc-training-capture


---
name: mc-training-capture
type: python
description: "Enable or disable automatic training data capture (partial observation grid + optional ground truth)."
schema_hint:
  enable: "bool - Enable (true) or disable (false) capture (default: true)"
  sample_interval: "int - Capture every N position changes (default: 5)"
  include_ground_truth: "bool - Fetch ground truth from bridge (default: false, slower)"
  minecraft_url: "str - Bridge URL for ground truth (optional)"
examples:
  - '{"type":"mc-training-capture","enable":true,"sample_interval":5}'
  - '{"type":"mc-training-capture","enable":true,"include_ground_truth":true}'
  - '{"type":"mc-training-capture","enable":false}'
---

# Minecraft Training Data Capture Tool

Enable or disable automatic capture of training samples for perception model training.

## Purpose

Captures paired partial-observation inputs and ground truth derived from Minecraft world state:
- **Partial Observation Grid**: Current `local_grid` state (radius 10, 21×21×21 voxels)
- **Ground Truth Grid**: Full observable grid (radius 12, 25×25×25 voxels) - optional, requires bridge endpoint
- **Agent Context**: Pose, movement state, inventory summary
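
The grid sizes above follow from the radii: a cube that extends `r` voxels in every direction from a centre voxel has side `2r + 1`. The sketch below checks that arithmetic. It assumes the grids are centred on the agent's voxel, which the dimensions suggest but this document does not state; the function names are hypothetical.

```python
from typing import Tuple


def grid_side(radius: int) -> int:
    """Side length of a cube extending `radius` voxels from a centre voxel."""
    if radius < 0:
        raise ValueError("radius must be non-negative")
    return 2 * radius + 1


def grid_shape(radius: int) -> Tuple[int, int, int]:
    """Shape of the cubic voxel grid for a given radius."""
    side = grid_side(radius)
    return (side, side, side)


partial_shape = grid_shape(10)       # partial observation grid: (21, 21, 21)
ground_truth_shape = grid_shape(12)  # ground truth grid: (25, 25, 25)

# Voxel counts per sample, useful for estimating storage cost.
partial_voxels = grid_side(10) ** 3       # 9261
ground_truth_voxels = grid_side(12) ** 3  # 15625
```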

## Input

- `enable`: Boolean - Enable (`true`) or disable (`false`) capture (default: `true`)
- `sample_interval`: Integer - Capture sample every N position changes (default: `5`)
- `include_ground_truth`: Boolean - If `true`, fetch ground truth from bridge `/ground-truth` endpoint (default: `false`, slower)
- `minecraft_url`: String - Bridge URL for ground truth (optional, defaults to `MINECRAFT_URL` env var)
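
As an illustration of how the documented defaults combine, here is a minimal sketch of parsing the input object. `CaptureConfig` and `parse_capture_args` are hypothetical names, not part of the skill, and the check that `sample_interval` is at least 1 is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class CaptureConfig:
    """Hypothetical container mirroring the documented inputs and defaults."""
    enable: bool = True
    sample_interval: int = 5
    include_ground_truth: bool = False
    minecraft_url: Optional[str] = None


def parse_capture_args(args: Mapping[str, Any]) -> CaptureConfig:
    """Apply the documented defaults to a raw argument mapping."""
    interval = int(args.get("sample_interval", 5))
    if interval < 1:
        # Assumption: an interval below 1 is not meaningful.
        raise ValueError("sample_interval must be >= 1")
    return CaptureConfig(
        enable=bool(args.get("enable", True)),
        sample_interval=interval,
        include_ground_truth=bool(args.get("include_ground_truth", False)),
        minecraft_url=args.get("minecraft_url"),
    )


defaults = parse_capture_args({"type": "mc-training-capture"})
disabled = parse_capture_args({"type": "mc-training-capture", "enable": False})
```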

## Output

Returns a result in the uniform return format with these fields:
- `status`: `"success"` or `"failed"`
- `value`: Human-readable status message
- `extra.enabled`: Boolean indicating if capture is enabled
- `extra.sample_interval`: Capture interval (if enabled)
- `extra.include_ground_truth`: Whether ground truth is included (if enabled)
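
The fields above could be assembled roughly as follows. This is a sketch only: it assumes `extra` is a nested dictionary, and both the function name `build_capture_result` and the message wording in `value` are hypothetical.

```python
from typing import Any, Dict


def build_capture_result(
    enabled: bool,
    sample_interval: int = 5,
    include_ground_truth: bool = False,
) -> Dict[str, Any]:
    """Build a result dict with the documented status/value/extra fields."""
    extra: Dict[str, Any] = {"enabled": enabled}
    if enabled:
        # These two fields are only reported when capture is enabled.
        extra["sample_interval"] = sample_interval
        extra["include_ground_truth"] = include_ground_truth
        value = f"Training capture enabled (every {sample_interval} position changes)"
    else:
        value = "Training capture disabled"
    return {"status": "success", "value": value, "extra": extra}


enabled_result = build_capture_result(True, sample_interval=10, include_ground_truth=True)
disabled_result = build_capture_result(False)
```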

## Storage

Training samples are saved to:
- **Disk**: `scenarios/minecraft/training_data/sample_{timestamp_ms}.json.zst` (Zstd compressed)
- **Memory**: Last 10 samples in `executor.world_state["training_samples"]`
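
A minimal sketch of this storage scheme, under a few assumptions: it uses the third-party `zstandard` package when installed, falls back to an uncompressed file with a plain `.json` extension otherwise (the fallback extension is a guess), and uses a bounded `deque` as a stand-in for `executor.world_state["training_samples"]`. `save_sample` is a hypothetical helper, not the skill's actual code.

```python
import json
import time
from collections import deque
from pathlib import Path
from typing import Optional

try:
    import zstandard  # optional third-party dependency
except ImportError:
    zstandard = None

RECENT_LIMIT = 10  # number of samples kept in memory


def save_sample(
    sample: dict,
    out_dir: Path,
    recent: deque,
    timestamp_ms: Optional[int] = None,
) -> Path:
    """Write one sample to disk and append it to the in-memory buffer."""
    out_dir.mkdir(parents=True, exist_ok=True)  # create directory on demand
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    payload = json.dumps(sample).encode("utf-8")
    if zstandard is not None:
        path = out_dir / f"sample_{timestamp_ms}.json.zst"
        path.write_bytes(zstandard.ZstdCompressor().compress(payload))
    else:
        # Assumed fallback name when Zstd is unavailable.
        path = out_dir / f"sample_{timestamp_ms}.json"
        path.write_bytes(payload)
    recent.append(sample)  # deque(maxlen=...) silently drops the oldest entry
    return path


recent_samples: deque = deque(maxlen=RECENT_LIMIT)
```

Calling `save_sample(sample, Path("scenarios/minecraft/training_data"), recent_samples)` would then write one file per call while `recent_samples` never grows beyond 10 entries.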

## Usage

**Enable capture (partial grid only, faster):**
```json
{"type": "mc-training-capture", "enable": true, "sample_interval": 5}
```

**Enable capture with ground truth (slower, requires bridge endpoint):**
```json
{"type": "mc-training-capture", "enable": true, "include_ground_truth": true, "sample_interval": 10}
```

**Disable capture:**
```json
{"type": "mc-training-capture", "enable": false}
```

## Notes

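- Capture is **disabled by default** and stays off until this tool is called with `enable` set to `true` (or omitted, since the `enable` parameter itself defaults to `true`)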
- Samples are captured automatically on position changes (when agent moves)
- Ground truth capture requires bridge `/ground-truth` endpoint (Phase 2)
- Samples are compressed with Zstd if available, otherwise stored as JSON
- Directory is created automatically if it doesn't exist

Overview

This skill enables or disables automatic capture of training samples from a Minecraft agent for perception model training. It records a partial-observation voxel grid plus optional ground-truth grids and agent context, saving samples to disk and keeping recent ones in memory. Use it to collect reproducible training data while the agent moves in the world.

How this skill works

When enabled, the skill listens for agent position changes and captures a sample every N moves (sample_interval). Each sample includes the local partial-observation grid (21×21×21 voxels, radius 10), agent pose, movement state, and inventory summary. Optionally it requests a full ground-truth grid (25×25×25, radius 12) from a bridge endpoint, which is slower and requires a configured Minecraft bridge URL. Samples are compressed to disk and the last 10 are kept in memory.
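
The "every N position changes" rule can be sketched as a small counter. This is an illustrative guess at the logic, not the skill's implementation: `PositionSampler` is a hypothetical name, positions are assumed to be integer block coordinates, and the first observed position is treated as a baseline rather than a change.

```python
from typing import Optional, Tuple

Position = Tuple[int, int, int]


class PositionSampler:
    """Signal a capture once every `sample_interval` position changes."""

    def __init__(self, sample_interval: int = 5) -> None:
        if sample_interval < 1:
            raise ValueError("sample_interval must be >= 1")
        self.sample_interval = sample_interval
        self._last_position: Optional[Position] = None
        self._changes = 0

    def should_capture(self, position: Position) -> bool:
        """Return True when this position update should trigger a sample."""
        if position == self._last_position:
            return False  # agent did not move
        is_first = self._last_position is None
        self._last_position = position
        if is_first:
            return False  # baseline observation, not counted as a change
        self._changes += 1
        if self._changes >= self.sample_interval:
            self._changes = 0
            return True
        return False


sampler = PositionSampler(sample_interval=5)
path = [(x, 64, 0) for x in range(11)]  # 11 positions, i.e. 10 changes
captured = [p for p in path if sampler.should_capture(p)]
```

With an interval of 5, the ten position changes in `path` trigger captures at the fifth and tenth change.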

When to use it

  • Gather training data while an agent explores the world.
  • Collect partial-observation-only data for fast sampling.
  • Include ground truth when precise labels are required and bridge access is available.
  • Temporarily disable capture to stop I/O during debugging or performance testing.
  • Keep a small buffer of recent samples in memory for immediate inspection or replay.

Best practices

  • Enable capture explicitly; it is disabled by default to avoid unintended data collection.
  • Start with the default sample_interval of 5; raise it for smaller, sparser datasets or lower it for denser sampling.
  • Turn on include_ground_truth only when the bridge /ground-truth endpoint is reachable and extra latency is acceptable.
  • Ensure MINECRAFT_URL or minecraft_url is set correctly for ground-truth fetching.
  • Monitor disk usage and archive or prune older samples; files are Zstd-compressed when available.

Example use cases

  • Enable fast partial-observation capture during large-scale exploration runs to build perception datasets.
  • Enable ground-truth capture for supervised training and offline evaluation when bridge access is available.
  • Disable capture before running performance-sensitive experiments to eliminate background I/O.
  • Set a larger sample_interval for sparser datasets or a smaller one for dense sampling.
  • Inspect the last 10 samples in memory for quick validation without reading disk files.

FAQ

What happens if the bridge URL is not set but include_ground_truth is true?

Ground-truth requests will fail; set minecraft_url or MINECRAFT_URL to a reachable bridge before enabling ground-truth capture.
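
One way to fail early instead of at request time is to resolve the bridge URL before enabling ground-truth capture. The sketch below assumes the explicit `minecraft_url` argument takes precedence over the `MINECRAFT_URL` environment variable; the helper names and the example URL are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_bridge_url(
    minecraft_url: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the bridge base URL, preferring the explicit argument."""
    env = os.environ if env is None else env
    url = minecraft_url or env.get("MINECRAFT_URL")
    return url.rstrip("/") if url else None


def ground_truth_endpoint(
    minecraft_url: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Build the /ground-truth URL, or raise if no bridge is configured."""
    base = resolve_bridge_url(minecraft_url, env)
    if base is None:
        raise ValueError(
            "include_ground_truth requires minecraft_url or MINECRAFT_URL"
        )
    return f"{base}/ground-truth"


# Hypothetical bridge address, for illustration only.
endpoint = ground_truth_endpoint("http://localhost:3000/", env={})
```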

Where are samples stored and how are they compressed?

Samples are saved to scenarios/minecraft/training_data as sample_{timestamp_ms}.json.zst when Zstd is available; otherwise plain JSON is written. The last 10 samples are also kept in executor.world_state["training_samples"].
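
For reading samples back during dataset preparation, a loader along these lines may help. It is a sketch under assumptions: compressed files end in `.zst` and need the third-party `zstandard` package, anything else is treated as plain JSON, and `load_sample` is a hypothetical helper rather than something the skill provides.

```python
import json
from pathlib import Path

try:
    import zstandard  # optional third-party dependency
except ImportError:
    zstandard = None


def load_sample(path: Path) -> dict:
    """Load one training sample from a .json.zst or plain JSON file."""
    data = path.read_bytes()
    if path.suffix == ".zst":
        if zstandard is None:
            raise RuntimeError("reading .zst samples requires the zstandard package")
        # decompressobj() does not need the content size stored in the frame.
        data = zstandard.ZstdDecompressor().decompressobj().decompress(data)
    return json.loads(data.decode("utf-8"))
```

Iterating over `sorted(Path("scenarios/minecraft/training_data").glob("sample_*"))` and calling `load_sample` on each entry would then yield the samples ordered by file name.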