pluck skill

/src/primitives/pluck

This skill extracts a single field from each dict Note in a collection, filtering out notes missing the field.

npx playbooks add skill bdambrosio/cognitive_workbench --skill pluck

Review the files below or copy the command above to add this skill to your agents.

Skill.md
---
name: pluck
type: primitive
description: Extract single field value from each Note in Collection
---

# Pluck

## INPUT CONTRACT

- `target`: Collection (variable or ID)
- `field`: String path (supports dot notation like `metadata.uri`)
- `out`: Variable name

**REQUIREMENTS:**
- Collection MUST contain Notes (not Collections)
- Each Note MUST be dict/JSON object
- Field SHOULD exist as key in each Note (Notes missing it are excluded from the output, not treated as errors)

**NOT SUPPORTED:**
- ❌ Note containing array (use `split` first)
- ❌ Collection of arrays (must be dict Notes)

## OUTPUT

Returns Collection of Notes, each containing extracted scalar value. Notes missing field are excluded.

## FAILURE SEMANTICS

**Empty Collection = expected when:**
- Field missing in all Notes
- Type contract violated (e.g., Notes are arrays rather than dicts)

**Empty ≠ error** — indicates no matches, not failure.

**Actual failures:** Invalid target type or missing parameters.

## REPRESENTATION INVARIANTS

- Note containing JSON array ≠ Collection
- Use `split` to convert array → Collection
- `flatten` performs inverse (Collection → Note)

## CONTENT STRUCTURE

**For JSON Notes, content is a dict with fields:**
- Top-level fields: `text`, `format`, `char_count`
- Nested fields: `metadata.*` (e.g., `metadata.uri`, `metadata.title`, `metadata.year`)

**Example Note content structure (from semantic-scholar/search-web):**
```json
{
  "text": "Full text content...",
  "format": "paper",
  "metadata": {
    "title": "Paper Title",
    "uri": "https://example.com/paper.pdf"
  },
  "char_count": 5000
}
```

## FIELD ACCESS EXAMPLES

**Extract nested field:**
```json
{"type":"pluck","target":"$papers","field":"metadata.title","out":"$titles"}
```

**Extract top-level field:**
```json
{"type":"pluck","target":"$results","field":"text","out":"$texts"}
```

**Extract URI for fetching:**
```json
{"type":"pluck","target":"$search_results","field":"metadata.uri","out":"$urls"}
```

## ANTI-PATTERNS

- ❌ `pluck(target=$array_note)` → Use `split` first
- ❌ `pluck(target=$coll_of_arrays)` → Elements must be dicts
- ❌ Treating empty result as error → Empty = no matches

Overview

This skill extracts a single scalar field value from each Note in a Collection and returns a new Collection of Notes containing those values. It supports dot-notation paths for nested fields (for example, metadata.uri). The result excludes Notes that do not contain the requested field.

How this skill works

Given a target Collection of JSON/dict Notes and a field path string, the skill iterates over the Notes and reads the specified key from each one, following dot notation into nested dicts. If the key exists and its value is a scalar, the skill adds a new Note containing that value to the output Collection, which is bound to the variable named in out. Notes that lack the field or violate the type contract are skipped; an empty output simply indicates no matches.
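The behavior described above can be approximated in plain Python. This is a minimal sketch, not the skill's actual implementation: it assumes a Collection can be modelled as a list and a Note as a dict, and the names `resolve_path`, `pluck`, and `_MISSING` are hypothetical helpers introduced only for illustration.

```python
from typing import Any

# Sentinel (hypothetical): distinguishes "field absent" from a stored None.
_MISSING = object()


def resolve_path(note: Any, path: str) -> Any:
    """Follow a dot-notation path through nested dicts.

    Returns _MISSING if the note is not a dict or any key is absent.
    """
    current = note
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def pluck(notes: list, field: str) -> list:
    """Collect the scalar value at `field` from each note, skipping the rest."""
    values = []
    for note in notes:
        value = resolve_path(note, field)
        # Missing field or non-scalar value: skip the note, do not raise.
        if value is _MISSING or isinstance(value, (dict, list)):
            continue
        values.append(value)
    return values


papers = [
    {"text": "A", "metadata": {"title": "First", "uri": "https://example.com/a.pdf"}},
    {"text": "B", "metadata": {"title": "Second"}},  # no uri
    ["not", "a", "dict"],                            # violates the type contract
]

titles = pluck(papers, "metadata.title")
uris = pluck(papers, "metadata.uri")
print(titles)  # ['First', 'Second']
print(uris)    # ['https://example.com/a.pdf']
```

The second and third entries drop out of `uris` without raising, which mirrors the rule that skipped Notes shrink the output rather than fail the step.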

When to use it

  • You need a single top-level or nested value from each Note (e.g., titles, URIs, timestamps).
  • You want to transform a Collection of record objects into a Collection of scalar values for downstream processing.
  • You need to produce a list of fetchable URLs or identifiers for subsequent tasks.
  • You must filter out Notes that don’t contain a specific field without raising an error.

Best practices

  • Ensure the target is a Collection of dict/JSON Notes (use split to convert array Notes into a Collection first).
  • Specify nested fields using dot notation (metadata.uri, metadata.title).
  • Check that the field exists in the Notes you care about; Notes missing it are silently excluded rather than causing a failure.
  • Avoid using pluck on Notes whose field values are arrays — split arrays into separate Notes before plucking.
  • Treat an empty result as a valid outcome indicating no matches, not as an execution error.
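To make the last practice concrete, the sketch below shows one way a caller might separate "no matches" from a real failure. It is illustrative only: `run_pluck` is a hypothetical stand-in that models a Collection as a list and handles only top-level fields, and the exception types are assumptions rather than the skill's documented error signals.

```python
def run_pluck(target, field):
    """Hypothetical wrapper: raises only for invalid input, never for zero matches."""
    if not field:
        raise ValueError("missing parameter: field")
    if not isinstance(target, list):
        raise TypeError("target must be a collection (modelled here as a list)")
    # Top-level fields only, to keep the sketch short.
    return [note[field] for note in target
            if isinstance(note, dict) and field in note]


notes = [{"text": "a"}, {"text": "b"}]

# No note has a "year" field: the result is empty, which is a valid outcome.
years = run_pluck(notes, "year")
status = "ok" if years else "no matches"

# Passing something that is not a collection is an actual failure.
try:
    run_pluck("not-a-collection", "text")
    failed = False
except TypeError:
    failed = True
```

A pipeline built this way can branch on the empty case (for example, by trying a different field) while reserving error handling for invalid targets or missing parameters.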

Example use cases

  • Extract metadata.title from a Collection of paper Notes to build a title index.
  • Pluck metadata.uri values to create a Collection of URLs for a web fetcher or downloader.
  • Retrieve text fields from search results to feed into a summarization pipeline.
  • Collect char_count values from document Notes for size-based filtering or batching.

FAQ

What happens if some Notes don’t have the requested field?

Those Notes are excluded from the output. The output can be empty if no Notes contain the field; this is expected behavior, not an error.

Can I pluck an array field directly?

No. Pluck requires scalar values. If a Note contains an array, use split to convert the array into a Collection of Notes first, then pluck the scalar element from each resulting Note.
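The split-then-pluck order can be sketched in Python as follows. This is only an illustration under stated assumptions: the `split` function here is a stand-in that turns an array into one Note per element, the real `split` primitive's parameters are not described on this page, and `pluck_top_level` is a hypothetical helper limited to top-level fields.

```python
def split(array_note):
    """Stand-in for the `split` primitive: one Note per array element."""
    if not isinstance(array_note, list):
        raise TypeError("split expects a note holding an array")
    return list(array_note)


def pluck_top_level(notes, field):
    """Keep the value of `field` from each dict note; skip everything else."""
    return [n[field] for n in notes if isinstance(n, dict) and field in n]


# A single Note whose content is a JSON array of records.
array_note = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"name": "c"}]

# Plucking a collection that holds the array Note itself finds no dict Notes.
direct = pluck_top_level([array_note], "id")

# Splitting first yields dict Notes that pluck can read.
ids = pluck_top_level(split(array_note), "id")

print(direct)  # []
print(ids)     # [1, 2]
```

The record without an `id` key is dropped in the second call, consistent with the rule that Notes missing the field are excluded.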