
This skill extracts and validates markdown structure (headers, lists, links, code blocks, and sections) from mixed text.

npx playbooks add skill bdambrosio/cognitive_workbench --skill as-markdown

Skill.md
---
name: as-markdown
description: Extract markdown document structure and content from mixed/embedded text (NOT for converting plain text TO markdown)
type: prompt_augmentation
examples:
  - '{"type":"as-markdown","target":"$llm_response","element":"headers","out":"$toc"}'
---

# As Markdown

Interpret note content as markdown, automatically identifying markdown within mixed content and extracting structural elements.

## Purpose

- Extract markdown from LLM responses with noise
- Parse document structure (headers, lists, links, code)
- Extract content under specific sections
- Strip surrounding non-markdown text
- Validate that markdown is present

## Input Format

Accepts a Note containing markdown (clean or embedded):
- Clean markdown
- Markdown with surrounding text
- Code-fenced markdown blocks
- Mixed format responses

## Parameters

**Optional:**
- **element** - Element type: "headers", "lists", "links", "code", "section", or "all" (default)
- **section** - When element="section", specify header name to extract content under
- **all** - If true, returns an array of all matching elements; if false or omitted, returns only the first match (default: false). Applies to links and code.

## Output Format

Returns extracted elements:
- **headers**: List of headers with levels
- **lists**: All list items (ordered and unordered)
- **links**: First link found (all=false) or all links (all=true)
- **code**: First code block (all=false) or all code blocks (all=true)
- **section**: Content under specified header
- **all**: Structured breakdown of all elements
- **Element not found**: note-null
- **Markdown not identified**: FAIL (hard error)

## Usage Examples

**Extract all structure:**
```json
{"type":"as-markdown","target":"$llm_response","out":"$structure"}
```

**Extract headers for TOC:**
```json
{"type":"as-markdown","target":"$doc","element":"headers","out":"$toc"}
```

**Extract specific section:**
```json
{"type":"as-markdown","target":"$doc","element":"section","section":"Usage","out":"$usage"}
```

**Extract first code block:**
```json
{"type":"as-markdown","target":"$response","element":"code","out":"$code"}
```

**Extract all code blocks:**
```json
{"type":"as-markdown","target":"$response","element":"code","all":true,"out":"$all_code"}
```

## Guidelines

- Strips leading/trailing non-markdown text automatically
- Element not found returns note-null (soft failure)
- No markdown identified triggers FAIL (hard failure)
- Preserves nesting for lists
- Code blocks include language identifier if present
- Default extracts first match; use `all: true` for all matches (applies to links, code)

Overview

This skill extracts markdown structure and content from mixed or embedded text; it does not convert plain text into markdown. It detects headers, lists, links, code blocks, and whole sections within noisy or wrapped content. The skill returns the requested elements or a full structural breakdown, and it hard-fails only when no markdown is identified. It is designed for automated pipelines that need reliable markdown extraction from LLM responses or aggregated notes.

How this skill works

Provide a note that may contain clean markdown, code-fenced blocks, or markdown embedded in surrounding prose. The skill scans the input, strips non-markdown noise, identifies structural elements (header levels, nested lists, links, code blocks), and returns the requested element type or a complete structure. Options allow extracting the first match or all matches, and extracting the content under a specific header. If no markdown is found, the skill fails with a hard error (FAIL); if the requested element is missing, it returns note-null.
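The header scan described above could look roughly like the following. This is a minimal sketch, not the skill's actual implementation: the function name `extract_headers` and both regular expressions are assumptions made for illustration, and the sketch only recognises ATX-style (`#`) headers while skipping anything inside a fenced code block.

```python
import re

# Hypothetical patterns for this sketch only (not taken from the skill).
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")  # ATX header, optional closing #s
FENCE_RE = re.compile(r"^\s*(```|~~~)")                 # opening or closing code fence


def extract_headers(text):
    """Return (level, title) pairs for headers found outside fenced code."""
    headers = []
    in_fence = False
    for line in text.splitlines():
        if FENCE_RE.match(line):
            in_fence = not in_fence  # toggle on every fence line
            continue
        if in_fence:
            continue  # '#' inside code is a comment, not a header
        match = HEADER_RE.match(line)
        if match:
            headers.append((len(match.group(1)), match.group(2)))
    return headers


sample = (
    "Sure, here is the doc:\n\n# Title\n\nIntro.\n\n## Usage\n\n"
    "```sh\n# not a header\n```\n\nHope that helps!"
)
print(extract_headers(sample))  # [(1, 'Title'), (2, 'Usage')]
```

Surrounding chat text is ignored simply because it does not match the header pattern; a fuller implementation would also need to unwrap markdown that arrives inside a code fence.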

When to use it

  • Cleaning and extracting markdown from LLM responses that include surrounding commentary or channel noise.
  • Generating a table of contents by extracting headers and levels from a mixed document.
  • Pulling a specific section (e.g., Usage or Installation) from a long note without manual parsing.
  • Collecting all code samples or the first code block from a reply that contains multiple snippets.

Best practices

  • Specify element explicitly (headers, lists, links, code, section, or all) to limit output and parsing work.
  • Use all: true when you expect multiple links or code blocks and want every match.
  • When extracting a section, match the header name exactly or standardize header names beforehand.
  • Feed the raw LLM response or full note so the skill can strip noise; avoid pre-trimming markdown unless necessary.
  • Treat note-null as a soft missing result and handle FAIL as a signal to re-evaluate input formatting.
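The difference between the soft and hard failure modes can be illustrated with a small stand-in. Everything here is hypothetical: `MarkdownNotFound` stands in for FAIL, `None` stands in for note-null, and the detection heuristic is a deliberately crude assumption rather than the skill's real logic.

```python
import re


class MarkdownNotFound(Exception):
    """Hypothetical stand-in for the skill's hard FAIL."""


# Crude heuristic (an assumption): a header, list item, fence, or inline link.
MARKDOWN_HINT_RE = re.compile(
    r"^(#{1,6}\s|\s*[-*+]\s|\s*\d+\.\s|```)|\[[^\]]+\]\([^)]+\)",
    re.MULTILINE,
)
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def extract_links(text, all_matches=False):
    """Return the first link, all links, or None when the text has no links."""
    if not MARKDOWN_HINT_RE.search(text):
        raise MarkdownNotFound("no markdown identified")  # hard failure
    links = [{"text": t, "url": u} for t, u in LINK_RE.findall(text)]
    if not links:
        return None  # soft failure, analogous to note-null
    return links if all_matches else links[0]


def describe(text):
    """Show how a caller might branch on the three possible outcomes."""
    try:
        link = extract_links(text)
    except MarkdownNotFound:
        return "hard failure: re-check input formatting"
    if link is None:
        return "soft miss: markdown present, but no link"
    return "found " + link["url"]
```

A caller would typically continue with a default value on the soft miss, and re-prompt or inspect the input on the hard failure.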

Example use cases

  • Extract headers from a multi-part LLM reply to build a clickable table of contents.
  • Retrieve the Installation section from a generated project README that contains extra commentary.
  • Collect all code blocks from a tutorial response to run automated tests on examples.
  • Find the first link in a mixed-format answer for quick reference extraction.
  • Strip preamble and trailing chat metadata to recover the underlying markdown document.

FAQ

What happens if no markdown exists in the input?

The skill returns a hard error labeled FAIL to indicate no markdown was identified.

How are multiple matches returned?

Set all: true for elements that support multiple results (links, code); otherwise the first match is returned.
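As an illustration of the first-match versus all-matches behaviour, here is a sketch for code blocks. The function name `extract_code`, the returned dictionary shape, and the use of `None` in place of note-null are assumptions for this example; it handles only backtick fences.

```python
import re

# Fenced block: optional language tag on the opening line, then the body.
CODE_BLOCK_RE = re.compile(
    r"^```([^\n`]*)\n(.*?)^```[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_code(text, all_matches=False):
    """Return the first code block, or every block when all_matches is True."""
    blocks = [
        {"language": m.group(1).strip() or None, "code": m.group(2)}
        for m in CODE_BLOCK_RE.finditer(text)
    ]
    if not blocks:
        return None  # stands in for note-null
    return blocks if all_matches else blocks[0]


reply = "Intro\n```python\nprint(1)\n```\ntext\n```\nls\n```\n"
print(extract_code(reply))                          # first block only
print(len(extract_code(reply, all_matches=True)))   # 2
```

The language identifier is kept alongside the code when the fence declares one, mirroring the guideline that code blocks include the language identifier if present.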

How does section extraction work?

When element=section, provide the header name. The skill extracts content under that header, preserving nested structure until the next header of the same or higher level.
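The stopping rule (continue through deeper sub-headers, stop at the next header of the same or higher level) can be sketched as follows. The function `extract_section` is hypothetical, matches the header name exactly, and for brevity does not skip `#` lines inside fenced code.

```python
import re

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def extract_section(text, name):
    """Return the content under the header called `name`, or None if absent."""
    lines = text.splitlines()
    start = None  # index of the first line after the matched header
    level = 0     # number of '#' characters on the matched header
    for i, line in enumerate(lines):
        match = HEADER_RE.match(line)
        if not match:
            continue
        if start is None:
            if match.group(2) == name:  # exact header-name match
                start, level = i + 1, len(match.group(1))
        elif len(match.group(1)) <= level:
            # Same or higher-level header closes the section.
            return "\n".join(lines[start:i]).strip()
    if start is None:
        return None  # stands in for note-null
    return "\n".join(lines[start:]).strip()


doc = (
    "# Tool\n\n## Usage\n\nRun it.\n\n### Flags\n\n- `-v` verbose\n\n"
    "## License\n\nMIT\n"
)
print(extract_section(doc, "Usage"))
```

Here the `### Flags` subsection stays inside the result because it is deeper than `## Usage`, while `## License` ends the section.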