---
name: sense
description: 'sense - Diagrammatic Video Extraction with Subtitle Alignment'
version: 1.0.0
---
# sense - Diagrammatic Video Extraction with Subtitle Alignment
> **Trit**: 0 (ERGODIC - Coordinator)
>
> Extract structured knowledge from video lectures via subtitle parsing,
> diagram/equation OCR, and GF(3)-balanced skill mapping.
## Overview
`sense` transforms video lectures into indexed, queryable knowledge:
```
┌─────────────────────────────────────────────────────────────┐
│                         VIDEO INPUT                         │
│ • Lecture recording (.mkv, .mp4)                            │
│ • Subtitles (.vtt, .srt, auto-generated)                    │
│ • Slides/diagrams (extracted frames)                        │
└──────────────────────────────┬──────────────────────────────┘
                               │
               ┌───────────────┼───────────────┐
               │               │               │
        ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
        │  SUBTITLE   │ │   DIAGRAM   │ │    SKILL    │
        │   PARSER    │ │  EXTRACTOR  │ │   MAPPER    │
        │  (-1 BLUE)  │ │  (0 GREEN)  │ │  (+1 RED)   │
        └──────┬──────┘ └──────┬──────┘ └──────┬──────┘
               │               │               │
               │          Mathpix OCR          │
               │         frame → LaTeX         │
               │               │               │
        ┌──────▼───────────────▼───────────────▼──────┐
        │                DuckDB INDEX                 │
        │ • Timestamped transcript                    │
        │ • Extracted equations (LaTeX)               │
        │ • Skill mappings with GF(3) trits           │
        │ • Queryable views                           │
        └─────────────────────────────────────────────┘
```
## Triadic Structure
| Role | Component | Trit | Function |
|------|-----------|------|----------|
| **Validator** | Subtitle Parser | -1 | Parse VTT/SRT, segment by timestamp |
| **Coordinator** | Diagram Extractor | 0 | OCR frames → LaTeX via Mathpix |
| **Generator** | Skill Mapper | +1 | Assign skills with GF(3) balance |
**GF(3) Conservation**: (-1) + (0) + (+1) = 0 ✓
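A triad is balanced exactly when its trits sum to 0 mod 3. A minimal sketch of the check (Ruby's `%` returns a non-negative result for a positive modulus, so negative trits need no special casing):

```ruby
# GF(3) balance check for a list of trits.
def balanced?(trits)
  trits.sum % 3 == 0
end

balanced?([-1, 0, +1]) # => true  (validator ⊗ coordinator ⊗ generator)
balanced?([+1, +1, 0]) # => false (sum ≡ 2 mod 3)
```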
## Components
### 1. Subtitle Parser (-1)
Parses WebVTT or SRT subtitle files into structured segments:
```ruby
require 'webvtt'
class SubtitleParser
  def initialize(vtt_path)
    @vtt = WebVTT.read(vtt_path)
  end

  def segments
    @vtt.cues.map do |cue|
      {
        start: cue.start.total_seconds,
        end: cue.end.total_seconds,
        text: cue.text.gsub(/<[^>]*>/, '').strip,
        duration: cue.end.total_seconds - cue.start.total_seconds
      }
    end
  end

  def by_slide(slide_timestamps)
    # Group subtitles by slide boundaries
    slide_timestamps.map.with_index do |ts, i|
      next_ts = slide_timestamps[i + 1] || Float::INFINITY
      {
        slide: i,
        timestamp: ts,
        text: segments.select { |s| s[:start] >= ts && s[:start] < next_ts }
                      .map { |s| s[:text] }.join(' ')
      }
    end
  end
end
```
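A hypothetical session (the path matches the Usage section below; the printed values are illustrative, not real gem output):

```ruby
parser = SubtitleParser.new('reference/videos/bumpus_ct2021.en.vtt')
parser.segments.first
# => { start: 0.0, end: 4.2, text: "Welcome to CT2021...", duration: 4.2 }

# Group the transcript by slide boundaries detected at 0s, 95s, 210s:
parser.by_slide([0.0, 95.0, 210.0]).map { |s| s[:slide] }
# => [0, 1, 2]
```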
### 2. Diagram Extractor (0)
Extracts frames at key timestamps and OCRs equations/diagrams:
```ruby
require 'base64'
require 'mathpix'

class DiagramExtractor
  MATHPIX_APP_ID = ENV['MATHPIX_APP_ID']
  MATHPIX_APP_KEY = ENV['MATHPIX_APP_KEY']

  def initialize(video_path)
    @video = video_path
  end

  def extract_frame(timestamp, output_path)
    # Use ffmpeg to extract a single frame (-ss before -i for fast seek)
    system("ffmpeg -y -ss #{timestamp} -i '#{@video}' -vframes 1 -q:v 2 '#{output_path}'")
    output_path
  end

  def ocr_frame(image_path)
    # Send to Mathpix for LaTeX extraction. binread + strict_encode64:
    # PNG is binary, and encode64's embedded newlines would break the data URI.
    response = Mathpix.process(
      src: "data:image/png;base64,#{Base64.strict_encode64(File.binread(image_path))}",
      formats: ['latex_styled', 'text'],
      data_options: { include_asciimath: true }
    )
    {
      latex: response['latex_styled'],
      text: response['text'],
      confidence: response['confidence'],
      has_diagram: response['is_printed'] || response['is_handwritten']
    }
  end

  def extract_all(timestamps)
    timestamps.map.with_index do |ts, i|
      frame_path = "/tmp/frame_#{i}_#{ts.to_i}.png"
      extract_frame(ts, frame_path)
      result = ocr_frame(frame_path)
      result.merge(timestamp: ts, slide_num: i)
    end
  end
end
```
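A usage sketch, assuming `MATHPIX_APP_ID`/`MATHPIX_APP_KEY` are set, `ffmpeg` is on the PATH, and the video path from the Usage section exists; the values shown are illustrative:

```ruby
extractor = DiagramExtractor.new('reference/videos/bumpus_ct2021.mkv')
results = extractor.extract_all([12.0, 95.5, 210.0])
results.first
# => { latex: "...", text: "...", confidence: 0.97,
#      has_diagram: true, timestamp: 12.0, slide_num: 0 }
```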
### 3. Skill Mapper (+1)
Maps extracted content to skills with GF(3) conservation:
```ruby
class SkillMapper
  SKILL_KEYWORDS = {
    'acsets' => %w[acset c-set schema functor category],
    'sheaf-cohomology' => %w[sheaf cohomology local global section],
    'structured-decomp' => %w[tree decomposition treewidth bag],
    'kan-extensions' => %w[kan extension adjoint limit colimit],
    'polynomial' => %w[polynomial poly interface arena],
    'temporal-coalgebra' => %w[temporal time varying dynamic coalgebra],
    'operad-compose' => %w[operad wiring diagram composition],
  }

  SKILL_TRITS = {
    'acsets' => 0, 'sheaf-cohomology' => -1, 'structured-decomp' => -1,
    'kan-extensions' => 0, 'polynomial' => 0, 'temporal-coalgebra' => -1,
    'operad-compose' => +1, 'oapply-colimit' => +1, 'gay-mcp' => +1,
  }

  def map_content(text, latex)
    combined = "#{text} #{latex}".downcase
    skills = SKILL_KEYWORDS.select do |_skill, keywords|
      keywords.any? { |kw| combined.include?(kw) }
    end.keys
    # Ensure GF(3) balance
    balance_skills(skills)
  end

  def balance_skills(skills)
    trit_sum = skills.sum { |s| SKILL_TRITS[s] || 0 }
    # Add a compensating skill if the sum is off balance
    case trit_sum % 3
    when 1 # need -1
      skills << 'sheaf-cohomology' unless skills.include?('sheaf-cohomology')
    when 2 # need +1 (2 ≡ -1 mod 3)
      skills << 'operad-compose' unless skills.include?('operad-compose')
    end
    skills
  end
end
```
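For example, a slide whose transcript mentions both a tree decomposition and a wiring diagram hits two keyword lists whose trits already cancel, so no balancing skill is added:

```ruby
mapper = SkillMapper.new
mapper.map_content('tree decomposition of a wiring diagram', '')
# => ["structured-decomp", "operad-compose"]
# trits: (-1) + (+1) = 0, so balance_skills leaves the list unchanged
```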
## Complete Pipeline
```ruby
require 'duckdb'
require 'json'

class Sense
  def initialize(video_path, vtt_path, output_db: 'tensor_skill_paper.duckdb')
    @video = video_path
    @vtt = vtt_path
    @db_path = output_db
    @content_id = File.basename(video_path, '.*')
    @subtitle_parser = SubtitleParser.new(vtt_path)
    @diagram_extractor = DiagramExtractor.new(video_path)
    @skill_mapper = SkillMapper.new
  end

  def process!
    # 1. Parse subtitles
    segments = @subtitle_parser.segments
    # 2. Detect slide transitions (silence gaps or visual changes)
    slide_timestamps = detect_slides(segments)
    # 3. Extract and OCR key frames
    diagrams = @diagram_extractor.extract_all(slide_timestamps)
    # 4. Map skills with GF(3) balance
    indexed = diagrams.map do |d|
      subtitle_text = @subtitle_parser.by_slide(slide_timestamps)[d[:slide_num]][:text]
      skills = @skill_mapper.map_content(subtitle_text, d[:latex] || '')
      d.merge(
        subtitle_text: subtitle_text,
        skills: skills,
        trit: skills.sum { |s| SkillMapper::SKILL_TRITS[s] || 0 } % 3
      )
    end
    # 5. Store in DuckDB
    store_index(indexed)
    # 6. Create views
    create_views
    indexed
  end

  private

  def detect_slides(segments)
    # Simple heuristic: a silence gap > 2s indicates a slide change
    timestamps = [0.0]
    segments.each_cons(2) do |a, b|
      timestamps << b[:start] if b[:start] - a[:end] > 2.0
    end
    timestamps
  end

  def store_index(indexed)
    conn = DuckDB::Database.open(@db_path).connect
    conn.execute("DROP TABLE IF EXISTS #{@content_id}_sense_index")
    conn.execute(<<~SQL)
      CREATE TABLE #{@content_id}_sense_index (
        slide_num INTEGER,
        timestamp FLOAT,
        latex VARCHAR,
        has_diagram BOOLEAN,
        subtitle_text TEXT,
        skills TEXT,
        trit INTEGER,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
      )
    SQL

    # Hoist the INSERT out of the loop; a heredoc body cannot be
    # interleaved with the argument list that follows it.
    insert_sql = <<~SQL
      INSERT INTO #{@content_id}_sense_index
      VALUES (?, ?, ?, ?, ?, ?, ?, CURRENT_TIMESTAMP)
    SQL
    indexed.each do |row|
      conn.execute(insert_sql, [
        row[:slide_num], row[:timestamp], row[:latex],
        row[:has_diagram], row[:subtitle_text],
        row[:skills].to_json, row[:trit]
      ])
    end
    conn.close
  end

  def create_views
    conn = DuckDB::Database.open(@db_path).connect
    conn.execute(<<~SQL)
      CREATE OR REPLACE VIEW v_#{@content_id}_timeline AS
      SELECT
        slide_num,
        printf('%02d:%05.2f', CAST(timestamp/60 AS INT), timestamp % 60) AS timecode,
        CASE WHEN has_diagram THEN '📊' ELSE '' END ||
        CASE WHEN latex != '' AND latex IS NOT NULL THEN '📐' ELSE '' END AS content,
        trit,
        skills
      FROM #{@content_id}_sense_index
      ORDER BY timestamp
    SQL
    conn.close
  end
end
```
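Note that `Sense` interpolates `@content_id` (the video's basename) directly into table and view names, so it must be a valid SQL identifier. A defensive sketch (the helper below is hypothetical, not part of the pipeline above):

```ruby
# Replace anything that is not legal in an unquoted SQL identifier;
# 'bumpus_ct2021' passes through unchanged.
def sanitize_content_id(name)
  name.gsub(/[^A-Za-z0-9_]/, '_')
end

sanitize_content_id('bumpus-ct2021.v2') # => "bumpus_ct2021_v2"
```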
## Usage
### Ruby
```ruby
require_relative 'lib/sense'
# Process a video lecture
sense = Sense.new(
  'reference/videos/bumpus_ct2021.mkv',
  'reference/videos/bumpus_ct2021.en.vtt'
)
indexed = sense.process!
puts "Indexed #{indexed.size} slides"
```
### Command Line
```bash
# Extract subtitles from video (if not available)
uvx yt-dlp --write-auto-sub --sub-lang en --skip-download \
-o 'reference/videos/%(id)s' 'https://youtube.com/watch?v=VIDEO_ID'
# Run sense extraction
just sense-extract reference/videos/bumpus_ct2021.mkv
# Query the index
just sense-timeline bumpus_ct2021
just sense-skills bumpus_ct2021 acsets
```
### Python Alternative
```python
#!/usr/bin/env python3
"""sense.py - Python implementation of diagrammatic video extraction"""
import duckdb
import webvtt
import subprocess
import json
from pathlib import Path
class Sense:
    def __init__(self, video_path: str, vtt_path: str, db_path: str = "tensor_skill_paper.duckdb"):
        self.video = Path(video_path)
        self.vtt = Path(vtt_path)
        self.db_path = db_path
        self.content_id = self.video.stem

    def parse_subtitles(self):
        """Parse VTT file into segments"""
        captions = webvtt.read(str(self.vtt))
        return [
            {
                'start': self._time_to_seconds(c.start),
                'end': self._time_to_seconds(c.end),
                'text': c.text.strip()
            }
            for c in captions
        ]

    def extract_frame(self, timestamp: float, output_path: str):
        """Extract a single frame at the given timestamp"""
        subprocess.run([
            'ffmpeg', '-y', '-ss', str(timestamp),
            '-i', str(self.video), '-vframes', '1',
            '-q:v', '2', output_path
        ], capture_output=True)
        return output_path

    def ocr_frame_mathpix(self, image_path: str):
        """OCR a frame by shelling out to the Ruby mathpix gem"""
        result = subprocess.run([
            'ruby', '-rmathpix', '-e',
            f"puts Mathpix.process_image('{image_path}').to_json"
        ], capture_output=True, text=True)
        if result.returncode == 0:
            return json.loads(result.stdout)
        return {'latex': '', 'text': '', 'has_diagram': False}

    def _time_to_seconds(self, time_str: str) -> float:
        """Convert HH:MM:SS.mmm to seconds"""
        parts = time_str.split(':')
        return int(parts[0]) * 3600 + int(parts[1]) * 60 + float(parts[2])
```
## Justfile Commands
```just
# Extract and index a video lecture
sense-extract video:
    @echo "👁️ SENSE: Extracting {{video}}"
    ruby -I lib -r sense -e "Sense.new('{{video}}', '{{video}}'.sub('.mkv', '.en.vtt')).process!"

# Download subtitles for a YouTube video
sense-subtitles url output:
    uvx yt-dlp --write-auto-sub --sub-lang en --skip-download -o '{{output}}' '{{url}}'

# Show timeline for indexed content
sense-timeline content_id:
    @source .venv/bin/activate && duckdb tensor_skill_paper.duckdb \
        "SELECT * FROM v_{{content_id}}_timeline"

# Find slides mentioning a skill
sense-skills content_id skill:
    @source .venv/bin/activate && duckdb tensor_skill_paper.duckdb \
        "SELECT slide_num, timecode, skills FROM v_{{content_id}}_timeline WHERE skills LIKE '%{{skill}}%'"

# Extract frame at timestamp
sense-frame video timestamp:
    flox activate -- ffmpeg -y -ss {{timestamp}} -i '{{video}}' -vframes 1 -q:v 2 /tmp/sense_frame.png
    @echo "✓ Frame extracted to /tmp/sense_frame.png"

# OCR a frame with Mathpix
sense-ocr image:
    ruby -rmathpix -e "puts Mathpix.process_image('{{image}}').to_json" | jq .

# Full pipeline: download, extract, index
sense-full url content_id:
    @echo "📥 Downloading video and subtitles..."
    uvx yt-dlp -o 'reference/videos/{{content_id}}.mkv' '{{url}}'
    uvx yt-dlp --write-auto-sub --sub-lang en --skip-download -o 'reference/videos/{{content_id}}' '{{url}}'
    @echo "👁️ Running sense extraction..."
    just sense-extract 'reference/videos/{{content_id}}.mkv'
```
## GF(3) Conservation
The skill ensures every indexed slide has a balanced trit sum:
```sql
-- Verify GF(3) balance. Assumes a combined `sense_index` view
-- (a UNION ALL of the per-video *_sense_index tables, each tagged
-- with its content_id).
SELECT
  content_id,
  SUM(trit) AS total_trit,
  SUM(trit) % 3 AS gf3,
  CASE WHEN SUM(trit) % 3 = 0 THEN '✓' ELSE '✗' END AS balanced
FROM sense_index
GROUP BY content_id;
```
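The same audit can be run per video against the tables the pipeline actually creates. A minimal sketch with the `duckdb` gem (the table name is assumed from the Usage section; `query` is the gem's standard call):

```ruby
require 'duckdb'

con = DuckDB::Database.open('tensor_skill_paper.duckdb').connect
con.query(<<~SQL).each { |row| puts row.inspect }
  SELECT SUM(trit) % 3 AS gf3,
         CASE WHEN SUM(trit) % 3 = 0 THEN '✓' ELSE '✗' END AS balanced
  FROM bumpus_ct2021_sense_index
SQL
con.close
```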
## Integration with Galois Infrastructure
After `sense` extracts content, register it in the Galois connection:
```sql
-- Update content_registry
UPDATE content_registry
SET indexed = TRUE,
index_table = 'bumpus_ct2021_sense_index'
WHERE content_id = 'bumpus_ct2021';
-- Content now flows through Galois lattice
SELECT * FROM v_galois_content_to_skills WHERE content_id = 'bumpus_ct2021';
```
## Dependencies
```yaml
# Ruby gems
gems:
  - webvtt-ruby   # VTT parsing
  - mathpix       # Mathpix OCR API
  - duckdb        # Database storage

# System tools
tools:
  - ffmpeg        # Frame extraction
  - yt-dlp        # Video/subtitle download

# Environment variables
env:
  MATHPIX_APP_ID: "your-app-id"
  MATHPIX_APP_KEY: "your-app-key"
```
## Triads Using sense
```
# sense as coordinator in extraction triads:
subtitle-parser (-1) ⊗ sense (0) ⊗ skill-mapper (+1) = 0 ✓
# Combined with other skills:
sheaf-cohomology (-1) ⊗ sense (0) ⊗ gay-mcp (+1) = 0 ✓ [Colored diagrams]
temporal-coalgebra (-1) ⊗ sense (0) ⊗ koopman-generator (+1) = 0 ✓ [Dynamics]
persistent-homology (-1) ⊗ sense (0) ⊗ topos-generate (+1) = 0 ✓ [Topology]
```
## See Also
- `mathpix-ocr` - LaTeX extraction backend
- `galois-infrastructure` - Content ⇆ Skills ⇆ Worlds
- `parallel-fanout` - Triadic parallel dispatch
- `duckdb-temporal-versioning` - Time-travel queries
- Cat# treatment examples: `complete_catsharp_index.py`, `complete_bumpus_index.py`
---
## Tsao Visual Hierarchy Integration
Sense's pipeline is modeled on **Doris Tsao's visual neuroscience** hierarchy. See [DORIS_TSAO_VISUAL_NEUROSCIENCE_BRIDGE.md](file:///Users/bob/ies/music-topos/DORIS_TSAO_VISUAL_NEUROSCIENCE_BRIDGE.md).
### Tsao Hierarchy → Sense Components
| Tsao Level | Visual Region | Sense Component | Function |
|------------|---------------|-----------------|----------|
| **Level 0** | V1 simple cells | Subtitle Parser (-1) | Edge detection, timestamp boundaries |
| **Level 1** | V2/V4 complex | Diagram Extractor (0) | Feature integration, OCR |
| **Level 2** | IT face patches | Skill Mapper (+1) | Pattern recognition, skill assignment |
| **Level 3** | Prefrontal | GF(3) Balancer | Behavioral goal, conservation |
### Self-Avoiding Walks via Self-Coloring
From the chromatic-walk insight: self-avoiding walks (SAWs) don't intersect **by definition**, but in an effective topos the property is verified constructively through **self-coloring**:
```python
def saw_verified_by_self_coloring(walk: list) -> bool:
    """
    In an effective topos, self-intersection is decidable.
    The reafference equation:
        Generate(seed, i) = Observe(seed, i) ⟺ self ≡ self
    If the walk revisits (seed, index), it generates the SAME color
    at two walk positions — contradiction detected.
    """
    # Gay.color_at is the deterministic color oracle from the gay-mcp
    # skill (external to this snippet): same (seed, index), same color.
    colors = [Gay.color_at(step.seed, step.index) for step in walk]
    return len(colors) == len(set(colors))  # no repeated colors ⟺ SAW
```
### Connection to Frontier Lab Circuits
Sense extraction parallels mechanistic interpretability:
| Sense | Circuits Research | Tsao |
|-------|-------------------|------|
| Subtitle segments | Attention heads | V1 edges |
| Diagram features | Activation patterns | V2 shapes |
| Skill mapping | Circuit identification | IT patches |
| GF(3) balance | Superposition control | Prefrontal |
See: [FRONTIER_LAB_CIRCUITS_INTERACTOME.md](file:///Users/bob/ies/music-topos/FRONTIER_LAB_CIRCUITS_INTERACTOME.md)
### Chang-Tsao 50D Face Space → Skill Space
```
Face Space (Tsao):
25 shape axes + 25 appearance axes = 50D
Each neuron encodes ONE axis
Population decodes via linear combination
Skill Space (Sense):
N skills with trit assignments (-1, 0, +1)
Each slide maps to skill subset
GF(3) conservation ensures balance
```
---
## Phenomenal Topology
Sense extraction states map to QRI's Symmetry Theory of Valence:
| State | Visual Cortex | Sense Extraction | GF(3) |
|-------|---------------|------------------|-------|
| **Smooth** | All levels coherent | Clean skill mapping | = 0 |
| **Defect** | Prediction error | Ambiguous slide | ≠ 0 |
| **Vortex** | High entropy | Multiple skill conflicts | ≫ 0 |
### Rebalancing
```python
def rebalance_defect(slide_skills: list, target_gf3: int = 0) -> list:
    """Restore GF(3) = 0 by adding a compensating skill."""
    current_sum = sum(SKILL_TRITS[s] for s in slide_skills)
    # deficit is the trit value that must be ADDED to reach the target
    deficit = (target_gf3 - current_sum) % 3
    if deficit == 1:
        slide_skills.append('operad-compose')    # +1 supplies a deficit of 1
    elif deficit == 2:
        slide_skills.append('sheaf-cohomology')  # -1 ≡ +2 supplies a deficit of 2
    return slide_skills
```
---
**Skill Name**: sense
**Trit**: 0 (ERGODIC - Coordinator)
**Tsao Integration**: V1→V2→IT→Prefrontal hierarchy
**SAW Verification**: Effective topos self-coloring
This skill extracts structured, queryable knowledge from lecture videos by combining subtitle parsing, diagram/equation OCR, and balanced skill mapping: WebVTT/SRT subtitles become timed segments grouped by detected slide boundaries, key frames are extracted with ffmpeg and OCR'd via Mathpix into LaTeX and plain text, and matched skills are balanced with GF(3) trits. The result is a DuckDB-backed index with timeline and skill views ready for search and downstream analysis.

## FAQ

**What formats are supported for video and subtitles?**

Common video containers such as .mkv and .mp4, plus the subtitle formats WebVTT (.vtt) and SRT (.srt), including auto-generated captions.

**How does GF(3) balancing affect skill assignment?**

GF(3) balancing ensures the trit sum for each indexed segment is conserved: if a slide's skills do not sum to 0 mod 3, a predefined compensating skill is added, giving a deterministic modular balance for downstream aggregation.