---
name: ebook-analysis
description: Parse ebooks, extract concepts and entities with citation traceability, classify by type/layer, and synthesize across book collections.
license: MIT
metadata:
  author: jwynia
  version: "2.1"
  domain: research
  cluster: media-meta-analysis
  type: orchestrator
  mode: diagnostic+generative
  maturity: working
  maturity_score: 14
---
# Ebook Analysis: Non-Fiction Knowledge Extraction
You analyze ebooks to extract knowledge with full citation traceability. This skill supports two complementary extraction modes:
1. **Concept Extraction** - Extract ideas classified by abstraction (principle → tactic)
2. **Entity Extraction** - Extract named things (studies, researchers, frameworks, anecdotes) that persist across books
## Core Principle
**Every extraction must be traceable to its exact source.** Citation traceability is non-negotiable. Extract less with full provenance rather than more without it.
---
## Two Extraction Modes
### Mode 1: Concept Extraction
For extracting IDEAS organized by abstraction level.
**Use when:** Analyzing a book for transferable ideas, building a concept taxonomy, understanding how abstract principles relate to concrete tactics.
**Output:** JSON files (analysis.json, concepts.json)
**Example:** "Spaced repetition improves retention" is a MECHANISM at Layer 2.
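As a minimal sketch, a record in concepts.json might look like the following. The field names here are illustrative assumptions, not ea-parse.ts's actual schema:

```typescript
// Illustrative shape of one concepts.json record (field names assumed)
interface ConceptRecord {
  id: string;
  text: string; // the concept statement
  type: "principle" | "mechanism" | "pattern" | "strategy" | "tactic";
  layer: 0 | 1 | 2 | 3 | 4; // abstraction layer
  quote: string; // exact supporting quote from the book
  chapter: string; // citation anchor
}

const example: ConceptRecord = {
  id: "spaced-repetition-retention",
  text: "Spaced repetition improves retention",
  type: "mechanism",
  layer: 2,
  quote: "(exact quote from the source text)",
  chapter: "Ch.4",
};
console.log(example.type); // "mechanism"
```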
### Mode 2: Entity Extraction
For extracting NAMED THINGS that can be cross-referenced across books.
**Use when:** Building a knowledge base where the same study, researcher, or framework appears in multiple books. The goal is entity resolution: recognizing that "Hogarth's framework" in Range is the same as "kind/wicked environments" mentioned elsewhere.
**Output:** Markdown files in knowledge base structure
**Example:** "Kind vs Wicked Environments" is a FRAMEWORK by Robin Hogarth.
### Choosing a Mode
| If you want to... | Use Mode |
|-------------------|----------|
| Understand a book's argument structure | Concept Extraction |
| Build a reference library across books | Entity Extraction |
| Create actionable takeaways | Concept Extraction |
| Track what researchers say across sources | Entity Extraction |
| Both | Run both modes sequentially |
---
## Entity Extraction Mode (Detailed)
### Entity Types
| Type | What It Captures | Example |
|------|------------------|---------|
| **study** | Research findings, experiments, data | Flynn Effect, Marshmallow Test |
| **researcher** | People and their contributions | Anders Ericsson, Robin Hogarth |
| **framework** | Mental models, taxonomies, systems | Kind vs Wicked, Desirable Difficulties |
| **anecdote** | Stories used to illustrate points | Tiger vs Roger, Challenger Disaster |
| **concept** | Ideas that aren't frameworks | Cognitive entrenchment, Match quality |
### Extended Entity Type Guidance
Some entities don't fit cleanly into the five types. Guidelines:
| Entity Kind | Use Type | Rationale |
|-------------|----------|-----------|
| **Simulations/Games** (Superstruct, EVOKE) | anecdote | Illustrative events, even if hypothetical |
| **Institutions** (IFTF, WEF) | researcher | Organizations contribute ideas like individuals |
| **Historical events** (Challenger disaster) | anecdote | Stories that illustrate principles |
| **Hypothetical scenarios** | anecdote | Future scenarios from books like Imaginable |
| **Thought experiments** | framework | If systematic; otherwise concept |
**When uncertain:** Default to `anecdote` for narratives/events, `concept` for ideas, `framework` for systematic methods.
### Author-as-Subject Pattern
When the book's author is also a significant entity (e.g., Jane McGonigal in Imaginable):
**Create a researcher entity if:**
- Author has notable prior work or institutional affiliation
- Author appears in Wikipedia or other reference sources
- Author's background/credentials are relevant to understanding the book
- Other books in your collection might reference them
**Skip if:**
- Author is primarily known only for this book
- No external sources to verify/enrich the entity
**Template addition for author-subjects:**
```markdown
## Note
This researcher is the author of [Book] in our collection. Their frameworks and concepts are documented separately.
```
### Entity File Template
```markdown
# [Entity Name]
**Type:** study | researcher | framework | anecdote | concept
**Status:** stub | partial | solid | authoritative
**Last Updated:** YYYY-MM-DD
**Aliases:** alias1, alias2, alias3
## Summary
[2-3 sentence synthesized understanding]
## Key Findings / What It Illustrates
1. [Claim or finding with source]
— Source: [Book], Ch.[X]
2. [Another claim]
— Source: [Book], Ch.[X]
## Key Quotes
> "Quotable text here."
> "Another memorable quote."
## Sources in Collection
| Book | Author | How It's Used | Citation |
|------|--------|---------------|----------|
| Range | Epstein | [Role in book] | Ch.X |
## Sources NOT in Collection
- [Book that would enrich this entity]
## Related Entities
- [Other Entity](../type/other-entity.md) - Relationship description
## Open Questions
- [What we don't yet know]
```
### Knowledge Base Structure
```
/knowledge/
├── _index.md            # Master registry
├── _entities.json       # Searchable index (generated)
│
├── nonfiction/
│   ├── _index.md            # Domain index
│   ├── _[book]-quotes.md    # Book-specific quotes file
│   ├── studies/
│   │   ├── flynn-effect.md
│   │   └── chase-simon-chunking.md
│   ├── researchers/
│   │   ├── hogarth-robin.md
│   │   └── tetlock-philip.md
│   ├── frameworks/
│   │   ├── kind-vs-wicked-environments.md
│   │   └── desirable-difficulties.md
│   ├── anecdotes/
│   │   ├── tiger-vs-roger.md
│   │   └── challenger-disaster.md
│   └── concepts/
│       ├── cognitive-entrenchment.md
│       └── match-quality.md
│
├── cooking/             # Domain-specific structure
│   ├── techniques/
│   ├── ingredients/
│   └── equipment/
│
└── technical/
    ├── patterns/
    └── technologies/
```
### Quotes Extraction
Quotable quotes are a distinct extraction type. For each book, create a quotes file:
**File:** `_[book-slug]-quotes.md`
**Structure:**
```markdown
# Quotable Quotes from [Book Title]
**Author:** [Author]
**Last Updated:** YYYY-MM-DD
## On [Theme 1]
> "Quote text here."
> "Another quote on same theme."
## On [Theme 2]
> "Quote on different theme."
```
**What makes a good quote:**
- Memorable phrasing that captures a key insight
- Self-contained (understandable without context)
- Surprising or counterintuitive formulation
- Useful for presentations, writing, or reference
### Entity Extraction Workflow
1. **Scan book** - Read through identifying named studies, researchers, frameworks, illustrative stories
2. **Check existing entities** - Use `kb-resolve-entity.ts` to see if entity already exists
3. **Create or update** - New entity → create file; existing → add as source
4. **Add quotes** - Extract memorable quotes to quotes file
5. **Cross-link** - Add Related Entities sections
6. **Regenerate index** - Run `kb-generate-index.ts`
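Step 3's create-or-update decision can be sketched as follows. The `Match` shape and the 0.3 threshold mirror kb-resolve-entity.ts's documented default, but this is a simplified stand-in, not the script's actual logic:

```typescript
// Decision sketch for step 3: update an existing entity or create a new file.
// `matches` is assumed to be sorted best-first, as a resolver would return it.
type Match = { slug: string; score: number };

function createOrUpdate(matches: Match[], threshold = 0.3): "update" | "create" {
  const best = matches[0]?.score ?? 0; // no matches → score of 0
  return best >= threshold ? "update" : "create";
}

console.log(createOrUpdate([{ slug: "flynn-effect", score: 0.9 }])); // "update"
console.log(createOrUpdate([])); // "create"
```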
### Entity Extraction States (KB0-KB5)
| State | Symptoms | Intervention |
|-------|----------|--------------|
| **KB0** | No knowledge base | Create directory structure |
| **KB1** | Structure exists, no entities | Begin extraction |
| **KB2** | Extracting from book | Create entity files |
| **KB3** | Entities created, not linked | Add Related Entities |
| **KB4** | Linked, no index | Run kb-generate-index.ts |
| **KB5** | Complete for this book | Proceed to next book |
### Cross-Book Synthesis Workflow
**Triggered when:** 2+ books have been extracted to the knowledge base.
**Goals:**
1. Find entities that appear in multiple books
2. Identify conceptual connections between books
3. Surface contradictions or complementary perspectives
4. Update entity files with multi-source synthesis
**Process:**
1. **Entity overlap detection**
```bash
# List entity files whose "Sources in Collection" table has 2+ data rows
# (count "|" lines in that section, minus the header and separator rows)
for f in knowledge/nonfiction/**/*.md; do
  rows=$(awk '/^## Sources in Collection/{t=1;next} /^## /{t=0} t && /^\|/' "$f" | wc -l)
  [ "$((rows - 2))" -ge 2 ] && echo "$f"
done
```
Or manually review entities that were updated with a new source.
2. **Conceptual connection mapping**
- Compare frameworks across books (e.g., Range's "wicked environments" ↔ Imaginable's "futures thinking")
- Identify shared researchers (e.g., Tetlock appears in both Range and Imaginable)
- Look for complementary themes (prediction failure → preparation despite uncertainty)
3. **Synthesis documentation**
For entities appearing in 2+ books, update the Summary section:
```markdown
## Summary
[Synthesized understanding from BOTH sources, noting agreements and differences]
```
4. **Cross-book insights**
Document thematic connections in `context/insights/cross-book-{theme}.md`:
```markdown
# Cross-Book Insight: [Theme]
## Books Contributing
- Range (Epstein) - [perspective]
- Imaginable (McGonigal) - [perspective]
## Synthesis
[How the books complement or contradict each other]
```
---
## Concept Extraction Mode (Detailed)
### Concept Types (Abstract → Concrete)
| Type | Definition | Example |
|------|------------|---------|
| **Principle** | Foundational truth or axiom | "Communities form around shared identity" |
| **Mechanism** | How something works | "Reciprocity creates social bonds" |
| **Pattern** | Recurring structure or approach | "The community lifecycle pattern" |
| **Strategy** | High-level approach to achieve goals | "Build trust before asking for contribution" |
| **Tactic** | Specific actionable technique | "Send welcome emails within 24 hours" |
### Abstraction Layers
| Layer | Name | Abstraction | Example |
|-------|------|-------------|---------|
| 0 | Foundational | Universal principles | "Humans seek belonging" |
| 1 | Theoretical | Domain-specific theory | "Community requires shared purpose" |
| 2 | Strategic | Approaches and frameworks | "The funnel model of engagement" |
| 3 | Tactical | Specific methods | "Onboarding sequences" |
| 4 | Specific | Concrete implementations | "Use Discourse for forums" |
### Relationship Types
| Relationship | Meaning | When to Use |
|--------------|---------|-------------|
| **INFLUENCES** | A affects B | Causal or correlational connection |
| **SUPPORTS** | A provides evidence for B | Citation, example, validation |
| **CONTRADICTS** | A conflicts with B | Opposing claims |
| **COMPOSED_OF** | A contains B | Part-whole relationships |
| **DERIVES_FROM** | A is derived from B | Logical conclusions |
### Concept Extraction States (EA0-EA7)
| State | Symptoms | Intervention |
|-------|----------|--------------|
| **EA0** | No input file | Guide file preparation |
| **EA1** | Raw file, not parsed | Run ea-parse.ts |
| **EA2** | Parsed, not extracted | LLM extracts concepts |
| **EA3** | Extracted, not classified | Assign types and layers |
| **EA4** | Classified, not annotated | Add themes, relationships |
| **EA5** | Single book complete | Export or proceed to synthesis |
| **EA6** | Multi-book ready | Cross-book synthesis |
| **EA7** | Analysis complete | Generate reports |
### Concept Extraction Workflow
1. **Parse** - Run `ea-parse.ts` to chunk book with position tracking
2. **Extract** - Present chunks to LLM for concept identification with exact quotes
3. **Classify** - Assign type (principle→tactic) and layer (0-4)
4. **Annotate** - Add themes and functional analysis
5. **Link** - Connect related concepts
6. **Export** - Generate analysis.json, concepts.json, report.md
---
## Available Tools
### Parsing Tools
#### ea-parse.ts
Parse ebook files into chunks with metadata and position tracking.
```bash
deno run --allow-read scripts/ea-parse.ts path/to/book.txt
deno run --allow-read scripts/ea-parse.ts path/to/book.epub --format epub
deno run --allow-read scripts/ea-parse.ts book.txt --chunk-size 1500 --overlap 150
```
**Output:** JSON with metadata, chapters (if detected), and chunks with positions.
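The core chunking behavior can be sketched as below. This is a toy version assuming fixed-size chunks with character-offset tracking; ea-parse.ts's real logic (chapter detection, metadata) is more elaborate:

```typescript
// Toy chunker with overlap and position tracking, using ea-parse.ts's
// default-looking parameters (--chunk-size 1500 --overlap 150)
function chunk(text: string, size = 1500, overlap = 150) {
  const chunks: { start: number; end: number; text: string }[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + size, text.length);
    chunks.push({ start, end, text: text.slice(start, end) });
    if (end === text.length) break;
    start = end - overlap; // overlap preserves context across chunk boundaries
  }
  return chunks;
}

const parts = chunk("a".repeat(4000), 1500, 150);
console.log(parts.length, parts[1].start); // 3 chunks; second starts at 1350
```

Because each chunk carries its `start`/`end` offsets, a quote extracted from any chunk can be traced back to an exact position in the source file.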
### Knowledge Base Tools
#### kb-generate-index.ts
Scan knowledge base and generate searchable entity index.
```bash
deno run --allow-read --allow-write scripts/kb-generate-index.ts /path/to/knowledge
```
**Output:** Creates `_entities.json` with all entities, aliases, and metadata.
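A plausible shape for one `_entities.json` entry is sketched below. The field names are assumptions; regenerate the index with the script rather than editing it by hand:

```typescript
// Assumed shape of one _entities.json entry (illustrative, not the
// script's guaranteed schema)
interface EntityIndexEntry {
  slug: string;
  name: string;
  type: "study" | "researcher" | "framework" | "anecdote" | "concept";
  aliases: string[];
  path: string; // path to the entity file, relative to the knowledge root
  status: "stub" | "partial" | "solid" | "authoritative";
}

const entry: EntityIndexEntry = {
  slug: "flynn-effect",
  name: "Flynn Effect",
  type: "study",
  aliases: ["IQ gains over time"],
  path: "nonfiction/studies/flynn-effect.md",
  status: "solid",
};
console.log(entry.slug); // "flynn-effect"
```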
#### kb-resolve-entity.ts
Search for existing entities before creating duplicates.
```bash
deno run --allow-read scripts/kb-resolve-entity.ts "Flynn Effect"
deno run --allow-read scripts/kb-resolve-entity.ts "Hogarth" --threshold 0.5
deno run --allow-read scripts/kb-resolve-entity.ts "kind learning" --json
```
**Options:**
- `--threshold <0-1>` - Minimum match score (default: 0.3)
- `--limit <n>` - Maximum results (default: 5)
- `--json` - Output as JSON
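To illustrate how threshold-based matching against names and aliases might work, here is a crude scoring sketch. The actual kb-resolve-entity.ts scoring method is not specified here, so treat this only as intuition for the `--threshold` option:

```typescript
// Crude name/alias similarity: exact match > substring > token overlap.
// This is an assumption about how a resolver might score, not the real one.
function score(query: string, candidate: string): number {
  const q = query.toLowerCase(), c = candidate.toLowerCase();
  if (q === c) return 1;
  if (c.includes(q) || q.includes(c)) return 0.7;
  const qt = new Set(q.split(/\s+/)), ct = new Set(c.split(/\s+/));
  const shared = Array.from(qt).filter((t) => ct.has(t)).length;
  return shared / Math.max(qt.size, ct.size); // fraction of tokens shared
}

const entity = {
  name: "Kind vs Wicked Environments",
  aliases: ["Hogarth's framework", "kind/wicked environments"],
};
const best = Math.max(
  ...[entity.name, ...entity.aliases].map((a) => score("kind environments", a)),
);
console.log(best); // 0.5 — above the default 0.3 threshold, so it matches
```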
### Validation Tools
#### ea-validate.ts
Validate analysis output for citation accuracy and schema completeness.
```bash
deno run --allow-read scripts/ea-validate.ts analysis.json --report
```
---
## Anti-Patterns
### The Extraction Flood
**Pattern:** Extracting every potentially interesting phrase.
**Fix:** Ask "Would I cite this?" before extracting. Quality over quantity.
### The Citation Black Hole
**Pattern:** Extracting without preserving exact quotes or positions.
**Fix:** Always capture: exact quote, chapter reference, context.
### The Duplicate Entity
**Pattern:** Creating new entity without checking if it exists.
**Fix:** Always run `kb-resolve-entity.ts` first.
### The Orphan Entity
**Pattern:** Entities without Related Entities links.
**Fix:** Every entity should connect to at least 2 others.
### The Quote-Free Entity
**Pattern:** Entity captures ideas but no memorable phrasing.
**Fix:** Include Key Quotes section with author's exact words.
### The Single-Book Silo
**Pattern:** Analyzing books without cross-referencing.
**Fix:** After 2+ books, run synthesis to find connections.
---
## Example Workflows
### Full Entity Extraction (Range Example)
```
1. Scan book chapter by chapter
2. Identify all named studies, researchers, frameworks, anecdotes
3. Create inventory document listing all potential entities
4. For each entity:
a. kb-resolve-entity.ts "[entity name]" to check existence
b. Create markdown file in appropriate type directory
c. Fill in template with findings and citations
d. Add Key Quotes section
5. Create _range-quotes.md with all memorable quotes
6. Update _index.md with new entities
7. kb-generate-index.ts to rebuild _entities.json
```
### Quick Concept Scan
```
1. ea-parse.ts book.txt --chunk-size 2000
2. For each chunk, extract top 3-5 concepts
3. Classify by type and layer
4. Generate concepts.json and report.md
```
---
## Output Persistence
### Entity Extraction Output
| File | Location |
|------|----------|
| Entity files | `knowledge/{domain}/{type}/{entity-slug}.md` |
| Quotes file | `knowledge/{domain}/_[book]-quotes.md` |
| Entity index | `knowledge/_entities.json` |
| Domain index | `knowledge/{domain}/_index.md` |
### Concept Extraction Output
| File | Location |
|------|----------|
| Full analysis | `ebook-analysis/{author}-{title}/analysis.json` |
| Concepts only | `ebook-analysis/{author}-{title}/concepts.json` |
| Citations | `ebook-analysis/{author}-{title}/citations.json` |
| Report | `ebook-analysis/{author}-{title}/report.md` |
---
## Verification (Oracle)
### What This Skill Can Verify
- **Citation positions exist** - Validate quoted text appears at claimed position
- **Schema completeness** - Required fields present
- **Cross-reference integrity** - Referenced entities exist
- **Duplicate detection** - Entity doesn't already exist (via kb-resolve-entity.ts)
### What Requires Human Judgment
- **Significance** - Is this worth extracting?
- **Classification** - Is this really a "framework" vs "concept"?
- **Relationship validity** - Does A really influence B?
- **Quote quality** - Is this actually memorable?
---
## Integration Graph
### Inbound (From Other Skills)
| Source | Leads to |
|--------|----------|
| research | Multi-book synthesis ready |
| reverse-outliner | Structural data for concept extraction |
### Outbound (To Other Skills)
| From State | Leads to |
|------------|----------|
| Entity extraction complete | dna-extraction (deep functional analysis) |
| Concept extraction complete | media-meta-analysis (cross-source synthesis) |
### Complementary Skills
| Skill | Relationship |
|-------|--------------|
| dna-extraction | 6-axis functional analysis for annotation |
| reverse-outliner | Structural approach for fiction |
| voice-analysis | Author style fingerprinting |
| context-network | Knowledge base maintenance |
---
## Calibration Data (from Range + Imaginable extractions)
### By Book Density
| Book Type | Expected Entities | Estimated Effort |
|-----------|-------------------|------------------|
| Dense non-fiction (Range, Thinking, Fast and Slow) | 60-100 | 4-6 hours |
| Moderate non-fiction (most business books) | 30-50 | 2-3 hours |
| Light non-fiction (popular science) | 15-30 | 1-2 hours |
| Technical books | 20-40 | 2-3 hours |
### By Book Subtype
Different non-fiction subtypes yield different entity profiles:
| Subtype | Example | Entity Profile | Expected Count |
|---------|---------|----------------|----------------|
| **Research synthesis** | Range | Many studies, researchers, frameworks | 60-100 |
| **Methodological/How-to** | Imaginable | Many frameworks, few studies | 30-50 |
| **Memoir/Narrative** | Educated | Few frameworks, many anecdotes | 20-40 |
| **Reference** | Technical manuals | Many concepts, few anecdotes | Variable |
**Research synthesis books** cite many studies and researchers, connecting ideas across domains.
**Methodological books** teach techniques and frameworks but cite fewer external sources.
**Memoir/narrative** books use personal stories to illustrate points rather than research.
### Metadata Reliability Warning
Book classification metadata (Calibre tags, library categories) is often:
- **Wrong** - Fiction/non-fiction misclassified
- **Generic** - "General Fiction" or "Self-Help" applied broadly
- **Inconsistent** - Same book categorized differently across sources
Always verify classification makes sense before extraction. A "fiction" tag on a methodology book like Imaginable is a metadata error.
---
## Reasoning Requirements
### Standard Reasoning
- Single chunk concept extraction
- Type/layer classification
- Simple relationship identification
- Individual entity creation
### Extended Reasoning (ultrathink)
Use extended thinking for:
- **Multi-book synthesis** - requires holding multiple networks simultaneously
- **Contradiction detection** - semantic comparison across sources
- **Theme emergence** - identifying patterns across large sets
- **Knowledge gap identification** - reasoning about what's missing
**Trigger phrases:** "synthesize across books", "find contradictions", "identify gaps", "comprehensive analysis"
---
## What You Do NOT Do
- Extract without citation traceability
- Create entities without checking for duplicates
- Skip the linking phase (orphan entities are not useful)
- Leave entities without quotes
- Treat fiction as non-fiction
- Use regex for semantic analysis (LLM judgment only)