npx playbooks add skill nealcaren/social-data-analysis --skill genre-skill-builder
---
name: genre-skill-builder
description: Meta-skill for creating genre-analysis-based writing skills. Analyzes a corpus of article sections, discovers clusters, and generates complete skills with phases, cluster guides, and techniques.
---
# Genre Skill Builder
You help researchers create **writing skills** based on systematic genre analysis. Given a corpus of article sections (introductions, conclusions, methods, discussions, etc.), you guide users through analyzing genre patterns, discovering clusters, and generating a complete skill that can guide future writing.
## What This Skill Does
This is a **meta-skill**: it creates other skills. The output is a fully functional writing skill like `lit-writeup` or `interview-bookends`, with:
- A main `SKILL.md` with genre-based guidance
- Phase files for a structured writing workflow
- Cluster profiles based on discovered patterns
- Technique guides for sentence-level craft
## When to Use This Skill
Use this skill when you want to:
- Create a writing guide for a **specific article section** (e.g., Discussion, Abstract, or Methodology)
- Base guidance on **empirical analysis** of a corpus rather than intuition
- Generate a skill that follows the **repository's phased architecture**
- Produce **cluster-based guidance** that recognizes different writing styles
## What You Need
1. **A corpus of article sections** (30+ recommended)
   - Text files, PDFs, or markdown
   - All from the same section type (all introductions, all conclusions, etc.)
   - Ideally from target venues (e.g., *Social Problems*, *Social Forces*)
2. **A model skill to learn from**
   - An existing skill like `lit-writeup` or `interview-bookends`
   - Provides the structural template for the generated skill
## Connection to Other Skills
This skill adapts the methodology from:
| Skill | What We Borrow |
|-------|----------------|
| **interview-analyst** | Systematic coding approach (Phases 1-3) |
| **lit-writeup** | Cluster-based writing guidance structure |
| **interview-bookends** | Benchmarks and coherence checking |
## Core Principles
1. **Empirical grounding**: All guidance derives from corpus analysis, not intuition.
2. **Cluster discovery**: Different articles do the same job in different ways; identify the styles.
3. **Quantitative + qualitative**: Count features AND interpret patterns.
4. **Template-based generation**: Use parameterized templates, not free-form writing.
5. **Pauses for judgment**: Human decisions shape cluster boundaries and naming.
6. **The user is the expert**: They know the genre; we provide methodological support.
## Workflow Phases
### Phase 0: Scope Definition & Model Selection
**Goal**: Define what we're building and what to learn from.
**Process**:
- Identify the target article section (introduction, conclusion, methods, discussion, etc.)
- Select an existing skill as a structural model
- Review model skill to identify elements to extract
- Confirm corpus location and article count
**Output**: Scope definition memo with target section, model skill, corpus path.
> **Pause**: User confirms scope and model selection.
---
### Phase 1: Corpus Immersion
**Goal**: Build quantitative profile of the corpus.
**Process**:
- Count articles, calculate word counts, paragraph counts
- Identify structural patterns (headings, subsections)
- Generate descriptive statistics (median, IQR, range)
- Flag outliers and notable examples
- Create initial observations about variation
**Output**: Immersion report with corpus statistics.
> **Pause**: User reviews quantitative profile.
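
The profiling step above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's own implementation: the function name `profile_corpus`, the in-memory `{article_id: text}` input, and the use of Tukey's 1.5 × IQR rule for flagging outliers are all assumptions made for the example.

```python
import statistics


def profile_corpus(sections: dict[str, str]) -> dict:
    """Summarize word counts for a corpus given as {article_id: text}.

    Hypothetical helper: needs at least two articles to compute quartiles.
    """
    word_counts = {name: len(text.split()) for name, text in sections.items()}
    counts = sorted(word_counts.values())

    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(counts, n=4)
    iqr = q3 - q1

    # Assumed outlier rule (Tukey): more than 1.5 * IQR beyond the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [name for name, n in word_counts.items() if n < low or n > high]

    return {
        "n_articles": len(counts),
        "median": median,
        "iqr": iqr,
        "range": (counts[0], counts[-1]),
        "outliers": outliers,
    }
```

Paragraph counts could be profiled the same way by splitting each text on blank lines instead of whitespace.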
---
### Phase 2: Systematic Genre Coding
**Goal**: Code each article for genre features.
**Process**:
- Develop codebook based on model skill's categories
- Code opening moves, structural elements, rhetorical strategies
- Track frequency and co-occurrence of features
- Build article-by-article coding database
- Identify preliminary cluster candidates
**Output**: Codebook, article codes, preliminary clusters.
> **Pause**: User reviews codebook and sample codes.
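
One way to track frequency and co-occurrence is sketched below. It assumes the coding database is a simple mapping from article ID to the set of codes applied to that article; the function name `tally_codes` and the example codes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tally_codes(article_codes: dict[str, set[str]]) -> tuple[Counter, Counter]:
    """Return (code frequencies, pairwise co-occurrence counts)."""
    frequency: Counter = Counter()
    cooccurrence: Counter = Counter()
    for codes in article_codes.values():
        frequency.update(codes)
        # Sort so each unordered pair is always stored under the same key.
        cooccurrence.update(combinations(sorted(codes), 2))
    return frequency, cooccurrence


# Hypothetical coding database: article id -> codes applied to it.
example = {
    "article-01": {"gap-identification", "question-led"},
    "article-02": {"gap-identification", "question-led", "limitations"},
    "article-03": {"stakes-led", "limitations"},
}
freq, pairs = tally_codes(example)
```

Pairs with high counts relative to the individual code frequencies are natural starting points for preliminary cluster candidates.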
---
### Phase 3: Pattern Interpretation & Cluster Discovery
**Goal**: Identify stable patterns and define cluster profiles.
**Process**:
- Analyze code co-occurrence patterns
- Define 3-6 clusters and the characteristics of each
- Calculate benchmarks for each cluster
- Identify signature moves and prohibited moves
- Extract exemplar quotes/passages
- Name clusters meaningfully
**Output**: Cluster profiles with benchmarks and exemplars.
> **Pause**: User confirms cluster definitions.
---
### Phase 4: Skill Generation
**Goal**: Generate the complete skill file structure.
**Process**:
- Generate `SKILL.md` using template + findings
- Generate phase files (typically 3-4 for writing skills)
- Generate cluster guide files (one per cluster)
- Generate technique guide files
- Generate `plugin.json`
- Prepare `marketplace.json` entry
**Output**: Complete skill directory structure.
> **Pause**: User reviews generated skill files.
---
### Phase 5: Validation & Testing
**Goal**: Verify skill quality and test with sample input.
**Process**:
- Check all files are syntactically correct
- Verify benchmarks match analysis data
- Ensure cluster coverage is complete
- Identify any gaps or inconsistencies
- Optionally test with sample input
**Output**: Validation report with quality assessment.
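
Some of these checks can be automated. The sketch below covers only two of them (the manifest parses as JSON, and every cluster has a guide file). The layout it checks, one guide per cluster at `clusters/<name>.md`, is an assumption for illustration; adjust it to whatever structure Phase 4 actually produced.

```python
import json
from pathlib import Path


def validate_skill_dir(skill_dir: Path, cluster_names: list[str]) -> list[str]:
    """Return a list of problems found in a generated skill directory."""
    problems = []

    if not (skill_dir / "SKILL.md").is_file():
        problems.append("missing SKILL.md")

    manifest = skill_dir / "plugin.json"
    if not manifest.is_file():
        problems.append("missing plugin.json")
    else:
        try:
            json.loads(manifest.read_text(encoding="utf-8"))
        except json.JSONDecodeError as err:
            problems.append(f"plugin.json is not valid JSON: {err}")

    # Assumed layout: one guide per cluster at clusters/<name>.md.
    for name in cluster_names:
        if not (skill_dir / "clusters" / f"{name}.md").is_file():
            problems.append(f"missing cluster guide: {name}")

    return problems
```

An empty list means these particular checks passed; verifying that benchmarks match the analysis data still needs a comparison against the Phase 1-3 outputs.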
---
## Folder Structure for Analysis
```
project/
├── corpus/                    # Article sections to analyze
│   ├── article-01.md
│   ├── article-02.md
│   └── ...
├── analysis/
│   ├── phase0-scope/          # Scope definition
│   ├── phase1-immersion/      # Quantitative profiling
│   ├── phase2-coding/         # Genre coding
│   ├── phase3-clusters/       # Pattern analysis
│   ├── phase4-generation/     # Generated skill files
│   └── phase5-validation/     # Quality assessment
└── output/                    # Final skill plugin
    └── plugins/[skill-name]/
```
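If it helps to create this layout up front, a small script along these lines would do it. The directory names come from the tree above; the function name `scaffold_project` is hypothetical.

```python
from pathlib import Path

# Analysis subfolders, one per workflow phase (names taken from the tree above).
ANALYSIS_DIRS = [
    "phase0-scope",
    "phase1-immersion",
    "phase2-coding",
    "phase3-clusters",
    "phase4-generation",
    "phase5-validation",
]


def scaffold_project(root: Path, skill_name: str) -> list[Path]:
    """Create the analysis folder layout under root and return the paths."""
    targets = [root / "corpus"]
    targets += [root / "analysis" / name for name in ANALYSIS_DIRS]
    targets.append(root / "output" / "plugins" / skill_name)
    for path in targets:
        # exist_ok=True makes the script safe to re-run on an existing project.
        path.mkdir(parents=True, exist_ok=True)
    return targets
```

For example, `scaffold_project(Path("project"), "discussion-writer")` creates `project/output/plugins/discussion-writer/` alongside the corpus and analysis folders.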
## Code Categories to Track
Based on model skills, these are typical genre features to code:
### Structural Features
- Word count, paragraph count
- Presence of subsections
- Heading structure
- Position of key elements
### Opening Moves
- Phenomenon-led, stakes-led, theory-led, case-led, question-led
- First sentence type
- Hook strategy
### Rhetorical Moves
- Gap identification
- Contribution claims
- Limitations
- Future directions
- Callbacks (for conclusions)
### Citation Patterns
- Citation density
- Integration style (parenthetical, author-subject, quote-then-cite)
- Anchor sources vs. supporting citations
### Linguistic Features
- Hedging level
- Temporal markers
- Transition patterns
- Key phrases
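Some of these features can be counted mechanically before any interpretive coding. The sketch below estimates citation density and hedging level for a single section. Everything in it is an assumption for illustration: the regular expressions only catch author-date citations inside parentheses, and the `HEDGES` word list is a small placeholder rather than a validated lexicon.

```python
import re

# Assumed citation format: author-date, with the year inside parentheses.
PAREN = re.compile(r"\(([^()]*)\)")
YEAR = re.compile(r"\b(?:19|20)\d{2}[a-z]?\b")

# Placeholder hedge lexicon; replace with the codebook's own list.
HEDGES = {"may", "might", "could", "suggest", "suggests", "appears", "perhaps", "possibly"}


def linguistic_profile(text: str) -> dict:
    """Count citations and hedges, and express both per 1,000 words."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Each year inside a parenthetical counts as one citation.
    citations = sum(len(YEAR.findall(group)) for group in PAREN.findall(text))
    hedges = sum(1 for word in words if word in HEDGES)
    per_1000 = 1000 / len(words) if words else 0.0
    return {
        "words": len(words),
        "citations": citations,
        "hedges": hedges,
        "citation_density": citations * per_1000,
        "hedge_density": hedges * per_1000,
    }
```

Counts like these are only a first pass; integration style and the distinction between anchor sources and supporting citations still require reading the passage.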
## Cluster Discovery Guidelines
### Minimum Clusters: 3
If fewer than 3 patterns emerge, the corpus may be too homogeneous or the coding scheme too coarse.
### Maximum Clusters: 6
More than 6 typically indicates over-differentiation; look for higher-level groupings.
### Cluster Naming
Name clusters by their **dominant strategy**, not by a number, their prevalence, or a vague label:
- "Gap-Filler" not "Cluster 1"
- "Theory-Extension" not "Common Type"
- "Problem-Driven" not "Applied Approach"
### Cluster Validation
Each cluster should have:
- At least 10% of the corpus, and never fewer than 3 articles (the floor that applies when the corpus has fewer than 30)
- Distinctive benchmark values
- Clear signature moves
- At least one exemplar article
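The size and count rules above are easy to check in code. This sketch assumes every article is assigned to exactly one cluster, so the cluster sizes sum to the corpus size; the function name `check_clusters` is hypothetical. The qualitative criteria (distinctive benchmarks, signature moves, exemplars) are not covered.

```python
def check_clusters(cluster_sizes: dict[str, int]) -> list[str]:
    """Return warnings for clusters that break the size or count guidelines."""
    total = sum(cluster_sizes.values())
    # 10% of the corpus, rounded up with integer arithmetic, and never below 3.
    threshold = max(3, (total + 9) // 10)

    warnings = []
    if len(cluster_sizes) < 3:
        warnings.append("fewer than 3 clusters: corpus may be too homogeneous")
    if len(cluster_sizes) > 6:
        warnings.append("more than 6 clusters: look for higher-level groupings")
    for name, size in cluster_sizes.items():
        if size < threshold:
            warnings.append(f"{name}: {size} articles, below minimum of {threshold}")
    return warnings
```

For a 40-article corpus split 20/18/2, the threshold is 4, so only the 2-article cluster is flagged.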
## Template System
Phase 4 uses parameterized templates. Key parameters:
| Parameter | Source |
|-----------|--------|
| `{{skill_name}}` | Phase 0 user input |
| `{{target_section}}` | Phase 0 user input |
| `{{cluster_names}}` | Phase 3 cluster discovery |
| `{{benchmarks}}` | Phase 1 statistics and Phase 3 cluster benchmarks |
| `{{opening_moves}}` | Phase 2 coding |
| `{{signature_phrases}}` | Phase 2-3 analysis |
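As a rough illustration of how `{{parameter}}` placeholders might be filled, here is a minimal substitution sketch. It is not the skill's actual template engine: the `render` function is hypothetical, and it assumes every parameter value has already been formatted as a string.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, params: dict[str, str]) -> str:
    """Replace each {{name}} with params[name]; fail loudly on missing values."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in params:
            raise KeyError(f"no value supplied for {{{{{key}}}}}")
        return params[key]

    return PLACEHOLDER.sub(substitute, template)


heading = render(
    "# {{skill_name}}: writing the {{target_section}}",
    {"skill_name": "discussion-writer", "target_section": "discussion"},
)
```

Raising on a missing parameter, rather than leaving the placeholder in place, keeps unfilled `{{...}}` markers from slipping into generated skill files.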
## Phase Guides
Reference these guides for phase-specific instructions:
| Guide | Purpose |
|-------|---------|
| `phases/phase0-scope.md` | Scope definition, model selection |
| `phases/phase1-immersion.md` | Quantitative profiling |
| `phases/phase2-coding.md` | Genre coding methodology |
| `phases/phase3-interpretation.md` | Cluster discovery |
| `phases/phase4-generation.md` | Skill file generation |
| `phases/phase5-validation.md` | Quality verification |
## Templates
| Template | Purpose |
|----------|---------|
| `templates/skill-template.md` | Main SKILL.md structure |
| `templates/phase-template.md` | Phase file structure |
| `templates/cluster-template.md` | Cluster profile structure |
| `templates/technique-template.md` | Technique guide structure |
## Invoking Phase Agents
Use the Task tool for each phase:
```
Task: Phase 2 Genre Coding
subagent_type: general-purpose
model: sonnet
prompt: Read phases/phase2-coding.md and execute for [user's project]. Corpus is in [location]. Model skill is [skill name].
```
## Model Recommendations
| Phase | Model | Rationale |
|-------|-------|-----------|
| **Phase 0**: Scope | **Sonnet** | Planning, structural decisions |
| **Phase 1**: Immersion | **Sonnet** | Counting, statistics |
| **Phase 2**: Coding | **Sonnet** | Systematic processing |
| **Phase 3**: Interpretation | **Opus** | Pattern recognition, cluster naming |
| **Phase 4**: Generation | **Opus** | Template adaptation, prose quality |
| **Phase 5**: Validation | **Sonnet** | Verification, checking |
## Starting the Process
When the user is ready to begin:
1. **Ask about the target**:
> "What article section do you want to create a writing skill for? (e.g., introduction, conclusion, discussion, methods)"
2. **Ask about the corpus**:
> "Where is your corpus of articles? How many articles do you have?"
3. **Ask about the model skill**:
> "Which existing skill should I use as a structural model? Options include `lit-writeup` (Theory sections) and `interview-bookends` (intro/conclusion). I can also review other skills if you prefer."
4. **Ask about output**:
> "What should the new skill be named? (e.g., `discussion-writer`, `methods-guide`)"
5. **Proceed with Phase 0** to formalize scope.
## Key Reminders
- **Corpus size matters**: 30+ articles recommended for stable clusters.
- **Variation is the goal**: A homogeneous corpus won't reveal clusters.
- **Human judgment required**: Cluster boundaries and names need user input.
- **Templates constrain**: Generated skills follow established patterns, not novel structures.
- **Test the output**: The best validation is using the generated skill.
- **Iteration expected**: First-pass clusters often need refinement.
This skill builds new writing skills by analyzing a corpus of article sections and turning discovered genre patterns into a structured, reusable skill. It produces phased workflows, cluster profiles, and sentence-level technique guides grounded in quantitative and qualitative analysis. The result is a practical authoring tool tailored to a target section and venue.
The skill ingests a corpus of same-type sections (introductions, discussions, methods, etc.), computes descriptive statistics, and applies systematic coding to identify recurring moves and features. It then runs cluster discovery to define 3–6 distinctive writing strategies and generates a complete skill directory: phase files, cluster guides, and technique guides based on measured benchmarks and exemplars. Review pauses follow Phases 0 through 4, so users confirm the scope, quantitative profile, codes, cluster definitions, and generated files before validation.
**How large should the corpus be?**
Aim for 30+ sections for stable clusters. Smaller corpora can work, since the cluster validation rule falls back to a minimum of 3 articles per cluster, but expect less reliable differentiation.

**Can I use mixed-section corpora?**
No. The corpus should contain only one section type to keep feature coding and cluster discovery coherent.

**How many clusters will be created?**
Typically 3–6. Fewer than 3 suggests a homogeneous corpus or a coarse codebook; more than 6 risks over-differentiation.