This skill helps you extract topics from text collections using LDA, generate keywords, classify documents, and visualize results for insights.
npx playbooks add skill dkyazzentwatwa/chatgpt-skills --skill topic-modeler
---
name: topic-modeler
description: Extract topics from text collections using LDA (Latent Dirichlet Allocation) with keyword extraction and topic visualization.
---
# Topic Modeler
Extract topics from text collections using LDA.
## Features
- **LDA Topic Modeling**: Latent Dirichlet Allocation
- **Topic Keywords**: Extract representative keywords per topic
- **Document Classification**: Assign documents to topics
- **Visualization**: Topic word clouds and distributions
- **Coherence Scores**: Evaluate topic quality
## CLI Usage
```bash
python topic_modeler.py --input documents.csv --column text --topics 5 --output topics.json
```
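The source of `topic_modeler.py` is not shown here, so the following is only a sketch of how a script could parse the four flags from the command above. Which flags are required, the default of 5 topics, and the help strings are all assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the flags shown in the CLI example (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        prog="topic_modeler.py",
        description="Extract topics from a text column with LDA.",
    )
    parser.add_argument("--input", required=True, help="Path to the CSV file of documents")
    parser.add_argument("--column", required=True, help="Name of the column holding the text")
    # Assumed default; the real script may use a different one.
    parser.add_argument("--topics", type=int, default=5, help="Number of topics to fit")
    parser.add_argument("--output", required=True, help="Path for the JSON results")
    return parser


# Parse the same arguments as the command line above.
args = build_parser().parse_args(
    ["--input", "documents.csv", "--column", "text", "--topics", "5", "--output", "topics.json"]
)
print(args.input, args.column, args.topics, args.output)
```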
## Dependencies
- gensim>=4.3.0
- nltk>=3.8.0
- pandas>=2.0.0
- matplotlib>=3.7.0
- wordcloud>=1.9.0
This skill extracts topics from collections of text using Latent Dirichlet Allocation (LDA) and provides keyword summaries, document assignments, and visualizations. It bundles preprocessing, keyword extraction, coherence scoring, and plot-ready outputs, so raw text goes in and topic summaries come out of a single tool. It is implemented in Python and designed for reproducible, auditable topic analysis.
The skill tokenizes and preprocesses documents, builds a dictionary and corpus, then fits an LDA model to discover latent topics. It returns top keywords per topic, assigns topic probabilities to each document, and computes coherence scores to help you evaluate model quality. Optional visualization modules produce word clouds and topic distribution plots for interpretation and reporting.
What inputs does the skill expect?
A CSV or tabular file with a text column; you specify the column name. The script also accepts plain lists of documents when used as a library.
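As an illustration of the CSV case, the sketch below reads one column into a plain list of documents with pandas. The helper name `load_documents` and the handling of empty cells are assumptions, not the skill's own API; an in-memory buffer stands in for `documents.csv`.

```python
import io

import pandas as pd

# Stand-in for documents.csv; the column name "text" matches the CLI example.
csv_data = io.StringIO(
    "id,text\n"
    "1,First document about cats\n"
    "2,\n"
    "3,  Second document about markets  \n"
)


def load_documents(source, column):
    """Return the non-empty strings from one column of a CSV file or buffer."""
    df = pd.read_csv(source)
    if column not in df.columns:
        raise KeyError(f"Column {column!r} not found; available: {list(df.columns)}")
    # Drop missing cells, trim whitespace, and discard blank strings.
    texts = df[column].dropna().astype(str).str.strip()
    return texts[texts != ""].tolist()


documents = load_documents(csv_data, "text")
print(documents)
```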
How do I choose the number of topics?
Run models across a range of topic counts and compare coherence scores and human interpretability to pick the best balance.
What visualizations are provided?
Word clouds for each topic and topic distribution plots across documents. Outputs are available as matplotlib and wordcloud objects or as image files.
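A topic distribution plot can be sketched with matplotlib alone. The example below counts how many documents have each topic as their most probable one and saves a bar chart. The probabilities are made-up illustrative values, and the helper name and output filename are assumptions. Word clouds would come from the `wordcloud` package, for example by passing a topic's keyword weights to `WordCloud.generate_from_frequencies`; that step is omitted here to keep the sketch to one plotting dependency.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script works without a display
import matplotlib.pyplot as plt

# Hypothetical per-document topic probabilities; each row sums to 1.
doc_topic_probs = [
    [0.9, 0.1],
    [0.7, 0.3],
    [0.2, 0.8],
    [0.1, 0.9],
]


def dominant_topic_counts(probs):
    """Count the documents whose most probable topic is each topic index."""
    counts = [0] * len(probs[0])
    for row in probs:
        counts[row.index(max(row))] += 1
    return counts


counts = dominant_topic_counts(doc_topic_probs)

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar([f"Topic {i}" for i in range(len(counts))], counts)
ax.set_ylabel("Documents")
ax.set_title("Dominant topic per document")
fig.tight_layout()
fig.savefig("topic_distribution.png")  # assumed output filename
```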