|
8 | 8 | [](https://pypi.org/project/tmplot) |
9 | 9 | [](https://github.com/maximtrp/tmplot/issues) |
10 | 10 |
|
11 | | -**tmplot** is a Python package for analysis and visualization of topic modeling results. It provides the interactive report interface that borrows much from LDAvis/pyLDAvis and builds upon it offering a number of metrics for calculating topic distances and a number of algorithms for calculating scatter coordinates of topics. It can be used to select closest and stable topics across multiple models. |
| 11 | +**tmplot** is a Python package for **topic modeling analysis and visualization**. It provides an interactive report interface that borrows much from LDAvis/pyLDAvis and extends it with a number of metrics for calculating topic distances and several algorithms for calculating topic scatter coordinates. It can also be used to select the closest and most stable topics across multiple models.
| 12 | + |
| 13 | +**Analyze** • **Visualize** • **Compare** multiple topic models with ease |
12 | 14 |
|
13 | 15 |  |
14 | 16 |
|
15 | | -## Features |
| 17 | +## Key Features |
| 18 | + |
| 19 | +### Interactive Visualization |
| 20 | + |
| 21 | +- **Topic scatter plots** with customizable coordinates and sizing |
| 22 | +- **Term probability charts** with relevance weighting |
| 23 | +- **Document analysis** showing top documents per topic |
| 24 | +- **Interactive reports** with real-time parameter adjustment |
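tmplot exposes relevance weighting through a `lambda_` argument (for example in `calc_terms_probs_ratio`). Assuming it follows the relevance measure introduced with LDAvis, relevance(w, k | λ) = λ·log p(w|k) + (1 − λ)·log(p(w|k) / p(w)), the standalone sketch below shows how λ shifts a topic's term ranking from globally frequent terms (λ = 1) toward terms specific to that topic (λ = 0). The functions `relevance` and `top_terms` are hypothetical illustrations, not part of tmplot's API.

```python
import math

def relevance(phi_k, p_w, lambda_=0.6):
    """LDAvis-style relevance of every term to one topic.

    phi_k: p(w | k) for each term w (one topic row of the topic-word matrix)
    p_w: marginal probability p(w) of each term across the corpus
    lambda_: 1.0 ranks by p(w | k); 0.0 ranks by lift p(w | k) / p(w)
    Assumes all probabilities are strictly positive.
    """
    return [
        lambda_ * math.log(pwk) + (1 - lambda_) * math.log(pwk / pw)
        for pwk, pw in zip(phi_k, p_w)
    ]

def top_terms(vocab, scores, n=3):
    """Return the n terms with the highest scores."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in ranked[:n]]

# Toy example: "rare" is infrequent overall but strongly tied to this topic
vocab = ["model", "topic", "word", "rare"]
phi_k = [0.4, 0.3, 0.2, 0.1]
p_w = [0.5, 0.2, 0.28, 0.02]
print(top_terms(vocab, relevance(phi_k, p_w, lambda_=1.0)))
print(top_terms(vocab, relevance(phi_k, p_w, lambda_=0.0)))
```

Intermediate values of λ blend the two rankings, which is what an interactive λ slider lets you explore.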
16 | 25 |
|
17 | | -- Supported models: |
| 26 | +### Advanced Analytics |
18 | 27 |
|
19 | | - - [tomotopy](https://bab2min.github.io/tomotopy/): `LDAModel`, `LLDAModel`, `CTModel`, `DMRModel`, `HDPModel`, `PTModel`, `SLDAModel`, `GDMRModel` |
20 | | - - [gensim](https://radimrehurek.com/gensim/): `LdaModel`, `LdaMulticore` |
21 | | - - [bitermplus](https://github.com/maximtrp/bitermplus): `BTM` |
| 28 | +- **Topic stability analysis** across multiple model runs |
| 29 | +- **Model comparison** by matching the closest topics across models
| 30 | +- **Saliency calculations** for term importance |
| 31 | +- **Entropy metrics** for model optimization |
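Term saliency is commonly defined, following the Termite visualization by Chuang et al., as saliency(w) = p(w) · Σₖ p(k|w)·log(p(k|w) / p(k)): a term's overall probability multiplied by how far its topic posterior departs from the topic prior. The pure-Python sketch below illustrates that definition only; tmplot's own `get_salient_terms` may compute the marginals differently, and the `saliency` function here is hypothetical.

```python
import math

def saliency(phi, p_t):
    """Saliency of every term: p(w) * distinctiveness(w).

    phi: list of topic rows, phi[k][w] = p(w | k)
    p_t: marginal topic probabilities p(k), summing to 1
    """
    n_topics, n_terms = len(phi), len(phi[0])
    scores = []
    for w in range(n_terms):
        # Marginal term probability: p(w) = sum_k p(w | k) * p(k)
        p_w = sum(phi[k][w] * p_t[k] for k in range(n_topics))
        if p_w == 0:
            scores.append(0.0)  # term never generated by any topic
            continue
        distinctiveness = 0.0
        for k in range(n_topics):
            # Posterior p(k | w) via Bayes' rule
            p_k_w = phi[k][w] * p_t[k] / p_w
            if p_k_w > 0:
                # KL divergence between p(k | w) and the prior p(k)
                distinctiveness += p_k_w * math.log(p_k_w / p_t[k])
        scores.append(p_w * distinctiveness)
    return scores

# Two equally likely topics; term 1 is shared, terms 0 and 2 are topic-specific
phi = [[0.5, 0.5, 0.0],
       [0.0, 0.5, 0.5]]
print(saliency(phi, [0.5, 0.5]))
```

A term spread evenly across topics gets zero saliency, while a term concentrated in a single topic scores higher.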
22 | 32 |
|
23 | | -- Supported distance metrics: |
| 33 | +### Model Support |
24 | 34 |
|
25 | | - - Kullback-Leibler (symmetric and non-symmetric) divergence |
26 | | - - Jenson-Shannon divergence |
27 | | - - Jeffrey's divergence |
28 | | - - Hellinger distance |
29 | | - - Bhattacharyya distance |
30 | | - - Total variation distance |
31 | | - - Jaccard inversed index |
| 35 | +- **[tomotopy](https://bab2min.github.io/tomotopy/)**: `LDAModel`, `LLDAModel`, `CTModel`, `DMRModel`, `HDPModel`, `PTModel`, `SLDAModel`, `GDMRModel` |
| 36 | +- **[gensim](https://radimrehurek.com/gensim/)**: `LdaModel`, `LdaMulticore` |
| 37 | +- **[bitermplus](https://github.com/maximtrp/bitermplus)**: `BTM` |
32 | 38 |
|
33 | | -- Supported [algorithms](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.manifold) for calculating topics scatter coordinates: |
| 39 | +### Distance Metrics |
34 | 40 |
|
35 | | - - t-SNE |
36 | | - - SpectralEmbedding |
37 | | - - MDS |
38 | | - - LocallyLinearEmbedding |
39 | | - - Isomap |
| 41 | +- **Kullback-Leibler** (symmetric & non-symmetric) |
| 42 | +- **Jensen-Shannon divergence** |
| 43 | +- **Jeffrey's divergence** |
| 44 | +- **Hellinger & Bhattacharyya distances** |
| 45 | +- **Total variation distance** |
| 46 | +- **Jaccard distance** (inverted Jaccard index)
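To make the list concrete, here is a minimal pure-Python sketch of each metric applied to two topic-word distributions `p` and `q` (probability lists over the same vocabulary). It follows the textbook definitions only; tmplot computes these internally (selected through the `method` argument of `get_topics_dist`), and its exact conventions, such as how the symmetric Kullback-Leibler divergence is scaled or how many top terms the Jaccard distance compares, may differ. All function names below are hypothetical.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q); assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def symmetric_kl(p, q):
    """Average of both KL directions (one of several symmetrization conventions)."""
    return 0.5 * (kl(p, q) + kl(q, p))

def jensen_shannon(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def jeffreys(p, q):
    """Jeffrey's divergence: the sum of both KL directions."""
    return kl(p, q) + kl(q, p)

def hellinger(p, q):
    """Hellinger distance, bounded in [0, 1]."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def bhattacharyya(p, q):
    """Negative log of the Bhattacharyya coefficient; assumes overlapping supports."""
    return -math.log(sum(math.sqrt(a * b) for a, b in zip(p, q)))

def total_variation(p, q):
    """Total variation distance: half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def jaccard_distance(p, q, top_n=10):
    """1 - Jaccard index of the two topics' top_n most probable term indices."""
    top_p = set(sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:top_n])
    top_q = set(sorted(range(len(q)), key=lambda i: q[i], reverse=True)[:top_n])
    return 1 - len(top_p & top_q) / len(top_p | top_q)

p = [0.5, 0.3, 0.2]
q = [0.1, 0.3, 0.6]
for name, metric in [("JSD", jensen_shannon), ("Hellinger", hellinger), ("TV", total_variation)]:
    print(f"{name}: {metric(p, q):.3f}")
```

Bounded metrics such as Jensen-Shannon (at most ln 2), Hellinger, and total variation are often easier to threshold than the unbounded KL-based divergences.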
| 47 | + |
| 48 | +### Dimensionality Reduction |
| 49 | + |
| 50 | +- **t-SNE** • **SpectralEmbedding** • **MDS** |
| 51 | +- **LocallyLinearEmbedding** • **Isomap** |
40 | 52 |
|
41 | 53 | ## Donate |
42 | 54 |
|
43 | 55 | If you find this package useful, please consider donating any amount of money. This will help me spend more time on supporting open-source software. |
44 | 56 |
|
45 | 57 | <a href="https://www.buymeacoffee.com/maximtrp" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 60px !important;width: 217px !important;" ></a> |
46 | 58 |
|
47 | | -## Installation |
| 59 | +## Quick Start |
48 | 60 |
|
49 | | -The package can be installed from PyPi: |
| 61 | +### Installation |
50 | 62 |
|
51 | 63 | ```bash |
| 64 | +# From PyPI (recommended) |
52 | 65 | pip install tmplot |
53 | | -``` |
54 | 66 |
|
55 | | -Or directly from this repository: |
56 | | - |
57 | | -```bash |
| 67 | +# Development version |
58 | 68 | pip install git+https://github.com/maximtrp/tmplot.git |
59 | 69 | ``` |
60 | 70 |
|
61 | | -## Dependencies |
| 71 | +### Basic Usage |
| 72 | + |
| 73 | +```python |
| 74 | +import tmplot as tmp |
| 75 | + |
| 76 | +# Load your topic model and documents |
| 77 | +model = your_fitted_model # tomotopy, gensim, or bitermplus |
| 78 | +docs = your_documents |
62 | 79 |
|
63 | | -- `numpy` |
64 | | -- `scipy` |
65 | | -- `scikit-learn` |
66 | | -- `pandas` |
67 | | -- `altair` |
68 | | -- `ipywidgets` |
69 | | -- `tomotopy`, `gensim`, and `bitermplus` (optional) |
| 80 | +# Create interactive report |
| 81 | +tmp.report(model, docs=docs) |
70 | 82 |
|
71 | | -## Quick example |
| 83 | +# Or create individual visualizations |
| 84 | +coords = tmp.prepare_coords(model) |
| 85 | +tmp.plot_scatter_topics(coords, size_col='size') |
| 86 | +``` |
| 87 | + |
| 88 | +## Advanced Examples |
| 89 | + |
| 90 | +### Compare Multiple Models |
72 | 91 |
|
73 | 92 | ```python |
74 | | -# Importing packages |
75 | 93 | import tmplot as tmp |
76 | | -import pickle as pkl |
77 | | -import pandas as pd |
78 | 94 |
|
79 | | -# Reading a model from a file |
80 | | -with open('data/model.pkl', 'rb') as file: |
81 | | - model = pkl.load(file) |
| 95 | +# Find stable topics across multiple models |
| 96 | +models = [model1, model2, model3, model4] |
| 97 | +closest_topics, distances = tmp.get_closest_topics(models) |
| 98 | +stable_topics, stable_distances = tmp.get_stable_topics(closest_topics, distances) |
| 99 | +``` |
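Matching topics across models comes down to a nearest-neighbour search over topic distances. The sketch below is a conceptual illustration, not tmplot's implementation: for each topic of a reference model it finds the closest topic in every other model (here by Hellinger distance), then keeps the reference topics whose closest match lies within a distance threshold in enough models. The names `closest_topics`, `stable_topics`, `max_dist`, and `min_models` are hypothetical; `get_closest_topics` and `get_stable_topics` have their own parameters and defaults.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def closest_topics(ref_phi, other_phis, dist=hellinger):
    """For each reference topic, find its closest topic in every other model.

    Returns (indices, distances); indices[k][m] is the topic of model m
    closest to reference topic k, and distances[k][m] is that distance.
    """
    indices, distances = [], []
    for p in ref_phi:
        row_idx, row_dist = [], []
        for phi in other_phis:
            ds = [dist(p, q) for q in phi]
            best = min(range(len(ds)), key=ds.__getitem__)
            row_idx.append(best)
            row_dist.append(ds[best])
        indices.append(row_idx)
        distances.append(row_dist)
    return indices, distances

def stable_topics(distances, max_dist=0.3, min_models=2):
    """Reference topics matched within max_dist in at least min_models models."""
    return [k for k, row in enumerate(distances)
            if sum(d <= max_dist for d in row) >= min_models]

# Reference model and two other runs (topics may appear in a different order)
ref = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
run1 = [[0.1, 0.2, 0.7], [0.7, 0.2, 0.1]]
run2 = [[0.6, 0.3, 0.1], [0.34, 0.33, 0.33]]
idx, dists = closest_topics(ref, [run1, run2])
print(idx, stable_topics(dists, max_dist=0.1))
```

Raising `min_models` or lowering `max_dist` makes the stability criterion stricter.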
| 100 | + |
| 101 | +### Model Optimization |
| 102 | + |
| 103 | +```python |
| 104 | +# Calculate entropy for model selection (phi: topic-word matrix, e.g. from tmp.get_phi(model))
| 105 | +entropy_score = tmp.entropy(phi)
82 | 106 |
|
83 | | -# Reading documents from a file |
84 | | -docs = pd.read_csv('data/docs.txt.gz', header=None).values.ravel() |
| 108 | +# Calculate term saliency
| 108 | +saliency = tmp.get_salient_terms(phi, theta) |
| 109 | +``` |
85 | 110 |
|
86 | | -# Plotting topics as a scatter plot |
87 | | -topics_coords = tmp.prepare_coords(model) |
88 | | -tmp.plot_scatter_topics(topics_coords, size_col='size', label_col='label') |
| 111 | +### Custom Visualizations |
89 | 112 |
|
90 | | -# Plotting terms probabilities |
91 | | -terms_probs = tmp.calc_terms_probs_ratio(phi, topic=0, lambda_=1) |
92 | | -tmp.plot_terms(terms_probs) |
| 113 | +```python |
| 114 | +# Compute a topic distance matrix using Jensen-Shannon divergence
| 115 | +topic_dists = tmp.get_topics_dist(phi, method='jsd')
93 | 116 |
|
94 | | -# Running report interface |
95 | | -tmp.report(model, docs=docs, width=250) |
| 117 | +# Generate scatter coordinates with the t-SNE algorithm
| 118 | +coords = tmp.get_topics_scatter(topic_dists, theta, method='tsne') |
| 119 | +tmp.plot_scatter_topics(coords, topic=3) # Highlight topic 3 |
96 | 120 | ``` |
97 | 121 |
|
98 | | -You can find more examples in the [tutorial](https://tmplot.readthedocs.io/en/latest/tutorial.html). |
| 122 | +## Documentation & Examples |
| 123 | + |
| 124 | +- **[Complete Tutorial](https://tmplot.readthedocs.io/en/latest/tutorial.html)** - Step-by-step guide |
| 125 | +- **[API Reference](https://tmplot.readthedocs.io/)** - Full documentation |
| 126 | +- **[Example Notebooks](https://github.com/maximtrp/tmplot/tree/main/examples)** - Jupyter examples |
| 127 | + |
| 128 | +## Requirements |
| 129 | + |
| 130 | +- **Core dependencies:** `numpy`, `scipy`, `scikit-learn`, `pandas`, `altair`, `ipywidgets`
| 131 | +- **Optional (model backends):** `tomotopy`, `gensim`, `bitermplus`