An interactive tool for analyzing attention patterns in transformer models, helping you understand how LLMs process different types of text input.
- 🎯 Layer-wise attention visualization - Explore attention patterns across transformer layers
- 🔥 Interactive attention heatmaps - See which tokens attend to which other tokens
- 📊 Token importance scoring - Identify the most important tokens in your text
- 📈 Layer comparison analysis - Compare attention entropy and patterns across layers
- 🧠 Attention head analysis - Understand how different heads specialize
- 🌊 Attention flow diagrams - Visualize attention connections as network graphs
- 📱 User-friendly Streamlit interface - No coding required to use
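All of these features are built on the per-layer attention tensors that Hugging Face models expose. As a rough sketch of what the extraction step looks like (the model choice and variable names below are illustrative, not the app's exact code):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased", output_attentions=True)

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer, each shaped
# (batch, num_heads, seq_len, seq_len)
attentions = torch.stack(outputs.attentions)  # (layers, batch, heads, seq, seq)
print(attentions.shape)
```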
```bash
git clone https://github.com/YOUR_USERNAME/llm-attention-visualizer.git
cd llm-attention-visualizer
pip install -r requirements.txt
streamlit run app.py
```
The app will open in your browser at http://localhost:8501
- **Select a Model**: Choose from DistilBERT, BERT, or GPT-2 models
- **Enter Text**: Use the sample texts or enter your own
- **Choose Analysis**: Pick from 5 different visualization types:
  - Attention Heatmap
  - Token Importance
  - Layer Comparison
  - Head Patterns
  - Attention Flow
- **Explore**: Interact with the visualizations and adjust layers/heads
Interactive heatmap showing which tokens attend to which others
- Hover for detailed attention weights
- Select specific layers and heads
- Normalize attention for better comparison
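A minimal standalone sketch of this kind of heatmap with Plotly, using a toy attention matrix in place of real model output:

```python
import numpy as np
import plotly.graph_objects as go

# Toy attention matrix; in the app this comes from the model's
# attention outputs for the selected layer and head.
tokens = ["[CLS]", "the", "cat", "sat", "[SEP]"]
rng = np.random.default_rng(0)
matrix = rng.random((5, 5))
matrix /= matrix.sum(axis=-1, keepdims=True)  # rows sum to 1, like softmax

fig = go.Figure(go.Heatmap(z=matrix, x=tokens, y=tokens, colorscale="Viridis"))
fig.update_layout(title="Attention heatmap (illustrative)",
                  xaxis_title="Key token", yaxis_title="Query token")
fig.show()
```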
Bar chart showing total attention received by each token
- Identifies the most "important" tokens in context
- Color-coded by attention strength
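One straightforward way to compute such a score is to sum the attention each token *receives*, i.e. the column sums of the attention matrix (the app's exact metric may differ):

```python
import numpy as np

tokens = ["[CLS]", "the", "cat", "sat", "[SEP]"]
rng = np.random.default_rng(1)
matrix = rng.random((5, 5))
matrix /= matrix.sum(axis=-1, keepdims=True)  # toy attention matrix

importance = matrix.sum(axis=0)  # total attention received per token
for tok, score in sorted(zip(tokens, importance), key=lambda p: -p[1]):
    print(f"{tok:8s} {score:.3f}")
```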
Line plots comparing attention patterns across layers
- Attention entropy (how spread out attention is)
- Maximum attention values
- Helps understand model depth utilization
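Attention entropy can be computed per layer as the mean Shannon entropy of each query's attention distribution; a small sketch with toy data:

```python
import numpy as np

def attention_entropy(matrix: np.ndarray) -> float:
    """Mean entropy of the attention rows: high = spread out, low = focused."""
    eps = 1e-9
    row_entropies = -(matrix * np.log(matrix + eps)).sum(axis=-1)
    return float(row_entropies.mean())

# One head-averaged (seq_len, seq_len) attention matrix per layer
# (random toy data here in place of real model output).
rng = np.random.default_rng(2)
layers = [rng.dirichlet(np.ones(6), size=6) for _ in range(4)]
for i, m in enumerate(layers):
    print(f"layer {i}: entropy={attention_entropy(m):.3f}, max={m.max():.3f}")
```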
Compare different attention heads within a layer
- Self-attention vs cross-attention patterns
- Head specialization metrics
- Entropy and focus measurements
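A sketch of two simple per-head metrics of this kind, entropy (spread) and focus (mean maximum weight), again on toy data:

```python
import numpy as np

# `layer_attn` holds one (seq_len, seq_len) matrix per head in a layer.
rng = np.random.default_rng(3)
layer_attn = rng.dirichlet(np.ones(6), size=(8, 6))  # 8 heads, seq_len 6

eps = 1e-9
entropy = -(layer_attn * np.log(layer_attn + eps)).sum(-1).mean(-1)  # per head
focus = layer_attn.max(-1).mean(-1)                                  # per head
for h, (e, f) in enumerate(zip(entropy, focus)):
    print(f"head {h}: entropy={e:.3f}, focus={f:.3f}")
```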
Network visualization of strong attention connections
- Nodes represent tokens
- Edges show attention flow above threshold
- Circular layout for clear visualization
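The app renders these graphs with Plotly; a rough equivalent using networkx and matplotlib shows the underlying idea (threshold the matrix, keep the strong edges, lay the nodes out on a circle):

```python
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np

tokens = ["the", "cat", "sat", "on", "the", "mat"]
rng = np.random.default_rng(4)
attn = rng.dirichlet(np.ones(len(tokens)), size=len(tokens))  # toy attention

G = nx.DiGraph()
G.add_nodes_from(range(len(tokens)))
threshold = 0.25
for i in range(len(tokens)):        # query token
    for j in range(len(tokens)):    # key token
        if attn[i, j] > threshold:
            G.add_edge(i, j, weight=float(attn[i, j]))

pos = nx.circular_layout(G)
nx.draw(G, pos, labels=dict(enumerate(tokens)), node_color="lightblue", arrows=True)
plt.show()
```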
```
src/
├── attention_extractor.py   # Model loading and attention extraction
├── visualizer.py             # Plotly visualization functions
└── __init__.py               # Package initialization
app.py                         # Main Streamlit application
requirements.txt               # Python dependencies
examples/                      # Sample texts for testing
```
- **DistilBERT** (`distilbert-base-uncased`) - Lightweight, fast
- **BERT** (`bert-base-uncased`) - Classic transformer model
- **DistilGPT-2** (`distilgpt2`) - Decoder-only architecture
- **GPT-2** (`gpt2`) - Generative pre-trained transformer
Attention mechanisms allow models to focus on relevant parts of the input when processing each token. High attention weights indicate strong relationships between tokens.
- High self-attention: Token attending to itself
- Sequential attention: Following word order patterns
- Semantic attention: Focus on semantically related tokens
- Positional patterns: Position-based attention in early layers
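Concretely, each row of weights being visualized comes from standard scaled dot-product attention, softmax(QKᵀ/√d); a tiny NumPy reconstruction:

```python
import numpy as np

rng = np.random.default_rng(5)
d = 8                              # head dimension
Q = rng.standard_normal((5, d))    # one query vector per token
K = rng.standard_normal((5, d))    # one key vector per token

scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1

print(weights.round(3))  # row i = how token i distributes its attention
```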
- Built with Streamlit for the web interface
- Hugging Face Transformers for model loading
- PyTorch for tensor operations
- Plotly for interactive visualizations
- Caching for improved performance
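Model loading is the natural thing to cache so Streamlit reruns don't reload weights; one common pattern (the app's exact setup may differ, and `st.cache_resource` assumes a recent Streamlit version):

```python
import streamlit as st
from transformers import AutoModel, AutoTokenizer

@st.cache_resource  # load once per session, reuse across reruns
def load_model(name: str):
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name, output_attentions=True)
    return tokenizer, model

tokenizer, model = load_model("distilbert-base-uncased")
```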
- Education: Understand how transformers work
- Research: Analyze model behavior on specific inputs
- Debugging: Identify attention issues in fine-tuned models
- Interpretability: Explain model decisions through attention
The app includes several pre-loaded examples:
- Simple sentences for basic analysis
- Pronoun resolution examples
- Complex sentences with multiple clauses
- Questions and technical text
Contributions welcome! Ideas for improvements:
- Support for more model architectures
- Additional visualization types
- Batch processing capabilities
- Export functionality for visualizations
This project is licensed under the MIT License - see the LICENSE file for details.
- Hugging Face for the Transformers library
- Streamlit for the amazing web framework
- The attention mechanism research community
Built with ❤️ for understanding transformer models