This project implements a Mixture of Experts (MoE) model with a focus on efficient routing and expert utilization. The implementation includes:
- A flexible MoE router that can handle different numbers of experts and routing parameters
- Expert layers implemented as feed-forward networks
- A complete MoE layer that combines routing and expert computation
- A transformer block that uses MoE for the feed-forward layer
- Comprehensive benchmarking and visualization utilities
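To give a concrete picture of how the router, experts, and MoE layer fit together, here is a minimal PyTorch sketch. The class and argument names (`MoELayer`, `num_experts`, `top_k`) and the loop-based dispatch are illustrative assumptions, not the exact API or implementation under `src/models/`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Sketch of an MoE feed-forward layer: route tokens, run top-k experts, combine.
    Names and shapes are assumptions for illustration, not this repository's API."""
    def __init__(self, d_model, d_ff, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)          # produces routing logits
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                       # x: (num_tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)               # (num_tokens, num_experts)
        weights, idx = probs.topk(self.top_k, dim=-1)           # top-k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                           # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
```

The per-expert loop is written for readability; the actual layer presumably batches expert computation, and the kernels in `src/triton/` are likely aimed at speeding up exactly this dispatch step.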
The repository is organized as follows:

```
.
├── src/            # Main source code
│   ├── models/     # Model implementations
│   ├── utils/      # Utility functions
│   └── triton/     # Triton optimizations
├── data/           # Dataset and data processing
├── configs/        # Configuration files
├── docs/           # Documentation
├── assets/         # Visualizations and diagrams
└── tests/          # Test suite
```
- **Efficient Routing Algorithm** (a sketch of this routing scheme follows the list)
  - Top-k expert selection
  - Capacity factor for load balancing
  - Optional router jitter for improved training
  - Sparse expert selection for computational efficiency
- **Expert Layers**
  - Configurable dimensions
  - Flexible activation functions
  - Batch processing support
- **Visualization and Analysis**
  - Expert utilization tracking
  - Performance benchmarking
  - Parameter comparison analysis
  - Memory usage profiling
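For illustration, the snippet below sketches how top-k routing with a capacity factor and optional jitter could be wired together. The function name, the capacity formula, and the placement of the jitter are assumptions made for this example; the repository's router may differ.

```python
import torch
import torch.nn.functional as F

def top_k_route(logits, top_k=2, capacity_factor=1.5, jitter_eps=0.01, training=False):
    """Sketch of top-k routing with a per-expert capacity limit (assumed semantics)."""
    num_tokens, num_experts = logits.shape
    if training and jitter_eps > 0:
        # Multiplicative jitter on the logits (some variants jitter the router *input*
        # instead); this placement is an assumption for the sketch.
        logits = logits * torch.empty_like(logits).uniform_(1 - jitter_eps, 1 + jitter_eps)
    probs = F.softmax(logits, dim=-1)
    weights, expert_idx = probs.topk(top_k, dim=-1)             # (num_tokens, top_k)

    # Capacity: each expert accepts at most this many token assignments;
    # assignments beyond capacity are dropped (their weight is zeroed).
    capacity = int(capacity_factor * top_k * num_tokens / num_experts)
    keep = torch.ones_like(expert_idx, dtype=torch.bool)
    counts = torch.zeros(num_experts, dtype=torch.long)
    for t in range(num_tokens):
        for k in range(top_k):
            e = expert_idx[t, k]
            if counts[e] >= capacity:
                keep[t, k] = False                              # over capacity: drop this assignment
            else:
                counts[e] += 1
    return expert_idx, weights * keep, keep
```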
Example visualizations (figures saved under `assets/`):
- Expert utilization patterns with k=2 and capacity factor 1.5
- Performance comparison across different configurations
- Impact of different parameters on model performance
- Create a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate    # On Linux/Mac
  # or
  .\venv\Scripts\activate     # On Windows
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
Run the benchmark suite with:

```bash
python src/benchmark.py
```

This will:
- Run comprehensive benchmarks with different configurations
- Generate expert utilization plots
- Create performance comparison charts
- Save results in the `assets/` directory
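As a rough idea of what such a sweep involves, a minimal timing loop over a few configurations might look like the following, reusing the `MoELayer` sketch from the overview above. The configuration values and result fields are arbitrary examples, not the script's actual settings.

```python
import itertools
import time
import torch

def benchmark(d_model=512, d_ff=2048, tokens=4096, repeats=10):
    """Sketch of a benchmark sweep over expert count and top-k (assumed settings)."""
    results = []
    for num_experts, top_k in itertools.product([4, 8, 16], [1, 2]):
        layer = MoELayer(d_model, d_ff, num_experts=num_experts, top_k=top_k).eval()
        x = torch.randn(tokens, d_model)
        with torch.no_grad():
            layer(x)                                    # warm-up pass
            start = time.perf_counter()
            for _ in range(repeats):
                layer(x)
            elapsed = (time.perf_counter() - start) / repeats
        results.append({"experts": num_experts, "top_k": top_k, "sec/iter": elapsed})
    return results
```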
The project includes several visualization tools:
- **Expert Utilization Plotter**
  - Tracks expert selection patterns
  - Visualizes load balancing
  - Generates heatmaps of expert usage
- **Performance Analyzer**
  - Creates comparative charts
  - Tracks memory usage
  - Analyzes computational efficiency
- **Parameter Comparison Tool**
  - Compares different configurations
  - Visualizes trade-offs
  - Helps optimize hyperparameters
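For a sense of how a usage heatmap can be produced from routing decisions, here is a small matplotlib sketch; the input layout, function name, and output path are assumptions for illustration, not the plotter's real interface.

```python
import matplotlib.pyplot as plt
import torch

def plot_expert_utilization(per_layer_idx, num_experts, path="assets/expert_utilization.png"):
    """per_layer_idx: list of (num_tokens, top_k) expert-index tensors, one per MoE layer
    (assumed layout). Saves a heatmap of how often each expert is selected per layer."""
    counts = torch.stack([
        torch.bincount(idx.flatten(), minlength=num_experts).float()
        for idx in per_layer_idx
    ])                                                    # (num_layers, num_experts)
    frac = counts / counts.sum(dim=1, keepdim=True)       # fraction of assignments per expert
    plt.imshow(frac.numpy(), aspect="auto", cmap="viridis")
    plt.colorbar(label="fraction of routed tokens")
    plt.xlabel("expert")
    plt.ylabel("MoE layer")
    plt.savefig(path, bbox_inches="tight")
    plt.close()
```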
Run the profiler with:

```bash
python src/profile_moe.py
```

This will:
- Profile memory usage and performance
- Generate detailed reports
- Create visualizations of bottlenecks
- Update profiling documentation
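Peak-memory figures of the kind this script reports can be gathered with PyTorch's built-in CUDA memory counters; the helper below is a minimal sketch (GPU-only), not the profiler's actual code.

```python
import torch

def peak_memory_mb(layer, x):
    """Peak GPU memory for one forward pass, in MiB (sketch; requires a CUDA device)."""
    layer, x = layer.cuda(), x.cuda()
    torch.cuda.reset_peak_memory_stats()
    with torch.no_grad():
        layer(x)
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 2**20
```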
Current status:

- ✅ **Core Implementation**
  - ✅ Router implementation
  - ✅ Expert implementation
  - ✅ MoE layer implementation
  - ✅ Visualization tools
- ✅ **Evaluation Framework**
  - ✅ Benchmarking utilities
  - ✅ Expert utilization analysis
  - ✅ Performance profiling
  - ✅ Visualization suite
- 🔄 **Advanced Features (In Progress)**
  - 🔄 Memory optimization
  - 🔄 Quantization support
  - 🔄 Multi-GPU support
  - 🔄 Enhanced visualization tools
Contributions are welcome! Please feel free to submit issues and enhancement requests.
This project is licensed under the MIT License - see the LICENSE file for details.