
Commit 218221a

Mixture-of-Experts intro (#888)
1 parent: 27b6dfa

13 files changed: +1333 −228 lines

.gitignore

Lines changed: 3 additions & 2 deletions
@@ -12,8 +12,9 @@ appendix-D/01_main-chapter-code/3.pdf
 appendix-E/01_main-chapter-code/loss-plot.pdf
 
 ch04/04_gqa/kv_bytes_vs_context_length.pdf
-ch05/05_mla/kv_bytes_vs_context_length.pdf
-ch06/06_swa/kv_bytes_vs_context_length.pdf
+ch04/05_mla/kv_bytes_vs_context_length.pdf
+ch04/06_swa/kv_bytes_vs_context_length.pdf
+ch04/07_moe/ffn_vs_moe.pdf
 
 ch05/01_main-chapter-code/loss-plot.pdf
 ch05/01_main-chapter-code/temperature-plot.pdf

README.md

Lines changed: 1 addition & 0 deletions
@@ -172,6 +172,7 @@ Several folders contain optional materials as a bonus for interested readers:
 - [Grouped-Query Attention](ch04/04_gqa)
 - [Multi-Head Latent Attention](ch04/05_mla)
 - [Sliding Window Attention](ch04/06_swa)
+- [Mixture-of-Experts (MoE)](ch04/07_moe)
 - **Chapter 5: Pretraining on unlabeled data:**
 - [Alternative Weight Loading Methods](ch05/02_alternative_weight_loading/)
 - [Pretraining GPT on the Project Gutenberg Dataset](ch05/03_bonus_pretraining_on_gutenberg)
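
The headline addition in this commit is the new Mixture-of-Experts (MoE) bonus folder linked above (ch04/07_moe). As a quick orientation, the snippet below is a minimal sketch of a token-routed MoE feed-forward block in PyTorch. It is illustrative only and not the code added in this commit; the class name, expert sizes, and top-2 routing are assumptions.

```python
# Minimal sketch of a token-routed MoE feed-forward block (illustrative only;
# not the code added in this commit -- dimensions and top_k are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    def __init__(self, emb_dim=768, hidden_dim=3072, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(emb_dim, num_experts, bias=False)  # router
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(emb_dim, hidden_dim),
                nn.GELU(),
                nn.Linear(hidden_dim, emb_dim),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):                          # x: (batch, seq_len, emb_dim)
        scores = self.gate(x)                      # (batch, seq_len, num_experts)
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)   # renormalize over chosen experts

        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = (topk_idx == i)                 # tokens that routed to expert i
            if not mask.any():
                continue
            token_mask = mask.any(dim=-1)          # (batch, seq_len)
            gate_w = (weights * mask).sum(dim=-1)  # gate weight for expert i per token
            out[token_mask] += gate_w[token_mask].unsqueeze(-1) * expert(x[token_mask])
        return out


if __name__ == "__main__":
    moe = MoEFeedForward()
    tokens = torch.randn(2, 16, 768)
    print(moe(tokens).shape)  # torch.Size([2, 16, 768])
```

The key idea is that each token only activates top_k of the num_experts feed-forward experts, so total parameter count grows with the number of experts while per-token compute stays close to that of a single dense FFN.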

ch04/06_swa/README.md

Lines changed: 3 additions & 3 deletions
@@ -71,14 +71,14 @@ The savings when using SWA over MHA are further shown in the plot below for diff
 
 
 
-<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/swa-memory/4.webp?2" alt="SWA" width="=800px" />
+<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/swa-memory/4.webp?2" alt="SWA" width="800px" />
 
 &nbsp;
 
-You can reproduce thi plots via:
+You can reproduce these plots via:
 
 ```bash
-plot_memory_estimates_swa.py \
+uv run plot_memory_estimates_swa.py \
   --emb_dim 4096 --n_heads 48 --n_layers 36 \
   --batch_size 1 --dtype bf16 \
   --sliding_window_size 2048 --swa_ratio "5:1"
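
For context on what plot_memory_estimates_swa.py estimates, the sketch below shows the kind of back-of-the-envelope KV-cache arithmetic behind such a plot. It is an illustrative approximation, not the repo's script; in particular, the kv_cache_bytes helper and the reading of --swa_ratio "5:1" as five sliding-window layers per full-attention layer are assumptions.

```python
# Rough KV-cache size estimate (illustrative; not the repo's plotting script).
# Assumes a "5:1" ratio means 5 sliding-window layers per 1 full-attention layer.

def kv_cache_bytes(context_length, emb_dim=4096, n_heads=48, n_layers=36,
                   batch_size=1, bytes_per_elem=2,            # bf16 = 2 bytes
                   sliding_window_size=2048, swa_ratio=(5, 1)):
    head_dim = emb_dim // n_heads
    swa_layers = n_layers * swa_ratio[0] // sum(swa_ratio)
    full_layers = n_layers - swa_layers
    # Each layer caches K and V: 2 * n_heads * head_dim values per token.
    per_token = 2 * n_heads * head_dim * bytes_per_elem * batch_size
    swa_tokens = min(context_length, sliding_window_size)     # window caps the cache
    return per_token * (full_layers * context_length + swa_layers * swa_tokens)


for ctx in (4_096, 32_768, 131_072):
    print(f"{ctx:>8} tokens: {kv_cache_bytes(ctx) / 1e9:.2f} GB")
```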

ch04/06_swa/memory_estimator_mla.py

Lines changed: 0 additions & 123 deletions
This file was deleted.

ch04/06_swa/plot_memory_estimates_mla.py

Lines changed: 0 additions & 90 deletions
This file was deleted.
