Add Llama-2 fine-tuning scripts and configuration for ZenFlow
- Introduced `finetune_llama.py` for fine-tuning the Llama-2 model using DeepSpeed and ZenFlow.
- Added `finetune_llama.sh` for automated training setup with environment variables and DeepSpeed command.
- Added `zf_config.json` example for DeepSpeed configuration with ZenFlow optimizations.
Signed-off-by: Tingfeng Lan <[email protected]>
Co-authored-by: Yusen Wu <[email protected]>
This project demonstrates how to fine-tune a [Llama-2](https://huggingface.co/meta-llama) model using [DeepSpeed](https://www.deepspeed.ai/) with **ZenFlow**, a stall-free offloading engine for large-scale model training.
## Quick Start
1. **Install dependencies**
```bash
pip install -r requirements.txt
```
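After installing, DeepSpeed's bundled `ds_report` utility is a quick way to confirm the install and see which ops are compatible with your environment:

```bash
ds_report
```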
2. **Configure training**
Edit `zf_config.json` to enable ZenFlow:
```json
"zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
        "device": "cpu",
        "pin_memory": true
    },
    "zenflow": {
        "topk_ratio": 0.1,
        "update_interval": 4,
        "full_warm_up_rounds": 0,
        "overlap_step": true
    }
}
```
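Roughly, as the names suggest, `topk_ratio` sets the fraction of gradients treated as important, `update_interval` sets how many steps pass between full optimizer updates, and `overlap_step` lets the CPU-side optimizer step run concurrently with GPU compute. The snippet above is only the relevant section: a complete DeepSpeed config also carries the usual training fields (batch size, optimizer, and so on). As a reference point, here is a minimal sketch of how a training script hands such a config to DeepSpeed; the tiny `torch.nn.Linear` model is a placeholder, and the actual wiring lives in `finetune_llama.py`:

```python
import deepspeed
import torch

# Placeholder model for illustration; finetune_llama.py loads Llama-2 instead.
model = torch.nn.Linear(4096, 4096)

# deepspeed.initialize reads the ZeRO/ZenFlow settings from zf_config.json.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="zf_config.json",
)
```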
3. **Run fine-tuning**
```bash
bash finetune_llama.sh
```
This runs Llama-2 fine-tuning with DeepSpeed + ZenFlow and saves checkpoints to `./alpaca_output`.
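Under the hood, `finetune_llama.sh` wraps a DeepSpeed launcher invocation along these lines; everything past the script name is an illustrative assumption about its flags, not the script's actual interface:

```bash
# Sketch only: flag names are assumptions; see finetune_llama.sh for the real ones.
deepspeed finetune_llama.py \
  --deepspeed zf_config.json \
  --output_dir ./alpaca_output
```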
## Example Output
Below is a sample log of step times and loss values. Most steps run fast; the longer step times (steps 8 and 12 here, matching `update_interval: 4`) occur only at the periodic full optimizer updates:
```
ZenFlowCPUAdam initialized with overlap step.
Step 5, Loss: 1.2599, Time: 719.58ms
Step 6, Loss: 0.9847, Time: 702.81ms
Step 7, Loss: 0.6220, Time: 705.50ms
Step 8, Loss: 0.5173, Time: 1912.92ms
Step 9, Loss: 0.4557, Time: 890.60ms
Step 10, Loss: 0.3882, Time: 740.11ms
Step 11, Loss: 0.3627, Time: 731.95ms
Step 12, Loss: 0.3341, Time: 2221.18ms
Step 13, Loss: 0.2453, Time: 1061.80ms
```
ZenFlow reduces optimizer-induced stalls by overlapping CPU computation and GPU execution.
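The scheduling idea behind this can be illustrated with a toy sketch. This is a conceptual illustration of overlap, not ZenFlow's actual implementation; the two `sleep` calls stand in for real work:

```python
import threading
import time

def cpu_optimizer_step():
    # Stand-in for a heavy CPU-side optimizer update on offloaded parameters.
    time.sleep(0.5)

def gpu_forward_backward():
    # Stand-in for the next forward/backward pass running on the GPU.
    time.sleep(0.5)

# Serial execution would take ~1.0s per step; overlapping the CPU step with
# GPU compute hides the optimizer cost, so the step takes ~0.5s.
start = time.time()
worker = threading.Thread(target=cpu_optimizer_step)
worker.start()          # CPU optimizer step proceeds in the background...
gpu_forward_backward()  # ...while the GPU keeps computing.
worker.join()
print(f"overlapped step: {time.time() - start:.2f}s")
```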
## Notes
- To change the model, batch size, or number of epochs, edit `finetune_llama.sh`.
- All DeepSpeed and ZenFlow options are controlled via `zf_config.json`.
## Citation
To cite ZenFlow, please cite our [arXiv report](https://arxiv.org/abs/2505.12242):
```bib
@misc{lan2025zenflowenablingstallfreeoffloading,
      title={ZenFlow: Enabling Stall-Free Offloading Training via Asynchronous Updates},
      author={Tingfeng Lan and Yusen Wu and Bin Ma and Zhaoyuan Su and Rui Yang and Tekin Bicer and Dong Li and Yue Cheng},
      year={2025},
      eprint={2505.12242},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2505.12242}
}
```