Add Benchmarking and Fine-Tuning Support for ZenFlow #982
Conversation
- Introduced `zf_benchmark.py` for model offloading benchmarking with DeepSpeed.
- Added `output_table.py` to parse and display benchmark results in a tabular format.
- Created `run_benchmark.sh` to automate benchmark runs with various configurations.

Signed-off-by: Tingfeng Lan <[email protected]>
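For readers skimming the commit list, here is a minimal sketch of the offload timing pattern such a benchmark revolves around. The helper name and structure are illustrative assumptions; `zf_benchmark.py` in this PR (adapted from `offload_states.py`, as noted in the description below) is the real script.

```python
import time

import torch

# Illustrative helper (name and structure assumed); zf_benchmark.py is the real
# script. `engine` is the object returned by deepspeed.initialize(), and
# offload_states()/reload_states() are the engine methods exercised by the
# original offload_states.py example this benchmark was adapted from.
def time_offload_cycle(engine):
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    engine.offload_states()          # move ZeRO states (optimizer state, etc.) to CPU
    torch.cuda.synchronize()
    t1 = time.perf_counter()
    engine.reload_states()           # bring the states back onto the GPU
    torch.cuda.synchronize()
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1          # (offload seconds, reload seconds)
```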
- Introduced `finetune_llama.py` for fine-tuning the Llama-2 model using DeepSpeed and ZenFlow.
- Added `finetune_llama.sh` for automated training setup with environment variables and the DeepSpeed command.
- Added `zf_config.json` example for DeepSpeed configuration with ZenFlow optimizations.

Signed-off-by: Tingfeng Lan <[email protected]>
Co-authored-by: Yusen Wu <[email protected]>
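A minimal sketch of the fine-tuning pattern these scripts follow, assuming a Hugging Face Llama-2 checkpoint and a toy one-batch dataset (both stand-ins); `finetune_llama.py`, `finetune_llama.sh`, and `zf_config.json` in this PR are the authoritative versions.

```python
# Minimal sketch; the checkpoint id and the toy one-batch "dataloader" are
# stand-ins. See finetune_llama.py / zf_config.json in this PR for the real setup.
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"              # assumed checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# zf_config.json carries the ZeRO offload + ZenFlow settings.
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="zf_config.json",
)

batch = tokenizer("Hello ZenFlow", return_tensors="pt")
batch["labels"] = batch["input_ids"].clone()         # causal-LM loss needs labels
train_dataloader = [batch]                           # stand-in for a real dataset

for step_batch in train_dataloader:
    step_batch = {k: v.to(engine.device) for k, v in step_batch.items()}
    loss = engine(**step_batch).loss                 # forward through the DeepSpeed engine
    engine.backward(loss)
    engine.step()                                    # offloaded optimizer update, overlapped by ZenFlow
```

In the real setup, `finetune_llama.sh` would launch a script like this through the `deepspeed` launcher so the engine picks up the distributed environment.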
@sfc-gh-truwase Thanks for the great suggestions — I’ve applied them all!
Hi @Antlera, I have a question. I saw that ZenFlow runs the parameter update on the CPU. Does the DeepSpeed core-binding launcher argument help here? Here is a link to this switch. It was added for the CPU backend, but it should help for CPU offload as well.
Hi @delock. Thank you for bringing this up, this is a great observation! I just tested it. ZenFlow currently sidesteps most of the contention by evenly sharding the CPU-side update work across the available cores. I’ve shared a more detailed discussion and logs in the email thread that Tunji forwarded; please let me know if you didn’t receive them.
Yes, please open an issue so we can discuss this in detail. And yes, leaving one core might not be enough if a background service needs more cores. We can discuss how to make core-binding tuning easier to use.
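For context on what core binding means here, CPU affinity can be set at the process level from Python. The snippet below only illustrates the concept (the launcher switch discussed above handles this per rank); the reserved core set is arbitrary.

```python
import os

# Illustration only: reserve a couple of cores for background services and bind
# this process (e.g., one rank doing the CPU-side optimizer update) to the rest.
# The reserved set is arbitrary; a real setup would derive it from the node
# topology and the number of ranks.
all_cores = set(range(os.cpu_count()))
reserved_for_background = {0, 1}
os.sched_setaffinity(0, all_cores - reserved_for_background)  # Linux-only API
print("Bound to cores:", sorted(os.sched_getaffinity(0)))
```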
This PR adds a blog post and images for ZenFlow, introducing its design, benefits, and usage. The blog explains how ZenFlow improves GPU utilization by overlapping computation and communication during offloaded training.

See also: #7391 (core ZenFlow implementation) and [#982](deepspeedai/DeepSpeedExamples#982) (benchmarking and fine-tuning example).

Signed-off-by: Tingfeng Lan <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Hongwei Chen <[email protected]>
Description:
This PR introduces scripts for benchmarking and fine-tuning with ZenFlow:
- `zf_benchmark.py`: Benchmark script for evaluating offloading performance (adapted from `offload_states.py` by @tohtana).
- `output_table.py`: Parses and summarizes benchmark logs.
- `run_benchmark.sh`: Automates benchmark runs with configurable parameters.
- `finetune_llama.py`: Fine-tuning script for Llama-2 with DeepSpeed + ZenFlow.
- `finetune_llama.sh`: Launch script for fine-tuning with environment setup.
- `zf_config.json`: Example DeepSpeed config with ZenFlow optimizations.

Note: This PR is complementary to PR #7391 on the main repo and should be merged with (or after) PR #7391.
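To make the `zf_config.json` entry above concrete, here is a hedged sketch of what the ZenFlow-related portion of such a config could look like, written as a Python dict so it can be passed straight to `deepspeed.initialize()`. The `zenflow` key names and values are assumptions for illustration; `zf_config.json` in this PR is the authoritative reference.

```python
# Illustrative only: the "zenflow" keys/values below are assumptions; defer to
# zf_config.json in this PR for the real configuration.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        # ZenFlow options (names assumed): how gradients are selected for
        # immediate GPU-side updates, and whether the offloaded CPU optimizer
        # step is overlapped with GPU compute.
        "zenflow": {
            "topk_ratio": 0.1,
            "update_interval": 4,
            "overlap_step": True,
        },
    },
}

# deepspeed.initialize() accepts either a dict like this or a path to a JSON
# file such as zf_config.json:
#   engine, _, _, _ = deepspeed.initialize(model=model,
#                                          model_parameters=model.parameters(),
#                                          config=ds_config)
```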