add timers and performance metrics #688

awan-10 · 2023-08-16T19:19:11Z

This PR adds timers and logging functions to properly calculate latencies, FLOPs, and achieved bandwidth for various phases of the RLHF training step.
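For readers unfamiliar with how such metrics are usually derived, here is a minimal, hypothetical sketch. It is not the code added in `applications/DeepSpeed-Chat/training/utils/perf.py`. It assumes the common rule of thumb that a dense transformer costs about 6 FLOPs per parameter per token for a forward plus backward pass (2 forward, 4 backward), ignoring attention FLOPs and activation recomputation. It estimates generation bandwidth by assuming each generated token reads every weight from memory once (memory-bound decoding, KV-cache traffic ignored). The names `Timer`, `training_tflops`, and `generation_bandwidth_gbs` are illustrative only.

```python
import time


class Timer:
    """Hypothetical wall-clock timer for one phase of the training step."""

    def __init__(self):
        self.elapsed = 0.0
        self._start = None

    def start(self):
        self._start = time.perf_counter()

    def stop(self):
        # Add the seconds since the last start() to the running total.
        self.elapsed += time.perf_counter() - self._start
        self._start = None
        return self.elapsed


def training_tflops(num_params, batch_size, seq_len, latency_s, num_gpus=1):
    """Approximate achieved TFLOPs per GPU for one forward+backward pass.

    Assumes ~6 * params * tokens FLOPs; attention FLOPs and
    activation recomputation are not counted.
    """
    tokens = batch_size * seq_len
    flops = 6 * num_params * tokens
    return flops / (latency_s * num_gpus * 1e12)


def generation_bandwidth_gbs(num_params, bytes_per_param, num_new_tokens, latency_s):
    """Approximate achieved memory bandwidth (GB/s) during generation.

    Assumes every generated token reads all weights from memory once.
    """
    bytes_read = num_params * bytes_per_param * num_new_tokens
    return bytes_read / latency_s / 1e9


if __name__ == "__main__":
    # Hypothetical numbers: 1.3B parameters, batch of 4 x 512 tokens, 2.0 s step on 1 GPU.
    print(round(training_tflops(1.3e9, 4, 512, 2.0), 2))  # → 7.99
    # Hypothetical numbers: fp16 weights (2 bytes), 256 new tokens generated in 10.0 s.
    print(round(generation_bandwidth_gbs(1.3e9, 2, 256, 10.0), 2))  # → 66.56
```

Because `Timer` accumulates across `start()`/`stop()` pairs, one instance per phase (for example generation versus training) can total time over several steps.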

lekurile

LGTM, left a few comments about how to potentially clean up a few items. Thanks!

applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py

applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py

applications/DeepSpeed-Chat/training/utils/perf.py


* update flops calculation
* fix and verify flops. add for step 1 as well.

---------

Co-authored-by: Ammar Ahmad Awan <[email protected]>

Co-authored-by: Ammar Ahmad Awan <[email protected]>
Co-authored-by: Molly Smith <[email protected]>
Co-authored-by: Lev Kurilenko <[email protected]>
Co-authored-by: Zhewei Yao <[email protected]>

awan-10 and others added 8 commits August 16, 2023 19:15

7a82c7a add timers
684d2aa Move import time and precommit
7b28f23 Match generate time to HE eval
c7b3ac0 add flops counter/printer.
a069ec5 Modify/clean up tflops func
4a22383 improve logging.
3b2f9e3 Merge branch 'master' into amawa/add-timers-flops
41fe15f fix name.

awan-10 marked this pull request as ready for review August 18, 2023 19:05

awan-10 requested review from RezaYazdaniAminabadi, ShadenSmith, arashb, conglongli, duli2012, eltonzheng, jeffra, minjiaz, mrwyattii, samyam, tjruwase, xiaoxiawu-microsoft and yaozhewei as code owners August 18, 2023 19:05

awan-10 added 2 commits August 18, 2023 19:43

1b25121 fix format.
3a69cb2 undo debugging/

lekurile approved these changes Aug 18, 2023


yaozhewei reviewed Aug 18, 2023


applications/DeepSpeed-Chat/training/utils/perf.py (outdated, resolved)

yaozhewei reviewed Aug 18, 2023


applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py (outdated, resolved)

yaozhewei reviewed Aug 18, 2023


applications/DeepSpeed-Chat/training/utils/perf.py (outdated, resolved)

yaozhewei reviewed Aug 18, 2023


applications/DeepSpeed-Chat/training/utils/perf.py (outdated, resolved)

f5a4dc4 take Lev's feedback.

awan-10 and others added 4 commits August 18, 2023 22:53

d5fbde1 fix performance calculations.
94c249f Merge branch 'master' into amawa/add-timers-flops
d418550 update flops calculation (#702)

    * update flops calculation
    * fix and verify flops. add for step 1 as well.

    ---------

    Co-authored-by: Ammar Ahmad Awan <[email protected]>

7582c33 remove unused timer.

awan-10 merged commit 81a8521 into master Aug 22, 2023


5 participants
