Conversation
@ymyjl ymyjl commented Dec 20, 2022

PR types

New features

PR changes

Others

Description

New feature: add trainer memory tracer
A helper class that tracks CPU and GPU memory.
When a stage completes, it can pass a metrics dict, which is updated with the memory metrics gathered during that stage.
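The flow described above can be sketched as follows. The class, method, and metric names here (`TrainerMemoryTracker`, `stop_and_update_metrics`, `train_mem_cpu_*`) are illustrative assumptions rather than the PR's actual API, and CPU tracking uses the standard-library `tracemalloc` to keep the example self-contained:

```python
import tracemalloc


class TrainerMemoryTracker:
    """Hypothetical sketch: snapshot memory when a stage starts and fold
    the deltas into a metrics dict when the stage completes."""

    def start(self):
        # begin tracing Python-level CPU allocations
        tracemalloc.start()

    def stop_and_update_metrics(self, metrics, stage="train"):
        current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        # net allocation during the stage, and how far the peak overshot it
        metrics[f"{stage}_mem_cpu_alloc_delta"] = current
        metrics[f"{stage}_mem_cpu_peaked_delta"] = max(peak - current, 0)


tracker = TrainerMemoryTracker()
metrics = {}
tracker.start()
buf = [0] * 100_000  # the "stage" being measured
tracker.stop_and_update_metrics(metrics, stage="train")
```

A real tracker would additionally query device memory for the GPU-side metrics; this sketch only covers the CPU path.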

@paddle-bot
paddle-bot bot commented Dec 20, 2022

Thanks for your contribution!

@codecov
codecov bot commented Dec 20, 2022

Codecov Report

Merging #4181 (00cb04c) into develop (ec30226) will increase coverage by 2.40%.
The diff coverage is 16.36%.

@@             Coverage Diff             @@
##           develop    #4181      +/-   ##
===========================================
+ Coverage    33.95%   36.35%   +2.40%     
===========================================
  Files          405      419      +14     
  Lines        56841    59168    +2327     
===========================================
+ Hits         19302    21513    +2211     
- Misses       37539    37655     +116     
Impacted Files Coverage Δ
paddlenlp/trainer/trainer.py 11.24% <0.00%> (-0.24%) ⬇️
paddlenlp/trainer/trainer_utils.py 29.58% <15.38%> (-5.24%) ⬇️
paddlenlp/utils/import_utils.py 80.82% <33.33%> (+38.63%) ⬆️
paddlenlp/trainer/training_args.py 40.30% <100.00%> (+0.22%) ⬆️
paddlenlp/__init__.py 19.76% <0.00%> (-10.54%) ⬇️
paddlenlp/transformers/auto/modeling.py 71.88% <0.00%> (-4.98%) ⬇️
paddlenlp/experimental/ernie_model.py 32.43% <0.00%> (-0.91%) ⬇️
paddlenlp/transformers/model_utils.py 73.10% <0.00%> (-0.22%) ⬇️
paddlenlp/transformers/feature_extraction_utils.py 27.02% <0.00%> (-0.19%) ⬇️
... and 42 more


@ymyjl ymyjl changed the title from "Yj paddle" to "Add Memory Tracer" on Dec 20, 2022

if self.paddle is not None:
    # self.torch.cuda.reset_peak_memory_stats()?
    self.paddle.device.cuda.empty_cache()
Contributor
Does this API not exist in Paddle?

self.torch.cuda.reset_peak_memory_stats()?

Contributor Author

Right, I checked and couldn't find it.
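Since Paddle lacked a `reset_peak_memory_stats` equivalent at the time, one common workaround is to sample memory from a background thread and keep a running maximum (the same idea Hugging Face's `TrainerMemoryTracker` applies to CPU memory). A minimal sketch follows; `PeakMemoryMonitor` and `read_mem` are hypothetical names, and any zero-argument memory reader works:

```python
import threading
import time


class PeakMemoryMonitor:
    """Poll a memory-reading callable in a background thread and record
    the peak value seen between start() and stop()."""

    def __init__(self, read_mem, interval=0.001):
        self.read_mem = read_mem  # zero-argument callable returning bytes in use
        self.interval = interval
        self.peak = 0
        self._stop = threading.Event()
        self._thread = None

    def _loop(self):
        while not self._stop.is_set():
            self.peak = max(self.peak, self.read_mem())
            time.sleep(self.interval)

    def start(self):
        self._stop.clear()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
        # one final sample so very short stages are not missed
        self.peak = max(self.peak, self.read_mem())
        return self.peak
```

The polling interval trades accuracy against overhead: very short allocation spikes can still slip between samples, which is why a native `reset_peak_memory_stats`-style counter is preferable when the framework provides one.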

@ZHUI
Contributor

ZHUI commented Dec 22, 2022

A few remaining questions:

  1. Could you list the Paddle APIs that are missing?
  2. In the multi-GPU case, is only one card monitored?
  3. Have you tested the pure-CPU case?

ZHUI
ZHUI previously approved these changes Dec 27, 2022
@ZHUI ZHUI left a comment
LGTM

@ZHUI

ZHUI commented Dec 27, 2022

Note: remember to revert the changes to run_seq_cls.py.

preds = paddle.to_tensor(preds)
label = paddle.to_tensor(p.label_ids)

probs = F.softmax(preds, axis=1)
Contributor

Why was this line deleted?

Contributor Author

It was flagged automatically at commit time; the variable was never used.

Contributor

@ZHUI ZHUI left a comment

LGTM

)
skip_memory_metrics: bool = field(
    default=True, metadata={"help": "Whether or not to skip adding of memory profiler reports to metrics."}
)
Contributor

Please add a Chinese usage document here.
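For context, a minimal sketch of how a `skip_memory_metrics` flag like the one above typically gates the tracker (the class and method names are assumptions modeled on the Hugging Face equivalent, not this PR's exact code): when the flag is True, the tracker's hooks become no-ops.

```python
from dataclasses import dataclass, field


@dataclass
class Args:
    # mirrors the field shown in the diff above
    skip_memory_metrics: bool = field(
        default=True,
        metadata={"help": "Whether or not to skip adding of memory profiler reports to metrics."},
    )


class MemoryTracker:
    """Hypothetical tracker whose hooks are no-ops when skipping is enabled."""

    def __init__(self, skip_memory_metrics=True):
        self.skip = skip_memory_metrics
        self.events = []

    def start(self):
        if self.skip:
            return
        self.events.append("start")

    def stop_and_update_metrics(self, metrics):
        if self.skip:
            return
        self.events.append("stop")
        metrics["train_mem_cpu_peaked_delta"] = 0  # placeholder value


# memory metrics are opt-in: the default (True) keeps training overhead-free
args = Args(skip_memory_metrics=False)
tracker = MemoryTracker(args.skip_memory_metrics)
metrics = {}
tracker.start()
tracker.stop_and_update_metrics(metrics)
```

Defaulting the flag to True keeps the profiler out of the hot path unless a user explicitly asks for memory reports.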

@ZHUI ZHUI merged commit 70ca8f8 into PaddlePaddle:develop Dec 28, 2022