Conversation
@ymyjl ymyjl commented Dec 20, 2022

PR types

New features

PR changes

Others

Description

New feature: add trainer memory tracer
A helper class that tracks CPU and GPU memory.
When a stage completes, it can pass a metrics dict, which is updated with the memory metrics gathered during that stage.
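The flow described above can be sketched as follows. The class, method, and metric names here (`TrainerMemoryTracker`, `stop_and_update_metrics`, `train_mem_cpu_*`) are illustrative assumptions rather than the PR's actual API, and CPU tracking uses the standard-library `tracemalloc` to keep the example self-contained:

```python
import tracemalloc


class TrainerMemoryTracker:
    """Hypothetical sketch: snapshot memory when a stage starts and fold
    the deltas into a metrics dict when the stage completes."""

    def start(self):
        # begin tracing Python-level CPU allocations
        tracemalloc.start()

    def stop_and_update_metrics(self, metrics, stage="train"):
        current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        # net allocation during the stage, and how far the peak overshot it
        metrics[f"{stage}_mem_cpu_alloc_delta"] = current
        metrics[f"{stage}_mem_cpu_peaked_delta"] = max(peak - current, 0)


tracker = TrainerMemoryTracker()
metrics = {}
tracker.start()
buf = [0] * 100_000  # the "stage" being measured
tracker.stop_and_update_metrics(metrics, stage="train")
```

A real tracker would additionally query device memory for the GPU-side metrics; this sketch only covers the CPU path.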

@paddle-bot
paddle-bot bot commented Dec 20, 2022

Thanks for your contribution!

@codecov
codecov bot commented Dec 20, 2022

Codecov Report

Merging #4181 (00cb04c) into develop (ec30226) will increase coverage by 2.40%.
The diff coverage is 16.36%.

@@             Coverage Diff             @@
##           develop    #4181      +/-   ##
===========================================
+ Coverage    33.95%   36.35%   +2.40%     
===========================================
  Files          405      419      +14     
  Lines        56841    59168    +2327     
===========================================
+ Hits         19302    21513    +2211     
- Misses       37539    37655     +116     
Impacted Files Coverage Δ
paddlenlp/trainer/trainer.py 11.24% <0.00%> (-0.24%) ⬇️
paddlenlp/trainer/trainer_utils.py 29.58% <15.38%> (-5.24%) ⬇️
paddlenlp/utils/import_utils.py 80.82% <33.33%> (+38.63%) ⬆️
paddlenlp/trainer/training_args.py 40.30% <100.00%> (+0.22%) ⬆️
paddlenlp/__init__.py 19.76% <0.00%> (-10.54%) ⬇️
paddlenlp/transformers/auto/modeling.py 71.88% <0.00%> (-4.98%) ⬇️
paddlenlp/experimental/ernie_model.py 32.43% <0.00%> (-0.91%) ⬇️
paddlenlp/transformers/model_utils.py 73.10% <0.00%> (-0.22%) ⬇️
paddlenlp/transformers/feature_extraction_utils.py 27.02% <0.00%> (-0.19%) ⬇️
... and 42 more


@ymyjl ymyjl changed the title from "Yj paddle" to "Add Memory Tracer" on Dec 20, 2022

if self.paddle is not None:
    # self.torch.cuda.reset_peak_memory_stats()?
    self.paddle.device.cuda.empty_cache()
Contributor
Does this API not exist in Paddle?

self.torch.cuda.reset_peak_memory_stats()?

Contributor Author

Right, I checked and couldn't find it.
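Since Paddle lacked a `reset_peak_memory_stats` equivalent at the time, one common workaround is to sample memory from a background thread and keep a running maximum (the same idea Hugging Face's `TrainerMemoryTracker` applies to CPU memory). A minimal sketch follows; `PeakMemoryMonitor` and `read_mem` are hypothetical names, and any zero-argument memory reader works:

```python
import threading
import time


class PeakMemoryMonitor:
    """Poll a memory-reading callable in a background thread and record
    the peak value seen between start() and stop()."""

    def __init__(self, read_mem, interval=0.001):
        self.read_mem = read_mem  # zero-argument callable returning bytes in use
        self.interval = interval
        self.peak = 0
        self._stop = threading.Event()
        self._thread = None

    def _loop(self):
        while not self._stop.is_set():
            self.peak = max(self.peak, self.read_mem())
            time.sleep(self.interval)

    def start(self):
        self._stop.clear()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
        # one final sample so very short stages are not missed
        self.peak = max(self.peak, self.read_mem())
        return self.peak
```

The polling interval trades accuracy against overhead: very short allocation spikes can still slip between samples, which is why a native `reset_peak_memory_stats`-style counter is preferable when the framework provides one.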

@ZHUI
Contributor

ZHUI commented Dec 22, 2022

A few remaining questions:

  1. Could you list the Paddle APIs that are missing?
  2. In the multi-GPU case, is only one card monitored?
  3. Have you tested the pure-CPU case?

ZHUI
ZHUI previously approved these changes Dec 27, 2022
@ZHUI ZHUI left a comment
LGTM

@ZHUI

ZHUI commented Dec 27, 2022

Note: remember to revert the changes to run_seq_cls.py.

preds = paddle.to_tensor(preds)
label = paddle.to_tensor(p.label_ids)

probs = F.softmax(preds, axis=1)
Contributor

Why was this line deleted?

Contributor Author

It was flagged automatically at commit time; the variable was never used.

Contributor

@ZHUI ZHUI left a comment

LGTM

)
skip_memory_metrics: bool = field(
    default=True, metadata={"help": "Whether or not to skip adding of memory profiler reports to metrics."}
)
Contributor

Please add a Chinese usage document here.
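For context, a minimal sketch of how a `skip_memory_metrics` flag like the one above typically gates the tracker (the class and method names are assumptions modeled on the Hugging Face equivalent, not this PR's exact code): when the flag is True, the tracker's hooks become no-ops.

```python
from dataclasses import dataclass, field


@dataclass
class Args:
    # mirrors the field shown in the diff above
    skip_memory_metrics: bool = field(
        default=True,
        metadata={"help": "Whether or not to skip adding of memory profiler reports to metrics."},
    )


class MemoryTracker:
    """Hypothetical tracker whose hooks are no-ops when skipping is enabled."""

    def __init__(self, skip_memory_metrics=True):
        self.skip = skip_memory_metrics
        self.events = []

    def start(self):
        if self.skip:
            return
        self.events.append("start")

    def stop_and_update_metrics(self, metrics):
        if self.skip:
            return
        self.events.append("stop")
        metrics["train_mem_cpu_peaked_delta"] = 0  # placeholder value


# memory metrics are opt-in: the default (True) keeps training overhead-free
args = Args(skip_memory_metrics=False)
tracker = MemoryTracker(args.skip_memory_metrics)
metrics = {}
tracker.start()
tracker.stop_and_update_metrics(metrics)
```

Defaulting the flag to True keeps the profiler out of the hot path unless a user explicitly asks for memory reports.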

@ZHUI ZHUI merged commit 70ca8f8 into PaddlePaddle:develop Dec 28, 2022