HydrogenSulfate (Contributor) commented Nov 18, 2024

PR Category

Performance Optimization

PR Types

Bug fixes

Description

Pcard-75624

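Problem description

The existing dynamic-graph and static-graph composite operators make heavy use of basic operators that carry place information, such as `full`, `full_scalar`, and `full_with_tensor`. The trailing `place` parameter of these operators determines the device on which the tensor is first created. However, none of the current composite operator call sites pass `place` explicitly, so all of these tensors are created on the CPU by default.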

node_input_buffers_dict[next_node]->add(edge_rank.first,
                                        edge_rank.second,
                                        grad_output_tensor,
                                        create_graph);

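As a consequence, during gradient accumulation in `backward.cc` (taking the dynamic graph as an example), the accumulation step checks which devices the source and target tensors live on and copies the target to the source's device. This repeatedly triggers H2D or D2H copies and slows down the backward pass.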

// if src and dst are in different place, copy dst to src's place
if (dst_tensor->place() != place) {
  paddle::framework::TensorCopySync(*dst_tensor, place, dst_tensor);
}
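To make the cost concrete, here is a minimal toy model of that accumulation rule. It is only a sketch: `Place`, `ToyTensor`, `CopyCounter`, `Accumulate`, and `CountCopies` are hypothetical names invented for illustration and are not Paddle APIs. The model assumes the first incoming gradient initializes the buffer and that every later gradient forces the buffer onto its own place, as the check above does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Paddle types.
enum class Place { kCPU, kGPU };

struct ToyTensor {
  Place place;
  std::vector<float> data;
};

// Tallies simulated host-to-device and device-to-host transfers.
struct CopyCounter {
  int h2d = 0;
  int d2h = 0;
};

// Accumulate `src` into `dst`. If the two live on different places, `dst` is
// first moved to `src`'s place, mirroring the check in the snippet above.
inline void Accumulate(const ToyTensor& src, ToyTensor* dst,
                       CopyCounter* counter) {
  assert(src.data.size() == dst->data.size());
  if (dst->place != src.place) {
    if (src.place == Place::kGPU) {
      ++counter->h2d;  // buffer moves CPU -> GPU
    } else {
      ++counter->d2h;  // buffer moves GPU -> CPU
    }
    dst->place = src.place;
  }
  for (std::size_t i = 0; i < dst->data.size(); ++i) {
    dst->data[i] += src.data[i];
  }
}

// Feed a sequence of gradients (identified only by their place) into one
// buffer and report how many transfers the place check triggered.
inline CopyCounter CountCopies(const std::vector<Place>& grad_places) {
  CopyCounter counter;
  if (grad_places.empty()) return counter;
  // Assumption: the first gradient simply initializes the buffer.
  ToyTensor dst{grad_places[0], std::vector<float>(1, 0.0f)};
  for (std::size_t i = 1; i < grad_places.size(); ++i) {
    ToyTensor src{grad_places[i], std::vector<float>(1, 1.0f)};
    Accumulate(src, &dst, &counter);
  }
  return counter;
}
```

In this model, gradients that alternate between GPU and CPU move the buffer on every accumulation, while gradients that all share one place never move it.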

image

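Solution

This PR fixes the relevant calls in all dynamic-graph and static-graph composite backward operators by specifying a `place` that matches the input tensor, which avoids a large number of H2D/D2H copies.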
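The shape of the fix can be sketched with a toy creation helper whose trailing `place` argument defaults to CPU. Everything here (`Place`, `ToyTensor`, `ToyFull`, and the two `ConstGrad*` functions) is hypothetical and only imitates the calling pattern described in this PR; it does not reproduce the signatures of Paddle's `full` family.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Paddle types.
enum class Place { kCPU, kGPU };

struct ToyTensor {
  Place place;
  std::vector<float> data;
};

// Toy analogue of a `full`-style creation op: the trailing `place` parameter
// decides where the tensor is created and defaults to CPU when omitted.
inline ToyTensor ToyFull(std::size_t numel, float value,
                         Place place = Place::kCPU) {
  return ToyTensor{place, std::vector<float>(numel, value)};
}

// Before the fix: `place` is omitted, so the constant-filled gradient is
// created on CPU even when the input `x` lives on GPU.
inline ToyTensor ConstGradDefaultPlace(const ToyTensor& x, float value) {
  return ToyFull(x.data.size(), value);
}

// After the fix: `place` is taken from the input, so the gradient is created
// on the same device as `x` and no later transfer is needed.
inline ToyTensor ConstGradInputPlace(const ToyTensor& x, float value) {
  assert(!x.data.empty());
  return ToyFull(x.data.size(), value, x.place);
}
```

The only difference between the two gradient helpers is the extra `x.place` argument, which is the kind of one-argument change this PR applies across the composite backward rules.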

The upper timeline is after the fix and the lower timeline is before the fix. After the fix, the red copy operations are almost entirely gone.
image

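The upper panel shows H2D/D2H copy time before the fix and the lower panel shows it after the fix. The abnormally high copy time returns to normal after the fix.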
image

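Case test results:

  1. Allen_Cahn (second-order differential equation + first-order backward): 100% performance improvement, 1.2% increase in GPU memory usage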

image

image

image

  2. DeePMD-kit DPA2 (21s -> 18s, a 14% improvement)

  3. ldc_2d (second-order differential equation + first-order backward): 63% performance improvement

image

TODO: fix the place issue in the static-graph forward composite rules in `composite.h`; this will be addressed in the next PR.

paddle-bot bot commented Nov 18, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@HydrogenSulfate HydrogenSulfate changed the title from "[Prim/Comp] Fix place setting to avoid redundant H2D/D2H copy" to "[Prim/Comp] Fix place setting to avoid redundant H2D/D2H copy for backward decomposition" Nov 19, 2024
@HydrogenSulfate HydrogenSulfate merged commit 9bba922 into PaddlePaddle:develop Nov 19, 2024
27 of 28 checks passed
@HydrogenSulfate HydrogenSulfate deleted the fix_prim_comp_place branch November 19, 2024 06:10
github-merge-queue bot pushed a commit to deepmodeling/deepmd-kit that referenced this pull request Dec 17, 2024
Summary of this PR:

1. upload DPA-1 related code
2. merge much develop code
3. add all eager composite operators except `softmax_grad`,
`p_norm_grad`, `split_grad`, and `concat_grad` to the composite operator
blacklist (<https://github.com/deepmodeling/deepmd-kit/pull/4414/files#diff-e678abb052b278f8a479f8d13b839a9ec0effd9923478a850bc13758f918e1e9R134-R148>)
to significantly improve model execution speed (reducing the time taken
from 100% more than PyTorch to about 10% to 15% more).


related PR: lanpa/tensorboardX#728


### Training curve:


![training_curves_comparison_eager_opt](https://github.com/user-attachments/assets/3b71fc99-5abf-4353-a61a-38737d3c7f2c)

### Accuracy test (left: paddle, right: torch):


![image](https://github.com/user-attachments/assets/a42b4bfd-c0f8-4eb8-85eb-ff1adf981dbb)


Related optimization of Paddle framework:
- [x] PaddlePaddle/Paddle#69349
- [x] PaddlePaddle/Paddle#69333
- [x] PaddlePaddle/Paddle#69479
- [x] PaddlePaddle/Paddle#69515
- [x] PaddlePaddle/Paddle#69487
- [x] PaddlePaddle/Paddle#69661
- [x] PaddlePaddle/Paddle#69660
- [x] PaddlePaddle/Paddle#69596
- [x] PaddlePaddle/Paddle#69556

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

- **New Features**
- Introduced several new classes for molecular descriptors, including
`DescrptDPA1`, `DescrptBlockSeAtten`, and `LayerNorm`, enhancing the
modeling capabilities for molecular simulations.
- Added new JSON configuration files for model parameters and multitask
models related to water simulations.
- Implemented new test classes for validating the functionality of the
`DPAtomicModel` and various descriptor classes.
- Added new test classes for evaluating denoising models, including
`TestDenoiseModelDPA1` and `TestDenoiseModelDPA2`.
- Enhanced the `ModelWrapper` class to clarify the handling of model
parameters and state management.

- **Bug Fixes**
- Improved internal logic for handling model state saving and loading,
ensuring consistency in outputs.

- **Documentation**
- Enhanced type hints and return annotations across various classes and
methods for better clarity.

- **Tests**
- Expanded the testing framework with new test cases for denoising
models and descriptor functionalities, ensuring robust validation of
features.
- Activated previously skipped tests for energy models, improving test
coverage.
- Enhanced multitask training tests with new configuration handling and
test classes.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
github-merge-queue bot pushed a commit to deepmodeling/deepmd-kit that referenced this pull request Dec 25, 2024
Support DPA-2 in paddle backend. This PR will be updated after #4414 is
merged.

### Training curve:


![training_curves_comparison_dpa2](https://github.com/user-attachments/assets/29bdeffa-cf2d-4586-afcf-7df0569997c3)



### Accuracy test (left: paddle, right: torch):


![image](https://github.com/user-attachments/assets/5bff55f3-1c39-4b95-93f0-68783e794716)


Related optimization of Paddle framework:
- [x] PaddlePaddle/Paddle#69349
- [x] PaddlePaddle/Paddle#69333
- [x] PaddlePaddle/Paddle#69479
- [x] PaddlePaddle/Paddle#69515
- [x] PaddlePaddle/Paddle#69487
- [x] PaddlePaddle/Paddle#69661
- [x] PaddlePaddle/Paddle#69660
- [x] PaddlePaddle/Paddle#69596
- [x] PaddlePaddle/Paddle#69556

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced new classes for molecular descriptors: `DescrptDPA2`,
`DescrptBlockRepformers`, `DescrptSeTTebd`, and `DescrptBlockSeTTebd`.
- Added new functions for tensor operations and descriptor management,
enhancing the capabilities of the module.
- Updated JSON configurations for multitask models to refine selection
criteria and data paths.

- **Bug Fixes**
- Improved error handling and parameter validation across various
descriptor classes.

- **Documentation**
- Enhanced test coverage for new descriptor functionalities and
configurations.

- **Tests**
- Added new test classes to validate the functionality of `DescrptDPA2`
and multitask training scenarios.
- Expanded test capabilities for descriptor classes based on installed
dependencies.
- Updated existing tests to support new configurations and
functionalities.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>