Zjq9409 (Contributor) commented Dec 1, 2021

PR types

Performance optimization

PR changes

OPs

Describe

This PR implements the backward pass of broadcast div using reduce. Performance compared with the original implementation is as follows:

| Case | PyTorch (kernel) | Before | Before vs. PyTorch | After | After vs. PyTorch | Speedup | Accuracy before | Accuracy after |
|---|---|---|---|---|---|---|---|---|
| [50, 128, 1000], [128, 1000] | 0.46865 | 0.24259 | better (48.24%) | 0.33432 | better (28.66%) | 0.73 | 2.56E+02 | 2.29E+05 |
| [50, 128, 1000], [1, 128, 1000] | 0.46940 | 0.24346 | better (48.13%) | 0.33333 | better (28.99%) | 0.73 | 3.93E+05 | 8.19E+03 |
| [16, 2048, 7, 7], [16, 2048] | 0.14044 | 0.07819 | better (44.32%) | 0.10572 | better (24.72%) | 0.74 | 1.28E+02 | 0.0 |
| [16, 2048, 16, 16], [16, 2048, 16, 16] | 0.71575 | 0.34497 | better (1.07x) | 0.34541 | better (1.07x) | 1.00 | 0.0 | 0.0 |
| [16, 1, 513, 513], [1] | 0.31762 | 4.67214 | worse (13.71x) | 0.22371 | better (29.57%) | 20.88 | 2.44E-03 | 2.34E-02 |
| [512, 896, 4, 12], [512, 896, 4, 1] | 1.68353 | 2.82219 | worse (67.64%) | 1.20446 | better (28.46%) | 2.34 | 5.24E+05 | 0.0 |
| [512, 896, 4, 12], [512, 896, 4, 1] (fp16) | 1.17390 | 2.74304 | worse (1.34x) | 0.78895 | better (32.79%) | 3.48 | 0.0 | 0.0 |
| [32, 12, 128, 128], [32, 1, 1, 128] (fp16) | 0.34941 | 0.57034 | worse (63.23%) | 0.20772 | better (40.55%) | 2.75 | 0.0 | 0.0 |
| [32, 1, 1, 128], [1, 12, 128, 1] (fp16) | 0.38124 | 0.4983 | worse (30.71%) | 0.25034 | better (34.34%) | 1.99 | 0.0 | 0.0 |
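The comparison columns appear consistent with the arithmetic below. This is inferred from the numbers, not stated in the PR: "better (p%)" is (t_ref - t) / t_ref, "worse (p%)" is (t - t_ref) / t_ref, an "Nx" figure is the time ratio minus one (apparently used once one time is at least twice the other), and the speedup column is before / after. The helper names `compare` and `speedup` are hypothetical.

```python
def compare(t_ref: float, t: float) -> str:
    """Describe kernel time t relative to a reference time t_ref (e.g. PyTorch).

    The 2x threshold between the percent and "x" formats is an inferred convention.
    """
    if t <= t_ref:
        ratio = t_ref / t
        if ratio >= 2.0:
            return f"better ({ratio - 1:.2f}x)"
        return f"better ({(t_ref - t) / t_ref * 100:.2f}%)"
    ratio = t / t_ref
    if ratio >= 2.0:
        return f"worse ({ratio - 1:.2f}x)"
    return f"worse ({(t - t_ref) / t_ref * 100:.2f}%)"


def speedup(before: float, after: float) -> float:
    """Speedup of the optimized kernel over the previous one (>1 means faster)."""
    return round(before / after, 2)
```

For example, `compare(0.31762, 4.67214)` reproduces the "worse (13.71x)" entry of the [16, 1, 513, 513], [1] row, and `speedup(4.67214, 0.22371)` gives its 20.88.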

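Cases 1, 2, and 3 are slower than on the original develop branch (speedup of 0.73 to 0.74), although they still beat PyTorch; case 4 is unchanged, and cases 5 to 9 improve substantially (speedups of 1.99 to 20.88).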
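For reference, the gradients involved are the standard ones: for out = x / y with y broadcast against x, dx = dout / y and dy = -dout * out / y (equivalently -dout * x / y^2), after which dy has to be summed over the broadcast axes to recover y's shape. That final sum is the step a reduce kernel can take over. The sketch below is a plain-Python illustration, not the PR's CUDA code; it assumes a 2-D x of shape [M, N] and a 1-D y of shape [N], and the function name is hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def broadcast_div_backward(
    y: List[float], out: Matrix, dout: Matrix
) -> Tuple[Matrix, List[float]]:
    """Gradients of out = x / y, where y has shape [N] and is broadcast over the M rows of x."""
    m, n = len(out), len(y)
    # dx has the same shape as x, so it is purely element-wise: dout / y.
    dx = [[dout[i][j] / y[j] for j in range(n)] for i in range(m)]
    # dy before reduction, evaluated at x's shape: -dout * out / y.
    dy_full = [[-dout[i][j] * out[i][j] / y[j] for j in range(n)] for i in range(m)]
    # Reduce step: sum over the broadcast axis (the M rows) to get back y's shape [N].
    dy = [sum(dy_full[i][j] for i in range(m)) for j in range(n)]
    return dx, dy
```

With x = [[2, 4], [6, 8]], y = [2, 4] and dout all ones, this yields dx = [[0.5, 0.25], [0.5, 0.25]] and dy = [-2.0, -0.75]. Reusing out instead of x avoids recomputing the division in the backward pass.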

paddle-bot-old bot commented Dec 1, 2021

✅ This PR's description meets the template requirements!
Please wait for other CI results.

paddle-bot-old bot commented Dec 1, 2021

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

Zjq9409 changed the title from "add broadcast_div_bw" to "implementation of broadcast div backward by reduce" on Dec 7, 2021
Contributor Author

Comment removed.

Contributor

The float and double code is duplicated.

Contributor Author

This function has now been removed.

Contributor

Same as above: the float and double code is duplicated.

Contributor

ZzSean Dec 9, 2021

Please rebase onto the latest code.

Contributor Author

Done.

Contributor

Is it necessary to add this header file?

Contributor Author

Removed.

Contributor

Same as for sub: the CPU code of default_elementwise_div_grad and elementwise_div_grad is duplicated.

Contributor Author

Done.

Contributor

What does t_y mean? Variable names should make their meaning obvious.

Contributor Author

Done.

Contributor

ZzSean Dec 9, 2021

Function names should follow UpperCamelCase, not all caps.

Contributor Author

Done. The function has been modified.

Contributor

Please move this comment; in its current position it is too easy to miss.

Contributor Author

Done.

Contributor

This computes dx, yet the function is named grady? That doesn't seem appropriate.

Contributor Author

The function has been modified.

Contributor

Same as above for the variable names and comment placement.

Contributor Author

Done.

Contributor

Variable naming.

Contributor Author

Done.

Contributor

Please remove the unused code.

Contributor Author

Done.

JZZ-NOTE and others added 25 commits December 10, 2021 06:03
* make some non_parallel unittest parallel execute

* delete duplicate ut
* fix static git diff check

* test=document_fix
* update logsumexp doc

* update api doc

* update api doc
* Debug

* Fixed issue with reset_grad_inplace_version when used with clear_gradient & cross-batch accumulation

* Rearranged interfaces

* Fixed ci issues
* add maxunpool2d in __all__

* fix MaxUnPool2D example
* add infrt code

refined with Paddle's code style.

* rename CinnRtConfig to InfRtConfig

* rename CinnRt to InfRt of some code

* rename CINNRT to INFRT

* remove unnecessary code

* replace CINN to INFRT in the source code

* replace all "cinn" in code to "infrt"

* remove some const_cast
* fix CUDA Graph H2D bug again

* fix no return bug
* Rearranged Eager AutoCodeGen directory structure

* Removed USE_OP in Eager AutoCodeGen

* Enabled generation for Operators without Grad/Inputs/Outputs

* Resolved operators without input

* Fixed merge conflicts

* Enabled Eager AutoCodeGen for 10+ more operators
* refine a test case, test=develop

* publish python c api for eager, test=develop

* revert modify about test_allclose_layer.py, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* delete numpy includes, use pybind11 numpy.h, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* support eager error msg, and add grad test case, test=develop

* refine, test=develop

* refine, test=develop

* generate eager core ops, only 4 ops, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop
…37821)

* Fix CUDAGraph bug for StreamSafeCUDAAllocator

* Add CUDAGrapthAllocator check in multi-stream interface

* Set FLAGS_use_stream_safe_cuda_allocator defaulted to false

* Fix environment error for cmake

* Fix cmake error

* Add UT of GetAllocatorInterfaceTest

* Add UT of CUDAGraphExceptionTest

* Enhance CUDAGraphExceptionTest
* add update func of auto search

* update unittest
Contributor

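The inplace logic should also exist in the branch above, right? Could you factor out the common part here? The only difference between the two branches is whether a reduce is performed at the end.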
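One way to read this suggestion, sketched in plain Python under simplifying assumptions (2-D tensors, y either matching x's shape or a single row broadcast over it, in-place buffer reuse left out): both branches share the element-wise step, and the broadcast branch only adds a final reduction. The helper names `_dy_elementwise` and `div_grad_dy` are hypothetical and do not correspond to functions in the PR.

```python
from typing import List

Matrix = List[List[float]]


def _dy_elementwise(y: Matrix, out: Matrix, dout: Matrix) -> Matrix:
    """Shared step for both branches: -dout * out / y at x's full shape [M, N].

    y has either M rows (same shape as x) or one row that is broadcast over M.
    """
    rows_y = len(y)
    return [
        [-dout[i][j] * out[i][j] / y[i % rows_y][j] for j in range(len(out[i]))]
        for i in range(len(out))
    ]


def div_grad_dy(y: Matrix, out: Matrix, dout: Matrix) -> Matrix:
    full = _dy_elementwise(y, out, dout)
    if len(y) == len(out):
        # Same shape: the element-wise result already has y's shape, no reduce needed.
        return full
    # Broadcast branch: the only extra work is a final sum over axis 0, keeping shape [1, N].
    return [[sum(col) for col in zip(*full)]]
```

Keeping the element-wise math in one helper means a fix to it (for example, to in-place handling) applies to both the same-shape and the broadcast path.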
