implementation of broadcast div backward by reduce #37776
✅ This PR's description meets the template requirements!
Thanks for your contribution!
b89d791 to 97e840c
97e840c to 1c656dc
1c656dc to 3087040
3087040 to ab89c4d
ab89c4d to a85db47
a85db47 to 2490a84
2490a84 to 665694c
The comment has been removed.
The float and double code is duplicated.
This function has now been removed.
Same as above: the float and double code is duplicated.
Please rebase onto the latest code.
Done.
Is it really necessary to add this header file?
Removed.
Same as for sub: the CPU code in default_elementwise_div_grad and elementwise_div_grad is duplicated.
Done.
What does t_y stand for? A variable name should convey its meaning at a glance.
Done.
Function names should follow UpperCamelCase; do not write them in all capitals.
Done. This function has been modified.
Please move this comment; where it sits now it is too easy to miss.
Done.
This computes dx, yet the function name is grady? That does not seem appropriate.
This function has been modified.
Same as above for the variable name and the comment placement.
Done.
Variable name.
Done.
Remove the unused code.
Done.
* multithread_memory_optimize
* make some non_parallel unittest parallel execute * delete duplicate ut
* fix static git diff check * test=document_fix
* update logsumexp doc * update api doc * update api doc
* Debug * Fixed issue with reset_grad_inplace_version when used with clear_gradient & cross-batch accumulation * Rearranged interfaces * Fixed ci issues
* add maxunpool2d in __all__ * fix MaxUnPool2D example
* add infrt code refined with Paddle's code style. * rename CinnRtConfig to InfRtConfig * rename CinnRt to InfRt of some code * rename CINNRT to INFRT * remove unnecessary code * replace CINN to INFRT in the source code * replace all "cinn" in code to "infrt" * remove some const_cast
* fix CUDA Graph H2D bug again * fix no return bug
* Rearranged Eager AutoCodeGen directory structure * Removed USE_OP in Eager AutoCodeGen * Enabled generation for Operators without Grad/Inputs/Outputs * Resolved operators without input * Fixed merge conflicts * Enabled Eager AutoCodeGen for 10+ more operators
* refine a test case, test=develop * publish python c api for eager, test=develop * revert modify about test_allclose_layer.py, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * delete numpy includes, use pybind11 numpy.h, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * suport eager error msg, and add grad test case, test=develop * refine, test=develop * refine, test=develop * generate eager core ops, only 4 ops, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop * refine, test=develop
…int, this convert is wrong (PaddlePaddle#37929)
…37821) * Fix CUDAGraph bug for StreamSafeCUDAAllocator * Add CUDAGrapthAllocator check in multi-stream interface * Set FLAGS_use_stream_safe_cuda_allocator defaulted to false * Fix environment error for cmake * Fix cmake error * Add UT of GetAllocatorInterfaceTest * Add UT of CUDAGraphExceptionTest * Enhance CUDAGraphExceptionTest
* add boardcast_sub * add boardcast_sub
* add update func of auto search * update unitest
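The inplace logic should also exist in the branch above, so could the common part of this code be extracted? The only difference between the two branches is whether a reduce is performed at the end.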
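The refactor suggested in this comment might look roughly like the sketch below. It is plain Python rather than the PR's kernel code, and `div_grad_y` and `needs_reduce` are hypothetical names. The inplace handling is not visible in this conversation, so the shared elementwise step only stands in for it. The point is that both branches run the same code and differ only in the final reduce.

```python
def div_grad_y(x, y, dout, needs_reduce):
    """Hypothetical sketch of dy for out = x / y with one shared code path.

    x and dout are M x N nested lists. y is a length-N list when it was
    broadcast over rows (needs_reduce=True), otherwise an M x N nested list.
    """
    rows, cols = len(x), len(x[0])

    def y_at(i, j):
        # Read y with or without broadcasting over the row index.
        return y[j] if needs_reduce else y[i][j]

    # Shared part: the elementwise term -dout * x / y^2 in the output shape.
    dy_full = [
        [-dout[i][j] * x[i][j] / (y_at(i, j) ** 2) for j in range(cols)]
        for i in range(rows)
    ]

    if not needs_reduce:
        return dy_full
    # Only difference between the branches: sum over the broadcast axis.
    return [sum(dy_full[i][j] for i in range(rows)) for j in range(cols)]
```

With this shape, any logic added to the shared step applies to both the broadcast and the same-shape case without being written twice.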
665694c to d621924
PR types
Performance optimization
PR changes
OPs
Describe
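Implements the backward pass of broadcast div using reduce. Performance data compared with the original implementation is as follows:
In cases 1, 2, and 3 the optimization ratio relative to the original dev branch drops somewhat, but all other configurations improve substantially.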
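For readers unfamiliar with the idea, here is a minimal sketch of what "broadcast div backward by reduce" means mathematically. For out = x / y, the gradients are dx = dout / y and dy = -dout * x / y². When y was broadcast to match x, the dy term is computed in the broadcast shape and then summed over the broadcast axis. The sketch is plain Python and assumes x has shape M x N and y has length N; `broadcast_div_grad` is a hypothetical name, not the kernel added in this PR.

```python
def broadcast_div_grad(x, y, dout):
    """Gradients of out = x / y, where y (length N) is broadcast over the rows of x (M x N).

    Hypothetical illustration only; not Paddle's implementation.
    """
    rows, cols = len(x), len(y)

    # dx has the same shape as x, so it is a plain elementwise result.
    dx = [[dout[i][j] / y[j] for j in range(cols)] for i in range(rows)]

    # The elementwise dy term has the broadcast shape (M x N).
    dy_full = [
        [-dout[i][j] * x[i][j] / (y[j] * y[j]) for j in range(cols)]
        for i in range(rows)
    ]

    # Reduce (sum) over the broadcast axis to recover the shape of y.
    dy = [sum(dy_full[i][j] for i in range(rows)) for j in range(cols)]
    return dx, dy
```

For example, with x = [[1.0, 2.0], [3.0, 4.0]], y = [2.0, 4.0], and dout filled with ones, dx is [[0.5, 0.25], [0.5, 0.25]] and dy is [-1.0, -0.375].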