
Conversation

b3602sss (Contributor) commented Oct 18, 2022

PR types

Performance optimization

PR changes

OPs

Describe

Fuses layernorm_shift_partition with element_add.
Gives roughly a 3% speedup on the Swin model.
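
For context, here is a minimal dygraph sketch of the pattern the fused op covers: a residual elementwise add feeding LayerNorm, followed by Swin's cyclic shift and window partition. This is not code from the PR; the helper name, shapes, and window/shift sizes are illustrative.

import paddle
import paddle.nn.functional as F

def layernorm_shift_partition(x, residual, weight, bias,
                              H=56, W=56, window_size=7, shift_size=3):
    # The part this pass fuses: element_add + layer_norm ...
    y = paddle.add(x, residual)                      # x, residual: [B, H*W, C]
    y = F.layer_norm(y, y.shape[-1:], weight, bias)
    # ... plus the shift-partition that follows it in a Swin block.
    B, _, C = y.shape
    y = paddle.reshape(y, [B, H, W, C])
    if shift_size > 0:
        # cyclic shift used by shifted-window attention
        y = paddle.roll(y, shifts=(-shift_size, -shift_size), axis=(1, 2))
    # window partition: [B, H, W, C] -> [B*num_windows, window_size^2, C]
    y = paddle.reshape(y, [B, H // window_size, window_size,
                           W // window_size, window_size, C])
    y = paddle.transpose(y, perm=[0, 1, 3, 2, 4, 5])
    return paddle.reshape(y, [-1, window_size * window_size, C])

Replacing this chain of add/layer_norm/reshape/roll/transpose ops with a single fused kernel avoids several kernel launches and intermediate tensors, which is consistent with the reported ~3% end-to-end speedup on Swin.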

paddle-bot bot commented Oct 18, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

"layernorm_shift_partition_fuse_pass", //
"merge_layernorm_fuse_pass", //
"preln_residual_bias_fuse_pass", //
"preln_layernorm_x_fuse_pass", //
Contributor

Could this pass also be placed in the native GPU path? If the newly added plugin were implemented as a phi operator, it should also be reachable through the generic plugin.

Contributor

Once @weishengying finishes the follow-up work, the plugin can be hooked in under the new scheme.

Contributor Author

Since there is currently only a TRT operator, this was not put into the native passes.
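
For readers who want to try this out: a hedged sketch of running under the TensorRT engine path, which is where these fuse passes apply. The model file names and sizes are placeholders; layernorm_shift_partition_fuse_pass is one of the passes quoted above, and delete_pass lets you A/B the fusion.

from paddle.inference import Config, create_predictor

config = Config("swin.pdmodel", "swin.pdiparams")  # placeholder model files
config.enable_use_gpu(256, 0)
config.enable_tensorrt_engine(workspace_size=1 << 30,
                              max_batch_size=1,
                              min_subgraph_size=3)
# The fuse passes run by default on the TRT path; delete one to
# compare latency with and without the fusion.
# config.delete_pass("layernorm_shift_partition_fuse_pass")
predictor = create_predictor(config)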

jiweibo previously approved these changes Oct 24, 2022

jiweibo (Contributor) left a comment

LGTM for preln_layernorm pass

b3602sss changed the title from Preln to Preln_Layernorm_Shift_Partition on Oct 26, 2022
XiaoguangHu01 (Contributor) left a comment

LGTM

b3602sss merged commit d17d0cd into PaddlePaddle:develop on Oct 26, 2022
b3602sss deleted the preln branch on January 11, 2023