@Avin0323 Avin0323 commented Apr 27, 2021

PR types

Others

PR changes

Others

Describe

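  • Reuse Eigen calls: several OPs use the same Eigen API, and each of them recompiles it during the build, which is redundant work;
  • Register CUDA kernels in .cc files: nvcc is slow to compile, so kernels whose registration generates no CUDA code are registered in the .cc file instead, which lowers compile time;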
| *.o file | Change in compile time ("-" = no change) | Relative change ("-" = no change) |
|---|---|---|
| erf_op.cc.o | - | - |
| erf_op.cu.o | -28s | -100% |
| rank_loss_op.cc.o | - | - |
| rank_loss_op.cu.o | -28s | -100% |
| im2sequence_op.cc.o | - | - |
| im2sequence_op.cu.o | -28s | -100% |
| l1_norm_op.cc.o | - | - |
| l1_norm_op.cu.o | -27s | -100% |
| scale_op.cc.o | -1s | -5% |
| scale_op.cu.o | -31s | -100% |
| slice_op.cc.o | -12s | -11% |
| slice_op.cu.o | -129s | -100% |
| increment_op.cc.o | - | - |
| increment_op.cu.o | -28s | -100% |
| conv_transpose_cudnn_op.cu.o | -1s | -3% |
| conv_fusion_op.cu.o | -1s | -3% |
| pad_constant_like_op.cc.o | -5s | -25% |
| pad_constant_like_op.cu.o | -34s | -100% |
| pad_op.cc.o | -5s | -20% |
| pad_op.cu.o | -37s | -100% |
| conv_cudnn_op.cu.o | -2s | -5% |
| reverse_op.cc.o | -11s | -20% |
| reverse_op.cu.o | -39s | -100% |
| crop_tensor_op.cc.o | -14s | -20% |
| crop_tensor_op.cu.o | -45s | -100% |
| conv_transpose_op.cc.o | -4s | -5% |
| conv_transpose_op.cu.o | -2s | -5% |
| sign_op.cc.o | - | - |
| sign_op.cu.o | -27s | -100% |
| crop_op.cc.o | -7s | -20% |
| crop_op.cu.o | -35s | -100% |
| hinge_loss_op.cc.o | - | - |
| hinge_loss_op.cu.o | -28s | -100% |
| minus_op.cc.o | - | - |
| minus_op.cu.o | -28s | -100% |
| imperative.cc.o | -9s | -5% |
| pybind.cc.o | -60s | -30% |
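  • With CUDA kernels registered in .cc files and Eigen calls reused, compile time drops for roughly 37 *.o files, a total reduction of 706 s;
  • With parallel compilation, the expected saving is 2-3 min of build time:
    • for a first (clean) build, about 2-5% of build time;
    • in CI builds (second build with ccache), about 5-10% of build time.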
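
The two ideas in the description can be sketched together in plain C++. This is a minimal illustration under stated assumptions, not the code from this PR: `ScaleFunctor`, `ScaleKernel`, `KernelRegistrar`, `KernelRegistry`, `RunScale` and the device tags are hypothetical names, a plain loop stands in for the Eigen expression, and `GpuDevice` is a host-side stand-in so the sketch runs without nvcc. The shared header only declares the functor, so OP files that include it instantiate nothing heavy; the body is compiled once, in a single source file, through explicit instantiation. Because the OP file then contains no device code, its kernel registration can sit in an ordinary .cc file.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical device tags. GpuDevice is a host-side stand-in for illustration.
struct CpuDevice {};
struct GpuDevice {};

// --- Shared header: declaration only, so including it compiles no body. ---
template <typename Device, typename T>
struct ScaleFunctor {
  static void Eval(const std::vector<T>& in, T scale, T bias,
                   std::vector<T>* out);
};

// --- OP source file (.cc): kernel plus registration, no device code. ---
using Kernel = std::function<void(const std::vector<float>&, float, float,
                                  std::vector<float>*)>;

// Function-local static avoids static-initialization-order problems.
std::map<std::string, Kernel>& KernelRegistry() {
  static std::map<std::string, Kernel> registry;
  return registry;
}

// The kernel only forwards to the functor, so it needs no special compiler.
template <typename Device, typename T>
void ScaleKernel(const std::vector<T>& in, T scale, T bias,
                 std::vector<T>* out) {
  ScaleFunctor<Device, T>::Eval(in, scale, bias, out);
}

// Registers a kernel under a key when a static instance is constructed.
struct KernelRegistrar {
  KernelRegistrar(const std::string& key, Kernel kernel) {
    KernelRegistry()[key] = std::move(kernel);
  }
};

static KernelRegistrar cpu_registrar("scale/cpu", &ScaleKernel<CpuDevice, float>);
static KernelRegistrar gpu_registrar("scale/gpu", &ScaleKernel<GpuDevice, float>);

// --- Functor source file: the only place the body is compiled. In a real
// --- build the GPU instantiation would sit in its own .cu file.
template <typename Device, typename T>
void ScaleFunctor<Device, T>::Eval(const std::vector<T>& in, T scale, T bias,
                                   std::vector<T>* out) {
  assert(out != nullptr);
  out->resize(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    (*out)[i] = in[i] * scale + bias;
  }
}

// Explicit instantiations: one compiled copy per (device, type) pair.
template struct ScaleFunctor<CpuDevice, float>;
template struct ScaleFunctor<GpuDevice, float>;

// Usage helper: look up a registered kernel by key and run it.
std::vector<float> RunScale(const std::string& key,
                            const std::vector<float>& in, float scale,
                            float bias) {
  std::vector<float> out;
  KernelRegistry().at(key)(in, scale, bias, &out);
  return out;
}
```

Calling `RunScale("scale/cpu", {1.0f, 2.0f}, 2.0f, 1.0f)` dispatches through the registry to the single compiled functor body. In a multi-file build, every further OP that needs the same functor would link against that one object file instead of recompiling the expression.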

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@Avin0323 Avin0323 marked this pull request as draft April 29, 2021 04:38
@Avin0323 Avin0323 marked this pull request as ready for review April 29, 2021 04:38
@Avin0323 Avin0323 marked this pull request as draft April 29, 2021 05:34
@Avin0323 Avin0323 marked this pull request as ready for review April 29, 2021 05:34
@Avin0323 Avin0323 marked this pull request as draft April 29, 2021 07:56
@Avin0323 Avin0323 marked this pull request as ready for review April 29, 2021 07:56
wanghuancoder previously approved these changes May 12, 2021

@wanghuancoder wanghuancoder left a comment

LGTM

@Avin0323 Avin0323 changed the title from "[WIP]optimize OP's compilation time" to "optimize OP's compilation time" May 12, 2021
luotao1 previously approved these changes May 12, 2021

@luotao1 luotao1 left a comment

LGTM!!

@paddle-bot-old

Sorry to inform you that db7898e's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@Avin0323 Avin0323 dismissed stale reviews from luotao1 and wanghuancoder via 4afb74e May 25, 2021 05:01

@wanghuancoder wanghuancoder left a comment

LGTM

@luotao1 luotao1 merged commit 78ecb66 into PaddlePaddle:develop May 26, 2021
@Avin0323 Avin0323 deleted the eigen-function branch May 28, 2021 06:27
4 participants