Dropout optimize & clean broadcast inT and ElementwiseType #52969
raindrops2sea merged 24 commits into PaddlePaddle:develop from zhangboSJTU:dropout_opt_clean_BcInT on Apr 28, 2023
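| Your PR was submitted successfully. Thank you for contributing to this open-source project! |
| It would be best to add the corresponding A100 performance data. |
| Done |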
              
shaojiewang previously approved these changes on Apr 27, 2023
b730437 to 15d2e3c
df56d7e to 6bdbf1c
ed40af6 to 751ce2a
shaojiewang approved these changes on Apr 28, 2023
raindrops2sea approved these changes on Apr 28, 2023
ZzSean approved these changes on Apr 28, 2023
LGTM for CI-OP-Benchmark
    
zhangboSJTU added a commit to zhangboSJTU/Paddle that referenced this pull request on May 9, 2023
…dle#52969)
* change judgement for DropoutGradGPUKernelDriver
* add UnrollerWithoutVecSize and after this Loaddata to be refined
* pass unittest
* use same unroller with XPU
* BroadcastWithInt64Index
* BroadcastDataLoader template partial specialization
* fix compile errs in ROCms
* clean ElementwiseT and InT for BroadcastKernel
* default axis and clean inT
* remove redundant fast divmod computation
* optimize drop_nd & drop_nd_grad
* optimize BroadcastDataLoader bf16 fp16
* rm InT etc. after merge develop
* delete constexpr for windows ci
* fix conflict
* fix conflic with develop
* fix conflic
* new clean
* clean
    
XiaoguangHu01 pushed a commit that referenced this pull request on May 10, 2023
…to Release/2.5 (#53623)
* Support different dtypes of inputs for broadcast for dropout optimization (#52093)
* change judgement for DropoutGradGPUKernelDriver
* add UnrollerWithoutVecSize and after this Loaddata to be refined
* pass unittest
* use same unroller with XPU
* BroadcastWithInt64Index
* BroadcastDataLoader template partial specialization
* fix compile errs in ROCms
* PR comment
* dropout_nd_optimization (#51479)
* with printf
* add DropOutNdForwardKernel
* PR comment
* Dropout optimize & clean broadcast inT and ElementwiseType (#52969)
* change judgement for DropoutGradGPUKernelDriver
* add UnrollerWithoutVecSize and after this Loaddata to be refined
* pass unittest
* use same unroller with XPU
* BroadcastWithInt64Index
* BroadcastDataLoader template partial specialization
* fix compile errs in ROCms
* clean ElementwiseT and InT for BroadcastKernel
* default axis and clean inT
* remove redundant fast divmod computation
* optimize drop_nd & drop_nd_grad
* optimize BroadcastDataLoader bf16 fp16
* rm InT etc. after merge develop
* delete constexpr for windows ci
* fix conflict
* fix conflic with develop
* fix conflic
* new clean
* clean
* Fix xpu2 kp compile error (#53548)
* fix conflict
* conflict
  
PR types
Performance optimization
PR changes
OPs
Description
This PR is the follow-up part of #52093
- Clean `ElementwiseType` and `InT`, and make the default `axis = -1` in the function `BroadcastKernel`.
- Optimize `BroadcastDataLoader`; the results are listed below.
- Optimize `dropout`, `drop_nd`, and `drop_nd_grad`; the results for `drop_nd_grad` are listed below.

Test broadcast performance with test_ternary_broadcast.cu on V100 16G cuda11.2 unit(ms)
A100 40G results
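For readers unfamiliar with the `axis` argument mentioned in the description, here is a minimal Python sketch of what a default of `axis = -1` conventionally means for an elementwise broadcast: the smaller operand is aligned with the trailing dimensions of the larger one, i.e. `axis = rank(x) - rank(y)`. The helper names `resolve_axis` and `align_shape` are hypothetical and are not part of Paddle's API. The sketch assumes `rank(x) >= rank(y)` and illustrates only the shape logic, not the kernel itself.

```python
def resolve_axis(axis, x_rank, y_rank):
    """Hypothetical helper: map axis == -1 to trailing-dimension alignment."""
    if axis == -1:
        return x_rank - y_rank
    return axis


def align_shape(x_dims, y_dims, axis=-1):
    """Pad y_dims with 1s up to x's rank, placing y's dims at the resolved axis.

    Assumption: x has at least as many dimensions as y.
    """
    start = resolve_axis(axis, len(x_dims), len(y_dims))
    if start < 0 or start + len(y_dims) > len(x_dims):
        raise ValueError("y does not fit into x at the given axis")
    aligned = [1] * len(x_dims)
    aligned[start:start + len(y_dims)] = y_dims
    return aligned


print(align_shape([2, 3, 4], [4]))          # [1, 1, 4]
print(align_shape([2, 3, 4], [3, 4]))       # [1, 3, 4]
print(align_shape([2, 3, 4], [2], axis=0))  # [2, 1, 1]
```

With the default in place, callers that only ever need trailing alignment no longer have to pass `axis` explicitly. That is presumably part of the cleanup this PR describes.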
Test dropout_nd_grad performance on V100 16G cuda11.2 unit(ms)
Configs from https://github.com/PaddlePaddle/benchmark/pull/1673/files
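As context for what the `dropout_nd_grad` benchmark exercises, the following is a reference sketch in plain Python of the backward computation for a 2-D input whose mask is shared along one axis. It is not the CUDA kernel from this PR. It assumes "upscale in train" scaling, a mask of shape `[rows, 1]` and the hypothetical function name `dropout_nd_grad`. The mask holds integers and the gradient holds floats, which is why a broadcast that accepts inputs of different dtypes (the subject of #52093) is relevant here.

```python
def dropout_nd_grad(grad_out, mask, p):
    """Reference sketch: grad_x[i][j] = grad_out[i][j] * mask[i][0] / (1 - p).

    grad_out: 2-D list of floats, shape [rows, cols]
    mask:     2-D list of 0/1 ints, shape [rows, 1], broadcast along columns
    p:        dropout probability, 0 <= p < 1
    Assumption: "upscale in train" mode, so kept elements are scaled by 1 / (1 - p).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)
    return [
        [g * mask[i][0] * scale for g in row]
        for i, row in enumerate(grad_out)
    ]


grad_out = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
mask = [[0], [1]]  # row 0 dropped, row 1 kept
print(dropout_nd_grad(grad_out, mask, 0.5))
# [[0.0, 0.0, 0.0], [8.0, 10.0, 12.0]]
```

Each output element reads one gradient value and one broadcast mask value, so the operation is memory-bound. Optimizing the broadcast data loading path would therefore be expected to affect the timings measured here.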