Contributor

@gzy19990617 gzy19990617 commented Aug 9, 2024

PR types

New features

PR changes

Add new cutlass op

Description

Add cutlass gemm dequant op
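
For readers unfamiliar with the operation, here is a minimal pure-Python sketch of what a GEMM followed by dequantization computes in an a8w8 setting, and of what fusing the two steps means. It is not the CUTLASS kernel added by this PR. The function names, the `K x N` weight layout, and the per-output-channel `scale` vector are assumptions made for illustration only.

```python
from typing import List


def gemm_int8(a: List[List[int]], b: List[List[int]]) -> List[List[int]]:
    """Integer matmul with integer accumulation. a is M x K, b is K x N (assumed layout)."""
    k_dim, n_dim = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(k_dim)) for j in range(n_dim)]
        for row in a
    ]


def gemm_then_dequant(a, b, scale: List[float]) -> List[List[float]]:
    """Unfused path: materialize the integer result, then rescale it in a second pass."""
    acc = gemm_int8(a, b)
    return [[acc_ij * scale[j] for j, acc_ij in enumerate(acc_row)] for acc_row in acc]


def gemm_dequant_fused(a, b, scale: List[float]) -> List[List[float]]:
    """Fused path: apply the scale as soon as each accumulator is complete,
    so no intermediate integer matrix is stored."""
    k_dim, n_dim = len(b), len(b[0])
    out = []
    for row in a:
        out_row = []
        for j in range(n_dim):
            acc = 0
            for k in range(k_dim):
                acc += row[k] * b[k][j]
            out_row.append(acc * scale[j])  # dequantize in the "epilogue"
        out.append(out_row)
    return out


if __name__ == "__main__":
    a = [[1, -2], [3, 4]]   # quantized activations (hypothetical values)
    b = [[5, 6], [7, -8]]   # quantized weights (hypothetical values)
    scale = [0.5, 0.25]     # hypothetical per-output-channel dequant scales
    print(gemm_then_dequant(a, b, scale))
    print(gemm_dequant_fused(a, b, scale))
```

Both paths produce the same values. The fused variant only avoids writing out and re-reading the integer intermediate, which is the kind of saving the performance numbers below are measuring.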

  1. Accuracy test
    Parameters: --decode_strategy greedy_search --mode dynamic --quant_type a8w8 --inference_model 1 --batch_size 2 --src_length 128 --max_length 256
    Output with block attention:
image

Output without block attention (even without this PR, the second output already contains garbled text):
image

  2. Performance test: average latency 44.9 ms -> 42.6 ms
    Test configuration: L20 GPU, batch_size 2, block attention
    gemm dequant not fused:
Pasted Graphic 3

gemm dequant fused:
Pasted Graphic 4

3. Tried applying dequant directly after qkv_out, but this raised an error.
For details, see:
https://ku.baidu-int.com/knowledge/HFVrC7hq1Q/pKzJfZczuc/TK3hw_mluo/1-4J_hgwU8mmJN


paddle-bot bot commented Aug 9, 2024

Thanks for your contribution!


codecov bot commented Aug 9, 2024

Codecov Report

Attention: Patch coverage is 0% with 13 lines in your changes missing coverage. Please review.

Project coverage is 53.76%. Comparing base (a18e220) to head (aa0fdd0).
Report is 216 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...erimental/transformers/fused_transformer_layers.py | 0.00% | 13 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8909      +/-   ##
===========================================
+ Coverage    53.58%   53.76%   +0.18%     
===========================================
  Files          652      652              
  Lines       105169   104513     -656     
===========================================
- Hits         56354    56193     -161     
+ Misses       48815    48320     -495     


Collaborator

@DrownFish19 DrownFish19 left a comment

LGTM

@DrownFish19 DrownFish19 changed the title Add cutlass gemm dequant op [Inference] Add cutlass gemm dequant op Aug 29, 2024
@wawltor wawltor merged commit c28caf7 into PaddlePaddle:develop Aug 29, 2024
Mangodadada pushed a commit to Mangodadada/PaddleNLP that referenced this pull request Sep 10, 2024
* change gpu name

* add cutlass gemm_dequant op

* add cutlass gemm_dequant op

* fix format

* fix fused_transformer_layers

* fix layer

* fix layer

* fix format

* fix format

* fix review

* fix review

* fix review

* fix review

* fix review

* fix review

* fix review