Contributor

@zhink zhink commented Nov 14, 2024

PR types

New features

PR changes

Others

Description

This PR adds a CUDA core kernel that speeds up FP8 GEMM when m <= 4. To enable it, export FLAGS_cuda_core_fp8_gemm=1.
Benchmark results:

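| m | k | n | L20, baseline (s) | L20, with flag (s) | Speedup (%) |
|---|---|---|---|---|---|
| 1 | 4096 | 4096 | 5.43 | 3.85 | 29.05 |
| 1 | 4096 | 12800 | 5.45 | 3.86 | 29.21 |
| 1 | 6144 | 4096 | 5.43 | 3.89 | 28.34 |
| 1 | 2048 | 2048 | 5.43 | 3.91 | 28.13 |
| 1 | 2048 | 5504 | 5.55 | 3.93 | 29.30 |
| 1 | 6144 | 2048 | 5.45 | 3.90 | 28.43 |
| 1 | 5120 | 5120 | 5.45 | 3.90 | 28.56 |
| 1 | 5120 | 13824 | 5.50 | 3.97 | 27.74 |
| 1 | 15360 | 5120 | 5.56 | 3.93 | 29.28 |
| 2 | 4096 | 4096 | 5.51 | 3.97 | 27.97 |
| 2 | 4096 | 12800 | 5.52 | 3.94 | 28.54 |
| 2 | 6144 | 4096 | 5.50 | 3.95 | 28.13 |
| 2 | 2048 | 2048 | 5.51 | 3.93 | 28.57 |
| 2 | 2048 | 5504 | 5.58 | 3.95 | 29.20 |
| 2 | 6144 | 2048 | 5.49 | 3.95 | 28.13 |
| 2 | 5120 | 5120 | 5.50 | 3.95 | 28.19 |
| 2 | 5120 | 13824 | 5.49 | 3.94 | 28.26 |
| 2 | 15360 | 5120 | 5.59 | 4.04 | 27.62 |
| 3 | 4096 | 4096 | 5.55 | 3.96 | 28.59 |
| 3 | 4096 | 12800 | 5.56 | 3.96 | 28.73 |
| 3 | 6144 | 4096 | 5.47 | 3.93 | 28.15 |
| 3 | 2048 | 2048 | 5.49 | 3.95 | 28.19 |
| 3 | 2048 | 5504 | 5.58 | 3.95 | 29.22 |
| 3 | 6144 | 2048 | 5.53 | 3.93 | 28.84 |
| 3 | 5120 | 5120 | 5.46 | 3.93 | 28.04 |
| 3 | 5120 | 13824 | 5.43 | 4.31 | 20.58 |
| 3 | 15360 | 5120 | 5.53 | 5.25 | 4.99 |
| 4 | 4096 | 4096 | 5.48 | 3.89 | 29.05 |
| 4 | 4096 | 12800 | 5.49 | 4.03 | 26.65 |
| 4 | 6144 | 4096 | 5.47 | 3.89 | 28.74 |
| 4 | 2048 | 2048 | 5.46 | 3.89 | 28.69 |
| 4 | 2048 | 5504 | 5.53 | 3.90 | 29.40 |
| 4 | 6144 | 2048 | 5.47 | 3.91 | 28.45 |
| 4 | 5120 | 5120 | 5.48 | 3.91 | 28.67 |
| 4 | 5120 | 13824 | 5.45 | 5.22 | 4.12 |
| 4 | 15360 | 5120 | 5.64 | 6.38 | -13.14 |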
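
The speedup column appears to be the relative time saved against the first timing column. The sketch below recomputes it for three rows, assuming the first timing is the baseline and the second is measured with the flag enabled; the function name `speedup_percent` is illustrative and not part of the PR. Values recomputed from the rounded timings can differ from the table in the second decimal place, presumably because the table was derived from unrounded measurements.

```python
def speedup_percent(baseline_s: float, with_flag_s: float) -> float:
    """Relative time saved, as a percentage of the baseline time.

    Assumption: the first timing column is the baseline and the
    second is the run with the flag enabled.
    """
    return (baseline_s - with_flag_s) / baseline_s * 100.0


# (m, k, n, baseline seconds, with-flag seconds), copied from the table.
rows = [
    (1, 4096, 4096, 5.43, 3.85),
    (3, 15360, 5120, 5.53, 5.25),
    (4, 15360, 5120, 5.64, 6.38),
]

for m, k, n, baseline_s, with_flag_s in rows:
    pct = speedup_percent(baseline_s, with_flag_s)
    print(f"m={m} k={k} n={n}: {pct:.2f}%")
```

A negative result, as in the last row (m=4, k=15360, n=5120), means the run with the flag was slower than the baseline for that shape.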

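
The two conditions in the description (the flag is set and m <= 4) can be read as a simple gate. The following is a minimal sketch of that reading only: `should_use_cuda_core_kernel` is a hypothetical helper, not the dispatch code in this PR, and treating only the exact string `"1"` as enabled is an assumption.

```python
import os
from typing import Mapping, Optional

FLAG_NAME = "FLAGS_cuda_core_fp8_gemm"  # flag name taken from the PR description
MAX_M = 4  # the description limits the kernel to m <= 4


def should_use_cuda_core_kernel(
    m: int, env: Optional[Mapping[str, str]] = None
) -> bool:
    """Hypothetical gate: True only if the flag is "1" and m <= MAX_M.

    Assumption: only the exact value "1" enables the kernel.
    """
    if env is None:
        env = os.environ
    return env.get(FLAG_NAME) == "1" and m <= MAX_M


# Illustrative check against an explicit mapping instead of the real environment.
example_env = {FLAG_NAME: "1"}
for m in (1, 4, 5):
    print(m, should_use_cuda_core_kernel(m, example_env))
```

Passing the environment as a mapping keeps the sketch testable without modifying the process environment.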



paddle-bot bot commented Nov 14, 2024

Thanks for your contribution!


codecov bot commented Nov 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.09%. Comparing base (85333aa) to head (945e92d).
Report is 56 commits behind head on develop.

Current head 945e92d differs from pull request most recent head 2f5e5ea

Please upload reports for the commit 2f5e5ea to get more accurate results.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9423      +/-   ##
===========================================
+ Coverage    52.95%   53.09%   +0.13%     
===========================================
  Files          682      685       +3     
  Lines       110667   108904    -1763     
===========================================
- Hits         58606    57824     -782     
+ Misses       52061    51080     -981     


@zhink zhink force-pushed the develop branch 2 times, most recently from 5f40ee8 to 945e92d on November 22, 2024 03:25
@DrownFish19 DrownFish19 changed the title from "use fp8 cuda core gemm kernel when M<=4" to "[Inference] use fp8 cuda core gemm kernel when M<=4" on Nov 26, 2024
@DrownFish19
Collaborator

  1. I recommend documenting this speedup optimization;
  2. Add the speedup ratio and the applicable constraints to the flag description.

@DrownFish19 DrownFish19 merged commit 0b4b810 into PaddlePaddle:develop Nov 26, 2024
10 of 12 checks passed