[Inference] use fp8 cuda core gemm kernel when M<=4 #9423

zhink · 2024-11-14T06:28:11Z

PR types

New features

PR changes

Others

Description

This PR uses an FP8 CUDA core GEMM kernel to speed up FP8 GEMM when m <= 4. To enable it, run export FLAGS_cuda_core_fp8_gemm=1.
Benchmark results:

m	k	n	L20 (s)	L20 (s)	Speedup (%)
1	4096	4096	5.43	3.85	29.05
1	4096	12800	5.45	3.86	29.21
1	6144	4096	5.43	3.89	28.34
1	2048	2048	5.43	3.91	28.13
1	2048	5504	5.55	3.93	29.30
1	6144	2048	5.45	3.90	28.43
1	5120	5120	5.45	3.90	28.56
1	5120	13824	5.50	3.97	27.74
1	15360	5120	5.56	3.93	29.28
2	4096	4096	5.51	3.97	27.97
2	4096	12800	5.52	3.94	28.54
2	6144	4096	5.50	3.95	28.13
2	2048	2048	5.51	3.93	28.57
2	2048	5504	5.58	3.95	29.20
2	6144	2048	5.49	3.95	28.13
2	5120	5120	5.50	3.95	28.19
2	5120	13824	5.49	3.94	28.26
2	15360	5120	5.59	4.04	27.62
3	4096	4096	5.55	3.96	28.59
3	4096	12800	5.56	3.96	28.73
3	6144	4096	5.47	3.93	28.15
3	2048	2048	5.49	3.95	28.19
3	2048	5504	5.58	3.95	29.22
3	6144	2048	5.53	3.93	28.84
3	5120	5120	5.46	3.93	28.04
3	5120	13824	5.43	4.31	20.58
3	15360	5120	5.53	5.25	4.99
4	4096	4096	5.48	3.89	29.05
4	4096	12800	5.49	4.03	26.65
4	6144	4096	5.47	3.89	28.74
4	2048	2048	5.46	3.89	28.69
4	2048	5504	5.53	3.90	29.40
4	6144	2048	5.47	3.91	28.45
4	5120	5120	5.48	3.91	28.67
4	5120	13824	5.45	5.22	4.12
4	15360	5120	5.64	6.38	-13.14
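The percentages above can be reproduced approximately with a short Python sketch. It assumes the first timing column is the baseline and the second is the run with the flag enabled, which matches the reported figures up to rounding. The helper names `wants_cuda_core_fp8_gemm` and `speedup_percent` are hypothetical and not part of PaddleNLP. The gate only illustrates the stated condition (flag set to 1 and m <= 4), not how the framework actually parses the flag.

```python
import os

# Assumption: the flag is read from the process environment, as the
# `export FLAGS_cuda_core_fp8_gemm=1` instruction in the description suggests.
FLAG_NAME = "FLAGS_cuda_core_fp8_gemm"
MAX_M = 4  # the PR title limits the kernel to M <= 4


def wants_cuda_core_fp8_gemm(m, env=None):
    """Hypothetical gate: True when the flag is "1" and 1 <= m <= 4."""
    env = os.environ if env is None else env
    return env.get(FLAG_NAME, "0") == "1" and 1 <= m <= MAX_M


def speedup_percent(baseline_s, new_s):
    """Relative time saved in percent; a negative value means a slowdown."""
    return (baseline_s - new_s) / baseline_s * 100.0


if __name__ == "__main__":
    enabled = {FLAG_NAME: "1"}
    print(wants_cuda_core_fp8_gemm(1, enabled))  # m within the supported range
    print(wants_cuda_core_fp8_gemm(8, enabled))  # m too large for this kernel
    # Timings taken from the table: first row and last row.
    print(round(speedup_percent(5.43, 3.85), 2))
    print(round(speedup_percent(5.64, 6.38), 2))
```

By this formula the last row (m=4, k=15360, n=5120) is a slowdown, and the rows for m=3, k=15360, n=5120 and m=4, k=5120, n=13824 show gains below 5%. The benefit therefore seems to depend on shape even within m <= 4.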

paddle-bot · 2024-11-14T06:28:15Z

Thanks for your contribution!

codecov · 2024-11-14T07:00:41Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.09%. Comparing base (85333aa) to head (945e92d).
Report is 56 commits behind head on develop.

❗ Current head 945e92d differs from pull request most recent head 2f5e5ea

Please upload reports for the commit 2f5e5ea to get more accurate results.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #9423      +/-   ##
===========================================
+ Coverage    52.95%   53.09%   +0.13%     
===========================================
  Files          682      685       +3     
  Lines       110667   108904    -1763     
===========================================
- Hits         58606    57824     -782     
+ Misses       52061    51080     -981


llm/docs/predict/best_practices.md

DrownFish19 · 2024-11-26T03:08:23Z

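I recommend documenting this speedup optimization.
In the flag description, add the speedup ratio and the constraints under which it applies.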

zhink force-pushed the develop branch 2 times, most recently from 5f40ee8 to 945e92d Compare November 22, 2024 03:25

DrownFish19 changed the title ~~use fp8 cuda core gemm kernel when M<=4~~ [Inference] use fp8 cuda core gemm kernel when M<=4 Nov 26, 2024

DrownFish19 reviewed Nov 26, 2024

llm/docs/predict/best_practices.md (outdated, resolved)

llm/docs/predict/best_practices.md (outdated, resolved)

use fp8 cuda core gemm kernel when M<=4

2f5e5ea

zhink force-pushed the develop branch from 945e92d to 2f5e5ea Compare November 26, 2024 03:39

DrownFish19 approved these changes Nov 26, 2024

DrownFish19 merged commit 0b4b810 into PaddlePaddle:develop Nov 26, 2024
10 of 12 checks passed
