Commit 3675ea2

[Inference] FP8 gemm auto-tune (#9094)
* fp8 cutlass gemm tune
* git ignore third_party
* check csrc/readme.md
1 parent 73a3db9 commit 3675ea2

23 files changed: 1850 additions, 1033 deletions

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -126,6 +126,6 @@ FETCH_HEAD
 ./ppdiffusers/ppdiffusers/version.py
 
 # third party
-csrc/gpu/cutlass_kernels/cutlass
+csrc/third_party/
 dataset/
 output/
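
The effect of the changed ignore rule can be checked in a scratch repository. The following is a minimal sketch, assuming `git` is installed; the scratch repository and the placeholder `README.md` files are hypothetical and exist only for the check, and only the one new rule from the hunk above is copied into `.gitignore`.

```shell
# Scratch repository in a temporary directory (hypothetical, for illustration only).
repo=$(mktemp -d)
cd "$repo" && git init -q

# Only the new rule from the hunk above.
printf 'csrc/third_party/\n' > .gitignore

# Placeholder files at the new and the old CUTLASS locations.
mkdir -p csrc/third_party/cutlass csrc/gpu/cutlass_kernels/cutlass
touch csrc/third_party/cutlass/README.md csrc/gpu/cutlass_kernels/cutlass/README.md

# git check-ignore exits 0 when the path matches an ignore rule, 1 when it does not.
if git check-ignore -q csrc/third_party/cutlass/README.md; then
    new_status=ignored
else
    new_status=not-ignored
fi
if git check-ignore -q csrc/gpu/cutlass_kernels/cutlass/README.md; then
    old_status=ignored
else
    old_status=not-ignored
fi

echo "new location: $new_status"
echo "old location: $old_status"
```

Because the rule ends in a slash it matches the whole `csrc/third_party/` directory, so anything placed there later is ignored as well, while the old `csrc/gpu/cutlass_kernels/cutlass` path is no longer covered.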

csrc/README.md

Lines changed: 12 additions & 1 deletion
@@ -10,6 +10,12 @@ pip install -r requirements.txt
 
 ## Compile the CUDA operators
 
+Generate the FP8 CUTLASS operators (compilation takes a long time)
+```shell
+python generate_code_gemm_fused_kernels.py
+```
+
+Compile
 ```shell
 python setup_cuda.py install
 ```
@@ -20,9 +26,14 @@ python setup_cuda.py install
 2. Fetch the code:
 git clone -b v3.5.0 --single-branch https://github.com/NVIDIA/cutlass.git
 
-3. Place the downloaded `cutlass` directory at `csrc/gpu/cutlass_kernels/cutlass`
+3. Place the downloaded `cutlass` directory at `csrc/third_party/cutlass`
 
 4. Recompile the CUDA operators
 ```shell
 python setup_cuda.py install
 ```
+
+### FP8 GEMM auto-tuning
+```shell
+sh tune_fp8_gemm.sh
+```
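
Taken together, the README changes describe a sequence of commands. The sketch below strings them into one helper; it is an illustration, not part of the commit. It assumes the commands are run from the `csrc/` directory and that tuning follows the build, which is only the order in which the README lists them. The `run` and `build_fp8_gemm` function names and the `DRY_RUN` variable are hypothetical. Cloning straight into `third_party/cutlass` stands in for the README's separate "clone, then move" steps.

```shell
#!/bin/sh
# DRY_RUN defaults to 1: commands are printed, not executed.
# Set DRY_RUN=0 to actually run them (assumption: invoked from csrc/).
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

build_fp8_gemm() {
    # CUTLASS v3.5.0 is expected at csrc/third_party/cutlass after this commit.
    if [ ! -d third_party/cutlass ]; then
        run git clone -b v3.5.0 --single-branch \
            https://github.com/NVIDIA/cutlass.git third_party/cutlass || return 1
    fi
    # Generate the FP8 CUTLASS kernels, build the operators, then tune.
    run python generate_code_gemm_fused_kernels.py &&
    run python setup_cuda.py install &&
    run sh tune_fp8_gemm.sh
}
```

Calling `build_fp8_gemm` with the default setting only prints the commands, which makes it easy to review the sequence before running it with `DRY_RUN=0`.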
