Commit 67a1cbc

fp8 cutlass gemm tune
1 parent d245d07 commit 67a1cbc

22 files changed: +1849 -1032 lines changed

csrc/README.md

Lines changed: 12 additions & 1 deletion
@@ -10,6 +10,12 @@ pip install -r requirements.txt
 
 ## Compile the CUDA operators
 
+Generate the FP8 CUTLASS operators (compilation takes a long time)
+```shell
+python generate_code_gemm_fused_kernels.py
+```
+
+Compile
 ```shell
 python setup_cuda.py install
 ```
@@ -20,9 +26,14 @@ python setup_cuda.py install
 2. Fetch the code:
 git clone -b v3.5.0 --single-branch https://github.com/NVIDIA/cutlass.git
 
-3. Place the downloaded `cutlass` directory at `csrc/gpu/cutlass_kernels/cutlass`
+3. Place the downloaded `cutlass` directory at `third_party/cutlass`
 
 4. Recompile the CUDA operators
 ```shell
 python setup_cuda.py install
 ```
+
+### FP8 GEMM auto-tuning
+```shell
+sh tune_fp8_gemm.sh
+```
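
Read together, the commands in the updated README could be chained roughly as in the sketch below. This is an illustration under assumptions, not a script shipped with the commit: the names `CUTLASS_DIR`, `DRY_RUN`, `run`, and `build_with_cutlass` are hypothetical, the ordering of the steps is inferred from the README, and the README does not say which directory `third_party/cutlass` is relative to (the sketch assumes the directory that contains `setup_cuda.py`). By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch only: CUTLASS_DIR, DRY_RUN, run and build_with_cutlass are
# hypothetical names introduced for illustration.
CUTLASS_DIR="${CUTLASS_DIR:-third_party/cutlass}"  # destination named in step 3
DRY_RUN="${DRY_RUN:-1}"                            # 1 = print only, 0 = execute

run() {
    # Echo the command; execute it only when DRY_RUN=0.
    printf '%s\n' "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

build_with_cutlass() {
    # Steps 2 and 3: clone CUTLASS v3.5.0 directly into the expected directory.
    if [ ! -d "$CUTLASS_DIR" ]; then
        run git clone -b v3.5.0 --single-branch \
            https://github.com/NVIDIA/cutlass.git "$CUTLASS_DIR" || return 1
    fi
    # Generate the FP8 CUTLASS kernels, then (re)build the CUDA operators.
    run python generate_code_gemm_fused_kernels.py || return 1
    run python setup_cuda.py install || return 1
    # FP8 GEMM auto-tuning, as listed last in the README.
    run sh tune_fp8_gemm.sh
}

build_with_cutlass
```

Cloning straight into `CUTLASS_DIR` merges the "download" and "place the directory" steps into one command; moving an existing checkout there by hand works equally well.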
