PaddlePaddle
diff --git a/.gitignore b/.gitignore
@@ -126,6 +126,6 @@ FETCH_HEAD
 ./ppdiffusers/ppdiffusers/version.py
 
 # third party
-csrc/gpu/cutlass_kernels/cutlass
+csrc/third_party/
 dataset/
 output/
diff --git a/csrc/README.md b/csrc/README.md
@@ -10,6 +10,12 @@ pip install -r requirements.txt
 
 ## Build the CUDA operators
 
+Generate the FP8 cutlass operators (compilation takes a long time)
+```shell
+python generate_code_gemm_fused_kernels.py
+```
+
+Build
 ```shell
 python setup_cuda.py install
 ```
@@ -20,9 +26,14 @@ python setup_cuda.py install
 2. Fetch the code:
     git clone -b v3.5.0 --single-branch https://github.com/NVIDIA/cutlass.git
 
-3. Place the downloaded `cutlass` directory under `csrc/gpu/cutlass_kernels/cutlass`
+3. Place the downloaded `cutlass` directory under `csrc/third_party/cutlass`
 
 4. Rebuild the CUDA operators
 ```shell
 python setup_cuda.py install
 ```
+
+### FP8 GEMM auto-tuning
+```shell
+sh tune_fp8_gemm.sh
+```
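
The workflow described by the updated README can be sketched as a small Python helper. This is only an illustration under stated assumptions: the function names (`cutlass_present`, `build_commands`, `run_build`) are hypothetical and not part of the repository, the scripts are assumed to be run from the `csrc` directory, and the FP8 kernels are assumed to need the cutlass checkout at `csrc/third_party/cutlass`. The command lines themselves are the ones shown in the README changes.

```python
import subprocess
import sys
from pathlib import Path


def cutlass_present(csrc_dir):
    """Return True if a cutlass checkout exists at <csrc_dir>/third_party/cutlass."""
    return (Path(csrc_dir) / "third_party" / "cutlass").is_dir()


def build_commands(fp8=True, tune=False):
    """Return the README commands in the order the README lists them.

    fp8:  include the FP8 cutlass kernel generation step.
    tune: include the FP8 GEMM auto-tuning step after the build.
    """
    commands = []
    if fp8:
        commands.append([sys.executable, "generate_code_gemm_fused_kernels.py"])
    commands.append([sys.executable, "setup_cuda.py", "install"])
    if tune:
        commands.append(["sh", "tune_fp8_gemm.sh"])
    return commands


def run_build(csrc_dir, fp8=True, tune=False):
    """Run the build steps from csrc_dir, stopping at the first failure."""
    # Assumption: FP8 kernel generation depends on the cutlass sources.
    if fp8 and not cutlass_present(csrc_dir):
        raise FileNotFoundError(
            "expected a cutlass checkout under csrc/third_party/cutlass"
        )
    for cmd in build_commands(fp8=fp8, tune=tune):
        subprocess.run(cmd, cwd=csrc_dir, check=True)
```

Calling `run_build("csrc", fp8=True, tune=True)` would run code generation, the build, and then tuning in sequence; checking for the cutlass directory first avoids starting the long FP8 compile only to fail on a missing dependency.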