
Commit eddc061

EnflameGCUlixcli authored and committed
[GCU] Add gcu llama2-13b readme (PaddlePaddle#8950)
1 parent ee4e99c commit eddc061

File tree: 2 files changed, +204 −0 lines


llm/gcu/llama/README.md

Lines changed: 171 additions & 0 deletions
## 🚣‍♂️ Running the llama2-13b Model with PaddleNLP on the Enflame S60 🚣

The Enflame S60 ([about Enflame](https://www.enflame-tech.com/)) is a new-generation AI inference accelerator card designed for large-scale data center deployment. It serves large language models, search/advertising/recommendation systems, and traditional models, and offers broad model coverage, ease of use, and easy migration and deployment. It applies to mainstream inference scenarios such as image and text generation, search and recommendation, and text, image, and speech recognition.

PaddleNLP has deeply adapted and optimized the llama2-13B model for the Enflame S60, making the GCU inference entry point essentially identical to the GPU one: migrating an inference job only requires changing the device.
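As a minimal illustration of that claim (a sketch, not code from this repository; `paddle.set_device` and `paddle.device.get_device` are standard PaddlePaddle APIs, and the `gcu` device name assumes the paddle-custom-gcu plugin from the steps below is installed):

```bash
# Hypothetical check: once paddle-custom-gcu is installed, moving a job
# from GPU to GCU is just a device-string change ("gpu" -> "gcu").
python -c "import paddle; paddle.set_device('gcu'); print(paddle.device.get_device())"
```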
## 🚀 Quick Start 🚀

### 0. Machine preparation. Before you start, prepare a machine with an Enflame S60 accelerator card that meets the following requirements:
| Chip type | Driver version | TopsPlatform version |
| :---: | :---: | :---: |
| Enflame S60 | 1.0.5.1 | TopsPlatform_1.0.5.1-2c3111 |
**Note: To verify that the machine has an Enflame S60 accelerator card installed, simply run the following command in the system environment and check for output:**
```bash
lspci | grep S60

# For example: lspci | grep S60 produces output like
01:00.0 Processing accelerators: Shanghai Enflame Technology Co. Ltd S60 [Enflame] (rev 01)
09:00.0 Processing accelerators: Shanghai Enflame Technology Co. Ltd S60 [Enflame] (rev 01)
```
### 1. Environment setup (this will take about 10-20 minutes)

1. Initialize the environment and install the driver.<br/>
**Note: You can contact the Enflame ecosystem team (Email: [email protected]) to obtain the software/driver packages and further help.**
```bash
# Assume the package is located at /home/paddle_user/deps/ and named TopsPlatform.tar.gz
cd /home/paddle_user/deps/ && tar -zxf TopsPlatform.tar.gz
cd TopsPlatform
./TopsPlatform_1.0.5.1-2c3111_deb_amd64.run --no-auto-load --driver -y
```
2. Pull the Docker image.
```bash
# Note: this image is only a Paddle development environment; it contains no precompiled PaddlePaddle or TopsPlatform packages.
docker pull registry.baidubce.com/paddlepaddle/paddle:latest-dev
```
3. Start a container with a command like the following.
```bash
docker run --name paddle-gcu-test -v /home:/home --network=host --ipc=host -it --privileged registry.baidubce.com/paddlepaddle/paddle:latest-dev /bin/bash
```
4. Install the build toolchain.
```bash
# Install cmake for building from source
cd /root
wget https://github.com/Kitware/CMake/releases/download/v3.23.4/cmake-3.23.4-linux-x86_64.tar.gz
tar -zxf ./cmake-3.23.4-linux-x86_64.tar.gz
ln -sf /root/cmake-3.23.4-linux-x86_64/bin/cmake /usr/bin/cmake && ln -sf /root/cmake-3.23.4-linux-x86_64/bin/ctest /usr/bin/ctest
```
5. Install the Enflame software stack.
```bash
# Install the Enflame software stack inside the Paddle docker; building and running depend on sdk, runtime, eccl, aten, and topstx (for the profiler)
cd /home/paddle_user/deps/TopsPlatform
./TopsPlatform_1.0.5.1-2c3111_deb_amd64.run --no-auto-load -y
dpkg -i topsfactor_*.deb tops-sdk_*.deb eccl_*.deb topsaten_*.deb
```
6. Install PaddlePaddle.
```bash
# PaddlePaddle, the deep learning framework, provides the basic compute capability
python -m pip install paddlepaddle==3.0.0b0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
```
7. Build and install PaddleCustomDevice.<br/>
PaddleCustomDevice is the custom-hardware integration layer of the PaddlePaddle deep learning framework; it provides GCU device management and operator implementations.<br/>
**Note: PaddleCustomDevice currently still needs to be built from source; a prebuilt paddle-custom-gcu package has yet to be released.**
```bash
# Download the source code
mkdir -p /home/paddle_user/workspace && cd /home/paddle_user/workspace
git clone https://github.com/PaddlePaddle/PaddleCustomDevice.git
cd PaddleCustomDevice
# Switch to the v3.0.0-beta1 release
git checkout -b v3.0-beta v3.0.0-beta1
# Copy in the operator library the build depends on
cp /home/paddle_user/deps/TopsPlatform/libtopsop.a ./backends/gcu/kernels/topsflame/
# Start building; third-party dependencies are downloaded on demand during the first build. Downloads from GitHub may be slow.
cd backends/gcu/ && mkdir -p build && cd build
export PADDLE_CUSTOM_PATH=`python -c "import re, paddle; print(re.compile('/__init__.py.*').sub('',paddle.__file__))"`
cmake .. -DWITH_TESTING=ON -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DPY_VERSION=3.9
make -j64
# The build artifact is under build/dist; install it with pip
python -m pip install --force-reinstall -U dist/paddle_custom_gcu*.whl
```
8. Clone the PaddleNLP repository and install its dependencies. A combined sanity check for steps 6-8 is sketched right after this list.
```bash
# PaddleNLP is the natural language processing and large language model (LLM) library built on PaddlePaddle. It hosts the large models implemented on the framework, llama2-13B among them. Clone the whole repository to make the best use of PaddleNLP.
cd /home/paddle_user/workspace
git clone https://github.com/PaddlePaddle/PaddleNLP.git
cd PaddleNLP
# Switch to the v3.0.0-beta0 release
git checkout -b v3.0-beta v3.0.0-beta0
# Install the dependencies
python -m pip install -r requirements.txt
# Build and install paddlenlp v3.0.0-beta0 from source
python setup.py bdist_wheel && python -m pip uninstall paddlenlp -y && python -m pip install dist/paddlenlp*
```
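Before moving on, you may want to sanity-check the installs from steps 6-8. A minimal sketch (these are standard PaddlePaddle/PaddleNLP entry points; the `gcu` entry appearing in the custom-device list assumes the paddle-custom-gcu wheel installed cleanly):

```bash
# Verify the PaddlePaddle install itself
python -c "import paddle; paddle.utils.run_check()"
# The registered custom device types should include 'gcu'
python -c "import paddle; print(paddle.device.get_all_custom_device_type())"
# Verify the PaddleNLP install
python -c "import paddlenlp; print(paddlenlp.__version__)"
```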
### 2. Data preparation (this will take about 2-5 minutes)
Evaluate the trained model on wikitext-103.
```bash
cd llm/gcu/llama
wget https://paddlenlp.bj.bcebos.com/data/benchmark/wikitext-103.tar.gz
tar -zxf wikitext-103.tar.gz
```
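To confirm the dataset landed where the evaluation script expects it (the path below matches `--eval_path` in `predict_llama_gcu.sh`):

```bash
ls -lh ./wikitext-103/wiki.valid.tokens
```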
### 3. Inference (this will take about 15-30 minutes)
Run the following command to perform inference:
```bash
bash predict_llama_gcu.sh
```
The first inference run automatically downloads the weights and configuration into ```/root/.paddlenlp/models/__internal_testing__/sci-benchmark-llama-13b-5k/```.<br/>
**For a larger performance gain, it is recommended to edit the inference configuration file after the weights have been downloaded for the first time.**<br/>
Change ```/root/.paddlenlp/models/__internal_testing__/sci-benchmark-llama-13b-5k/config.json``` to the following:
```json
{
  "alibi": false,
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 1,
  "dtype": "float16",
  "eos_token_id": 2,
  "hidden_dropout_prob": 0.1,
  "hidden_size": 5120,
  "initializer_range": 0.002,
  "intermediate_size": 13824,
  "max_position_embeddings": 2048,
  "model_type": "llama",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "num_key_value_heads": 40,
  "pad_token_id": 0,
  "paddlenlp_version": null,
  "rms_norm_eps": 1e-06,
  "rope_scaling_factor": 1.0,
  "rope_scaling_type": null,
  "tie_word_embeddings": false,
  "use_recompute": false,
  "virtual_pp_degree": 1,
  "vocab_size": 32000,
  "use_fused_rope": true,
  "use_fused_rms_norm": true,
  "use_flash_attention": true,
  "fuse_attention_qkv": true,
  "fuse_attention_ffn": true
}
```
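After editing, you can confirm the file is still valid JSON with Python's standard-library pretty printer:

```bash
python -m json.tool /root/.paddlenlp/models/__internal_testing__/sci-benchmark-llama-13b-5k/config.json
```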
After a successful run, you can check the perplexity (ppl) of the inference results; the final evaluation gives ppl: 12.785.
```bash
[2024-08-16 01:55:24,753] [ INFO] - step 2000, batch: 2000, loss: 2.323283, speed: 1.40 step/s
[2024-08-16 01:55:31,813] [ INFO] - step 2010, batch: 2010, loss: 2.341318, speed: 1.42 step/s
[2024-08-16 01:55:38,859] [ INFO] - step 2020, batch: 2020, loss: 2.357684, speed: 1.42 step/s
[2024-08-16 01:55:45,897] [ INFO] - step 2030, batch: 2030, loss: 2.371745, speed: 1.42 step/s
[2024-08-16 01:55:52,942] [ INFO] - step 2040, batch: 2040, loss: 2.386801, speed: 1.42 step/s
[2024-08-16 01:55:59,991] [ INFO] - step 2050, batch: 2050, loss: 2.399686, speed: 1.42 step/s
[2024-08-16 01:56:07,037] [ INFO] - step 2060, batch: 2060, loss: 2.410638, speed: 1.42 step/s
[2024-08-16 01:56:14,080] [ INFO] - step 2070, batch: 2070, loss: 2.421459, speed: 1.42 step/s
[2024-08-16 01:56:21,141] [ INFO] - step 2080, batch: 2080, loss: 2.431433, speed: 1.42 step/s
[2024-08-16 01:56:28,170] [ INFO] - step 2090, batch: 2090, loss: 2.443705, speed: 1.42 step/s
[2024-08-16 01:56:35,238] [ INFO] - step 2100, batch: 2100, loss: 2.454847, speed: 1.41 step/s
[2024-08-16 01:56:42,275] [ INFO] - step 2110, batch: 2110, loss: 2.464446, speed: 1.42 step/s
[2024-08-16 01:56:49,323] [ INFO] - step 2120, batch: 2120, loss: 2.475107, speed: 1.42 step/s
[2024-08-16 01:56:56,348] [ INFO] - step 2130, batch: 2130, loss: 2.487760, speed: 1.42 step/s
[2024-08-16 01:57:03,372] [ INFO] - step 2140, batch: 2140, loss: 2.501706, speed: 1.42 step/s
[2024-08-16 01:57:10,395] [ INFO] - step 2150, batch: 2150, loss: 2.513665, speed: 1.42 step/s
[2024-08-16 01:57:17,411] [ INFO] - step 2160, batch: 2160, loss: 2.524555, speed: 1.43 step/s
[2024-08-16 01:57:24,437] [ INFO] - step 2170, batch: 2170, loss: 2.536793, speed: 1.42 step/s
[2024-08-16 01:57:31,461] [ INFO] - step 2180, batch: 2180, loss: 2.547897, speed: 1.42 step/s
[2024-08-16 01:57:34,378] [ INFO] - validation results on ./wikitext-103/wiki.valid.tokens | avg loss: 2.5483E+00 | ppl: 1.2785E+01 | adjusted ppl: 2.6434E+01 | token ratio: 1.285056584007609 |
'Original Tokens: 279682, Detokenized tokens: 217642'
'Original Tokens: 279682, Detokenized tokens: 217642'
I0816 01:57:34.386860 10925 runtime.cc:130] Backend GCU finalize device:0
I0816 01:57:34.386868 10925 runtime.cc:98] Backend GCU Finalize
```
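The reported perplexity is simply the exponential of the average loss in the validation summary line above, which you can verify directly:

```bash
python -c "import math; print(math.exp(2.5483))"  # ≈ 12.785, matching ppl: 1.2785E+01
```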

llm/gcu/llama/predict_llama_gcu.sh

Lines changed: 33 additions & 0 deletions
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Execution and allocator flags for running on GCU
export PADDLE_RUN_ASYNC=true
export FLAGS_use_stride_kernel=false
export FLAGS_auto_growth_chunk_size_in_mb=512
export FLAGS_use_stream_safe_cuda_allocator=false
# Ops on the black list fall back to CPU instead of running on the custom device
export CUSTOM_DEVICE_BLACK_LIST="softmax_with_cross_entropy"

# Make the PaddleNLP repo importable
export PYTHONPATH=../../:$PYTHONPATH

echo 'run llama wiki_text eval, log: wikitext_eval_gcu.log'
# Run the wikitext evaluation in the background; all output goes to wikitext_eval_gcu.log
python ../../../legacy/examples/benchmark/wiki_lambada/eval.py \
    --model_name_or_path "__internal_testing__/sci-benchmark-llama-13b-5k" \
    --device gcu \
    --batch_size 4 \
    --eval_path ./wikitext-103/wiki.valid.tokens \
    --tensor_parallel_degree 1 \
    --logging_steps 10 \
    --use_flash_attention True \
    --dtype float16 &> wikitext_eval_gcu.log &
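Because the script backgrounds the evaluation and redirects all output to `wikitext_eval_gcu.log`, follow its progress with:

```bash
tail -f wikitext_eval_gcu.log
```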
