## 🚣‍♂️ Running the llama2-7b Model on Intel HPU with PaddleNLP 🚣

PaddleNLP has been deeply adapted and optimized for the llama2-7B model on Intel® Gaudi®2D ([learn about Gaudi](https://docs.habana.ai/en/latest/index.html)). Detailed installation steps are given below.

## 🚀 Quick Start 🚀

### (0) Before you begin, you need an Intel Gaudi machine with the following system requirements:

| Chip type | Card model | Driver version |
| --- | --- | --- |
| Gaudi | 225D | 1.17.0 |

### (1) Environment setup (takes 5–15 minutes)
1. Pull the image
```
# Note: this image provides the development environment only; it does not include a pre-built PaddlePaddle package
docker pull vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest
```
2. Start a container with a command like the following
```
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest
```
3. Install paddle
```
# paddlepaddle is the PaddlePaddle deep learning framework, which provides the core compute capabilities
pip install paddlepaddle==0.0.0 -f https://www.paddlepaddle.org.cn/whl/linux/cpu-mkl/develop.html
```
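Before moving on, it can help to confirm that the wheel imports correctly. A quick sanity check (`paddle.utils.run_check()` is PaddlePaddle's built-in installation self-test):

```shell
# Confirm the nightly wheel is importable and passes PaddlePaddle's self-check
python -c "import paddle; print(paddle.__version__); paddle.utils.run_check()"
```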
4. Install PaddleCustomDevice
```
# PaddleCustomDevice is the custom-hardware integration layer of the PaddlePaddle framework; it provides the operator implementations for Intel HPU.
git clone --recursive https://github.com/PaddlePaddle/PaddleCustomDevice
cd PaddleCustomDevice
git submodule sync
git submodule update --remote --init --recursive
cd backends/intel_hpu/
mkdir build && cd build
cmake ..
make -j8
pip install dist/paddle_intel_hpu-0.0.1-cp310-cp310-linux_x86_64.whl
```
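After installing the plugin wheel, you can check that the HPU backend is registered. This is a sketch that assumes the plugin registers itself as a custom device type named `intel_hpu`:

```shell
# List the custom device types visible to paddle; intel_hpu should appear if the plugin loaded
python -c "import paddle; print(paddle.device.get_all_custom_device_type())"
```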
5. Clone the PaddleNLP repository and install its dependencies
```
# PaddleNLP is a natural language processing and large language model (LLM) development library built on PaddlePaddle. It hosts a variety of large models implemented on the framework, including llama2-7B. To make the best use of PaddleNLP, you need to clone the whole repository.
git clone https://github.com/PaddlePaddle/PaddleNLP.git
cd PaddleNLP
python -m pip install -r requirements.txt
python -m pip install -e .
```
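The contents of `inference_hpu.py` are not shown in this guide; a minimal sketch of what such a script typically looks like with PaddleNLP's `Auto` classes follows. The prompt, dtype, and generation parameters here are assumptions, not the exact script:

```python
# Hypothetical minimal inference script; the model name mirrors the one seen in the sample logs
import paddle
from paddlenlp.transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the PaddleCustomDevice plugin registered the backend as "intel_hpu"
paddle.set_device("intel_hpu")

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat", dtype="bfloat16")

# Tokenize a prompt and generate a continuation
inputs = tokenizer("Hello, please introduce ", return_tensors="pd")
output_ids, _ = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
```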

### (2) Inference (takes 10–15 minutes)
1. Single-card inference

Run the following command to perform inference:
```bash
python inference_hpu.py
```

After a successful run, you can see the generated inference output. A sample follows:
```
[2024-10-25 02:42:42,220] [    INFO] - We are using <class 'paddlenlp.transformers.llama.tokenizer.LlamaTokenizer'> to load 'meta-llama/Llama-2-7b-chat'.
[2024-10-25 02:42:42,427] [    INFO] - We are using <class 'paddlenlp.transformers.llama.modeling.LlamaForCausalLM'> to load 'meta-llama/Llama-2-7b-chat'.
[2024-10-25 02:42:42,427] [    INFO] - Loading configuration file /root/.paddlenlp/models/meta-llama/Llama-2-7b-chat/config.json
[2024-10-25 02:42:42,428] [    INFO] - Loading weights file from cache at /root/.paddlenlp/models/meta-llama/Llama-2-7b-chat/model_state.pdparams
[2024-10-25 02:43:32,871] [    INFO] - Loaded weights file from disk, setting weights to model.
[2024-10-25 02:44:15,226] [    INFO] - All model checkpoint weights were used when initializing LlamaForCausalLM.

[2024-10-25 02:44:15,226] [    INFO] - All the weights of LlamaForCausalLM were initialized from the model checkpoint at meta-llama/Llama-2-7b-chat.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[2024-10-25 02:44:15,229] [    INFO] - Loading configuration file /root/.paddlenlp/models/meta-llama/Llama-2-7b-chat/generation_config.json

['myself. I am a 35 year old woman from the United States. I am a writer and artist, and I have been living in Japan for the past 5 years. I am originally from the Midwest, but I have lived in several different places around the world, including California, New York, and now Japan.\nI am passionate about many things, including art, writing, music, and travel. I love to explore new places and cultures, and I am always looking for new inspiration for my art and writing. I am also a big fan of Japanese culture, and I try to learn as much']
```
2. Multi-card inference

Run the following command to perform inference:
```bash
bash test_llama_2x.sh
```
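The contents of `test_llama_2x.sh` are not shown here. Judging from the `mp_degree: 2` line in the sample logs, a two-card run is typically launched with `paddle.distributed.launch`; the device list and script name below are assumptions:

```shell
# Hypothetical 2-card launch sketch; the actual script may set additional flags
export HABANA_VISIBLE_DEVICES=all
python -m paddle.distributed.launch --devices "0,1" inference_hpu.py
```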
After a successful run, you can see the generated inference output. A sample follows:
```bash
[2024-10-29 11:24:39,468] [    INFO] - We are using <class 'paddlenlp.transformers.llama.tokenizer.LlamaTokenizer'> to load 'meta-llama/Llama-2-7b-chat'.
[2024-10-29 11:24:40,705] [    INFO] distributed_strategy.py:214 - distributed strategy initialized
I1029 11:24:40.706755 14711 tcp_utils.cc:181] The server starts to listen on IP_ANY:59129
I1029 11:24:40.706897 14711 tcp_utils.cc:130] Successfully connected to 127.0.0.1:59129
[2024-10-29 11:24:42,740] [    INFO] topology.py:357 - Total 2 pipe comm group(s) create successfully!
[2024-10-29 11:24:52,064] [    INFO] topology.py:357 - Total 2 data comm group(s) create successfully!
[2024-10-29 11:24:52,064] [    INFO] topology.py:357 - Total 1 model comm group(s) create successfully!
[2024-10-29 11:24:52,065] [    INFO] topology.py:357 - Total 2 sharding comm group(s) create successfully!
[2024-10-29 11:24:52,065] [    INFO] topology.py:279 - HybridParallelInfo: rank_id: 0, mp_degree: 2, sharding_degree: 1, pp_degree: 1, dp_degree: 1, sep_degree: 1, mp_group: [0, 1], sharding_group: [0], pp_group: [0], dp_group: [0], sep:group: None, check/clip group: [0, 1]
[2024-10-29 11:24:52,067] [    INFO] - We are using <class 'paddlenlp.transformers.llama.modeling.LlamaForCausalLM'> to load 'meta-llama/Llama-2-7b-chat'.
[2024-10-29 11:24:52,067] [    INFO] - Loading configuration file /root/.paddlenlp/models/meta-llama/Llama-2-7b-chat/config.json
[2024-10-29 11:24:52,068] [    INFO] - Loading weights file from cache at /root/.paddlenlp/models/meta-llama/Llama-2-7b-chat/model_state.pdparams
[2024-10-29 11:25:43,202] [    INFO] - Starting to convert orignal state_dict to tensor parallel state_dict.
[2024-10-29 11:25:45,125] [    INFO] - Loaded weights file from disk, setting weights to model.
[2024-10-29 11:26:04,008] [    INFO] - All model checkpoint weights were used when initializing LlamaForCausalLM.
[2024-10-29 11:26:04,008] [    INFO] - All the weights of LlamaForCausalLM were initialized from the model checkpoint at meta-llama/Llama-2-7b-chat.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[2024-10-29 11:26:04,010] [    INFO] - Loading configuration file /root/.paddlenlp/models/meta-llama/Llama-2-7b-chat/generation_config.json

['myself\nHello everyone my name is [Your Name], and I am a new member of this community']
I1029 11:26:16.184163 14767 tcp_store.cc:293] receive shutdown event and so quit from MasterDaemon run loop
LAUNCH INFO 2024-10-29 11:26:17,186 Pod completed
LAUNCH INFO 2024-10-29 11:26:17,186 Exit code 0
```