
Commit ce18df0

[FastDeploy] Add bert fastdeploy example (#5003)
* Add infer script
* Add Bert deploy readme
* Update runtime option
* Update readme
* paddle->paddle_infer
* Update doc
* update doc
* update doc and remove predict scripts
* remove glue
1 parent cfc5a47 commit ce18df0

File tree: 6 files changed, +316 −368 lines

model_zoo/bert/README.md

Lines changed: 12 additions & 53 deletions
````diff
@@ -268,66 +268,25 @@ python -u ./export_model.py \
 - `model_path`: path where the trained model is saved; identical to the `output_dir` used during training.
 - `output_path`: prefix of the exported inference model files. Suffixes (`pdiparams`, `pdiparams.info`, `pdmodel`) are appended on save; in addition, the tokenizer files are saved under the directory containing `output_path`.
 
-Then run prediction for the GLUE evaluation tasks as follows (based on Paddle's [Python inference API](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/python_infer_cn.html)):
+Once the model is exported, it can be deployed. `deploy/python/seq_cls_infer.py` provides a Python deployment example, which can be run with the following command:
 
 ```shell
-python -u ./predict_glue.py \
-    --task_name SST2 \
-    --model_type bert \
-    --model_path ./infer_model/model \
-    --batch_size 32 \
-    --max_seq_length 128
+python deploy/python/seq_cls_infer.py --model_dir infer_model/ --device gpu --backend paddle
 ```
 
-The parameters are described below:
-- `task_name`: the fine-tuning task.
-- `model_type`: the model type; set it to bert when using a BERT model.
-- `model_path`: prefix of the inference model files; identical to `output_path` in the export step above.
-- `batch_size`: number of samples per prediction batch.
-- `max_seq_length`: maximum sentence length; longer inputs are truncated.
-
-Prediction on sample input data is also supported. Taking the sentiment classification dataset [SST-2](https://nlp.stanford.edu/sentiment/index.html) as an example, the classification results for the sample data are printed:
+After running, the prediction results are printed as follows:
 
-```shell
-python -u ./predict.py \
-    --model_path ./infer_model/model \
-    --device gpu \
-    --max_seq_length 128
+```bash
+[2023-03-02 08:30:03,877] [ INFO] - We are using <class 'paddlenlp.transformers.bert.fast_tokenizer.BertFastTokenizer'> to load '../../infer_model/'.
+[INFO] fastdeploy/runtime/runtime.cc(266)::CreatePaddleBackend Runtime initialized with Backend::PDINFER in Device::GPU.
+Batch id: 0, example id: 0, sentence1: against shimmering cinematography that lends the setting the ethereal beauty of an asian landscape painting, label: positive, negative prob: 0.0003, positive prob: 0.9997.
+Batch id: 1, example id: 0, sentence1: the situation in a well-balanced fashion, label: positive, negative prob: 0.0002, positive prob: 0.9998.
+Batch id: 2, example id: 0, sentence1: at achieving the modest , crowd-pleasing goals it sets for itself, label: positive, negative prob: 0.0017, positive prob: 0.9983.
+Batch id: 3, example id: 0, sentence1: so pat it makes your teeth hurt, label: negative, negative prob: 0.9986, positive prob: 0.0014.
+Batch id: 4, example id: 0, sentence1: this new jangle of noise , mayhem and stupidity must be a serious contender for the title ., label: negative, negative prob: 0.9806, positive prob: 0.0194.
 ```
 
-The parameters are described below:
-- `model_path`: prefix of the inference model files; identical to `output_path` in the export step above.
-- `device`: the device to run on; 'gpu' for GPU, 'xpu' for Baidu Kunlun XPU, 'cpu' for CPU.
-- `max_seq_length`: maximum sentence length; longer inputs are truncated.
-
-The prediction results for the sample data are as follows:
-
-```text
-Data: against shimmering cinematography that lends the setting the ethereal beauty of an asian landscape painting
-Label: positive
-Negative prob: 0.0004963805549778044
-Positive prob: 0.9995037317276001
-
-Data: the situation in a well-balanced fashion
-Label: positive
-Negative prob: 0.000471479695988819
-Positive prob: 0.9995285272598267
-
-Data: at achieving the modest , crowd-pleasing goals it sets for itself
-Label: positive
-Negative prob: 0.0019163173856213689
-Positive prob: 0.998083770275116
-
-Data: so pat it makes your teeth hurt
-Label: negative
-Negative prob: 0.9988648295402527
-Positive prob: 0.0011351780267432332
-
-Data: this new jangle of noise , mayhem and stupidity must be a serious contender for the title .
-Label: negative
-Negative prob: 0.9884825348854065
-Positive prob: 0.011517543345689774
-```
+For more detailed usage, see [Python Deployment](deploy/python/README.md).
 
 ## Extensions
````

model_zoo/bert/deploy/python/README.md

Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@

# FastDeploy BERT Model Python Deployment Example

Before deploying, install the FastDeploy Python SDK by following the [FastDeploy SDK installation guide](https://github.com/PaddlePaddle/FastDeploy/blob/develop/docs/cn/build_and_install/download_prebuilt_libraries.md).

This directory provides `seq_cls_infer.py`, a Python deployment example for GLUE text classification on CPU/GPU.

## Installing Dependencies

Run the following command to install the dependencies of this deployment example:

```bash
# Install fast_tokenizer and the GPU build of fastdeploy
pip install fast-tokenizer-python fastdeploy-gpu-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
```

## Quick Start

The following example shows how to use the FastDeploy library to deploy a BERT text classification model trained on the GLUE SST-2 dataset with Python. The `--device` and `--backend` command-line arguments select the hardware and the inference backend, and `--model_dir` specifies the model to run; see [Parameters](#parameters) below for details. The model in this example is the deployment model exported by following the [BERT training documentation](../../README.md); its directory is `model_zoo/bert/infer_model` (adjust to your actual setup).

```bash
# CPU inference
python seq_cls_infer.py --model_dir ../../infer_model/ --device cpu --backend paddle
# GPU inference
python seq_cls_infer.py --model_dir ../../infer_model/ --device gpu --backend paddle
```

Running it prints results like the following:

```bash
[2023-03-02 08:30:03,877] [ INFO] - We are using <class 'paddlenlp.transformers.bert.fast_tokenizer.BertFastTokenizer'> to load '../../infer_model/'.
[INFO] fastdeploy/runtime/runtime.cc(266)::CreatePaddleBackend Runtime initialized with Backend::PDINFER in Device::GPU.
Batch id: 0, example id: 0, sentence1: against shimmering cinematography that lends the setting the ethereal beauty of an asian landscape painting, label: positive, negative prob: 0.0003, positive prob: 0.9997.
Batch id: 1, example id: 0, sentence1: the situation in a well-balanced fashion, label: positive, negative prob: 0.0002, positive prob: 0.9998.
Batch id: 2, example id: 0, sentence1: at achieving the modest , crowd-pleasing goals it sets for itself, label: positive, negative prob: 0.0017, positive prob: 0.9983.
Batch id: 3, example id: 0, sentence1: so pat it makes your teeth hurt, label: negative, negative prob: 0.9986, positive prob: 0.0014.
Batch id: 4, example id: 0, sentence1: this new jangle of noise , mayhem and stupidity must be a serious contender for the title ., label: negative, negative prob: 0.9806, positive prob: 0.0194.
```

## Parameters

| Parameter | Description |
|----------|--------------|
|--model_dir | Directory of the deployment model. |
|--batch_size | Batch size of the input; defaults to 1. |
|--max_length | Maximum sequence length; defaults to 128. |
|--device | Device to run on; choices: ['cpu', 'gpu']; defaults to 'cpu'. |
|--device_id | Id of the device to run on; defaults to 0. |
|--cpu_threads | Number of CPU threads used when running inference on CPU; defaults to 1. |
|--backend | Inference backend; choices: ['onnx_runtime', 'paddle', 'openvino', 'tensorrt', 'paddle_tensorrt']; defaults to 'paddle'. |
|--use_fp16 | Whether to run inference in FP16 mode; can be enabled with the tensorrt and paddle_tensorrt backends; defaults to False. |
|--use_fast | Whether to use FastTokenizer to speed up tokenization; defaults to True. |
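For example, combining the flags above, FP16 inference with the Paddle TensorRT backend could be tried as follows (a sketch based on the parameter table; it assumes a CUDA GPU with TensorRT available, and the first run spends extra time collecting shapes and building the serialized engine):

```bash
python seq_cls_infer.py --model_dir ../../infer_model/ --device gpu --backend paddle_tensorrt --use_fp16 True
```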
## Advanced FastDeploy Usage

On the Python side, FastDeploy provides the `fastdeploy.RuntimeOption.use_xxx()` and `fastdeploy.RuntimeOption.use_xxx_backend()` interfaces so that developers can choose different hardware and inference engines for deployment. To deploy the BERT model on a given piece of hardware, you must pick an inference engine that the hardware supports; the table below shows which inference engines are available for deploying the BERT model on each kind of hardware.

Legend: (1) ✅: supported; (2) ❔: in progress; (3) N/A: not supported yet.

<table>
<tr>
<td align=center> Hardware </td>
<td align=center> Hardware API </td>
<td align=center> Available inference engine </td>
<td align=center> Inference engine API </td>
<td align=center> Supports Paddle new-format quantized models </td>
<td align=center> Supports FP16 mode </td>
</tr>
<tr>
<td rowspan=3 align=center> CPU </td>
<td rowspan=3 align=center> use_cpu() </td>
<td align=center> Paddle Inference </td>
<td align=center> use_paddle_infer_backend() </td>
<td align=center> ✅ </td>
<td align=center> N/A </td>
</tr>
<tr>
<td align=center> ONNX Runtime </td>
<td align=center> use_ort_backend() </td>
<td align=center> ✅ </td>
<td align=center> N/A </td>
</tr>
<tr>
<td align=center> OpenVINO </td>
<td align=center> use_openvino_backend() </td>
<td align=center> ❔ </td>
<td align=center> N/A </td>
</tr>
<tr>
<td rowspan=4 align=center> GPU </td>
<td rowspan=4 align=center> use_gpu() </td>
<td align=center> Paddle Inference </td>
<td align=center> use_paddle_infer_backend() </td>
<td align=center> ✅ </td>
<td align=center> N/A </td>
</tr>
<tr>
<td align=center> ONNX Runtime </td>
<td align=center> use_ort_backend() </td>
<td align=center> ✅ </td>
<td align=center> ❔ </td>
</tr>
<tr>
<td align=center> Paddle TensorRT </td>
<td align=center> use_paddle_infer_backend() + paddle_infer_option.enable_trt = True </td>
<td align=center> ✅ </td>
<td align=center> ✅ </td>
</tr>
<tr>
<td align=center> TensorRT </td>
<td align=center> use_trt_backend() </td>
<td align=center> ✅ </td>
<td align=center> ✅ </td>
</tr>
<tr>
<td align=center> Kunlunxin XPU </td>
<td align=center> use_kunlunxin() </td>
<td align=center> Paddle Lite </td>
<td align=center> use_paddle_lite_backend() </td>
<td align=center> N/A </td>
<td align=center> ✅ </td>
</tr>
<tr>
<td align=center> Huawei Ascend </td>
<td align=center> use_ascend() </td>
<td align=center> Paddle Lite </td>
<td align=center> use_paddle_lite_backend() </td>
<td align=center> ❔ </td>
<td align=center> ✅ </td>
</tr>
<tr>
<td align=center> Graphcore IPU </td>
<td align=center> use_ipu() </td>
<td align=center> Paddle Inference </td>
<td align=center> use_paddle_infer_backend() </td>
<td align=center> ❔ </td>
<td align=center> N/A </td>
</tr>
</table>
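As a minimal sketch of these interfaces (using only calls that also appear in `seq_cls_infer.py` below; the model path is an assumption), selecting the GPU together with the ONNX Runtime backend looks like this:

```python
import fastdeploy as fd

# Configure hardware and inference engine through RuntimeOption.
option = fd.RuntimeOption()
option.set_model_path("infer_model/model.pdmodel", "infer_model/model.pdiparams")
option.use_gpu(0)         # hardware: GPU device 0 (use_cpu() for CPU)
option.use_ort_backend()  # inference engine: ONNX Runtime

# The runtime built from this option runs inference on a dict of numpy arrays.
runtime = fd.Runtime(option)
```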
model_zoo/bert/deploy/python/seq_cls_infer.py

Lines changed: 159 additions & 0 deletions
@@ -0,0 +1,159 @@

```python
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import distutils.util
import os

import fastdeploy as fd
import numpy as np

from paddlenlp.transformers import AutoTokenizer


def parse_arguments():
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", required=True, help="The directory of model.")
    parser.add_argument("--vocab_path", type=str, default="", help="The path of tokenizer vocab.")
    parser.add_argument("--model_prefix", type=str, default="model", help="The model and params file prefix.")
    parser.add_argument(
        "--device",
        type=str,
        default="cpu",
        choices=["gpu", "cpu"],
        help="Type of inference device, support 'cpu' or 'gpu'.",
    )
    parser.add_argument(
        "--backend",
        type=str,
        default="paddle",
        choices=["onnx_runtime", "paddle", "openvino", "tensorrt", "paddle_tensorrt"],
        help="The inference runtime backend.",
    )
    parser.add_argument("--cpu_threads", type=int, default=1, help="Number of threads to predict when using cpu.")
    parser.add_argument("--device_id", type=int, default=0, help="Select which gpu device to run inference.")
    parser.add_argument("--batch_size", type=int, default=1, help="The batch size of data.")
    parser.add_argument("--max_length", type=int, default=128, help="The max length of sequence.")
    parser.add_argument("--log_interval", type=int, default=10, help="The interval of logging.")
    parser.add_argument("--use_fp16", type=distutils.util.strtobool, default=False, help="Whether to use FP16 mode")
    parser.add_argument(
        "--use_fast",
        type=distutils.util.strtobool,
        default=True,
        help="Whether to use fast_tokenizer to accelerate the tokenization.",
    )
    return parser.parse_args()


def batchfy_text(texts, batch_size):
    # Split the input texts into batches of at most batch_size examples.
    batch_texts = []
    batch_start = 0
    while batch_start < len(texts):
        batch_texts += [texts[batch_start : min(batch_start + batch_size, len(texts))]]
        batch_start += batch_size
    return batch_texts


class Predictor(object):
    def __init__(self, args):
        self.tokenizer = AutoTokenizer.from_pretrained(args.model_dir, use_fast=args.use_fast)
        self.runtime = self.create_fd_runtime(args)
        self.batch_size = args.batch_size
        self.max_length = args.max_length

    def create_fd_runtime(self, args):
        option = fd.RuntimeOption()
        model_path = os.path.join(args.model_dir, args.model_prefix + ".pdmodel")
        params_path = os.path.join(args.model_dir, args.model_prefix + ".pdiparams")
        option.set_model_path(model_path, params_path)
        if args.device == "cpu":
            option.use_cpu()
            option.set_cpu_thread_num(args.cpu_threads)
        else:
            option.use_gpu(args.device_id)
        if args.backend == "paddle":
            option.use_paddle_infer_backend()
        elif args.backend == "onnx_runtime":
            option.use_ort_backend()
        elif args.backend == "openvino":
            option.use_openvino_backend()
        else:
            option.use_trt_backend()
            if args.backend == "paddle_tensorrt":
                option.use_paddle_infer_backend()
                option.paddle_infer_option.collect_trt_shape = True
                option.paddle_infer_option.enable_trt = True
            # Set dynamic shape ranges (min/opt/max) for both input tensors and
            # cache the serialized TensorRT engine next to the model files.
            trt_file = os.path.join(args.model_dir, "model.trt")
            option.trt_option.set_shape(
                "input_ids", [1, 1], [args.batch_size, args.max_length], [args.batch_size, args.max_length]
            )
            option.trt_option.set_shape(
                "token_type_ids", [1, 1], [args.batch_size, args.max_length], [args.batch_size, args.max_length]
            )
            if args.use_fp16:
                option.trt_option.enable_fp16 = True
                trt_file = trt_file + ".fp16"
            option.trt_option.serialize_file = trt_file
        return fd.Runtime(option)

    def preprocess(self, text):
        # Tokenize to ids and build the feed dict keyed by the runtime's input names.
        data = self.tokenizer(text, max_length=self.max_length, padding=True, truncation=True)
        input_ids_name = self.runtime.get_input_info(0).name
        token_type_ids_name = self.runtime.get_input_info(1).name
        input_map = {
            input_ids_name: np.array(data["input_ids"], dtype="int64"),
            token_type_ids_name: np.array(data["token_type_ids"], dtype="int64"),
        }
        return input_map

    def infer(self, input_map):
        results = self.runtime.infer(input_map)
        return results

    def postprocess(self, infer_data):
        # Numerically stable softmax over the output logits.
        logits = np.array(infer_data[0])
        max_value = np.max(logits, axis=1, keepdims=True)
        exp_data = np.exp(logits - max_value)
        probs = exp_data / np.sum(exp_data, axis=1, keepdims=True)
        out_dict = {"label": probs.argmax(axis=-1), "confidence": probs}
        return out_dict

    def predict(self, texts):
        input_map = self.preprocess(texts)
        infer_result = self.infer(input_map)
        output = self.postprocess(infer_result)
        return output


if __name__ == "__main__":
    args = parse_arguments()
    predictor = Predictor(args)
    texts_ds = [
        "against shimmering cinematography that lends the setting the ethereal beauty of an asian landscape painting",
        "the situation in a well-balanced fashion",
        "at achieving the modest , crowd-pleasing goals it sets for itself",
        "so pat it makes your teeth hurt",
        "this new jangle of noise , mayhem and stupidity must be a serious contender for the title .",
    ]
    label_map = {0: "negative", 1: "positive"}
    batch_texts = batchfy_text(texts_ds, args.batch_size)
    for bs, texts in enumerate(batch_texts):
        outputs = predictor.predict(texts)
        for i, sentence1 in enumerate(texts):
            print(
                f"Batch id: {bs}, example id: {i}, sentence1: {sentence1}, "
                f"label: {label_map[outputs['label'][i]]}, negative prob: {outputs['confidence'][i][0]:.4f}, "
                f"positive prob: {outputs['confidence'][i][1]:.4f}."
            )
```
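The `Predictor` class above can also be driven directly from Python rather than through the CLI. A minimal sketch, assuming the model was exported to `../../infer_model/` as in the README; the `SimpleNamespace` is a hypothetical stand-in for the argparse namespace the class expects:

```python
from types import SimpleNamespace

# Hypothetical stand-in for the argparse namespace, mirroring the defaults
# of parse_arguments() above.
args = SimpleNamespace(
    model_dir="../../infer_model/",
    model_prefix="model",
    device="cpu",
    backend="paddle",
    cpu_threads=1,
    device_id=0,
    batch_size=1,
    max_length=128,
    use_fp16=False,
    use_fast=True,
)

predictor = Predictor(args)
outputs = predictor.predict(["so pat it makes your teeth hurt"])
print(outputs["label"], outputs["confidence"])  # class index and softmax probabilities
```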
