Commit 96b116e

Update UIE QAT
2 parents 4a2f42f + a6b4691 commit 96b116e

175 files changed: +19019 -1354 lines changed


README_cn.md

Lines changed: 9 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -37,19 +37,6 @@
3737
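* 🍭 AIGC content generation: adds the SOTA code-generation model [**CodeGen**](./examples/code_generation/codegen), which supports code generation in multiple programming languages; integrates the [**popular text-to-image models**](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/model_zoo/taskflow.md#%E6%96%87%E5%9B%BE%E7%94%9F%E6%88%90) DALL·E Mini, Disco Diffusion, and Stable Diffusion, with more fun models to try; adds a [**Chinese text summarization application**](./applications/text_summarization), the first release of a Chinese summarization model trained on a large-scale corpus, with one-line Taskflow invocation and custom training;
3838
* 💪 Framework upgrade: releases the [**automatic model compression API**](./docs/compression.md), which prunes and quantizes models automatically and greatly lowers the barrier to using model compression; releases the [**few-shot Prompt**](./applications/text_classification/multi_class/few-shot) capability, integrating classic algorithms such as PET, P-Tuning, and RGL.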
3939

40-
41-
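* 👀 **2022.9.6 PaddlePaddle Smart Finance industry livestream course series**
42-
43-
* Focused on the industrial practice and development trends of deep learning in the financial industry, with industry experts invited to share their practical experience and discuss the future of fintech;
44-
45-
* The PaddleNLP companion courses release industrial practice examples: UIE-based financial document information extraction; a Pipelines-based FAQ question answering system;
46-
47-
* **Livestreams at 19:00 every Tuesday and Thursday starting September 6**; scan the QR code to join the WeChat group for free, get the livestream link, and talk with industry experts: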
48-
49-
<div align="center">
50-
<img src="https://user-images.githubusercontent.com/11793384/188596360-264415d4-5462-43ad-8517-5b7e690061ce.jpg" width="150" height="150" />
51-
</div>
52-
5340
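* 🔥 **2022.5.16: [PaddleNLP v2.3](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.3.0) released**
5441
* 💎 Releases the universal information extraction technology [**UIE**](./model_zoo/uie): a single model supports entity recognition, relation and event extraction, sentiment analysis, and other open-domain information extraction tasks, with no restriction on domain or extraction target, and supports **zero-shot extraction** as well as efficient end-to-end **few-shot** custom development;
5542
* 😊 Releases lightweight models of the Wenxin large model [**ERNIE 3.0**](./model_zoo/ernie-3.0), which achieve the best results among models of the same size on [CLUE](https://www.cluebenchmarks.com/), and provides **🗜️lossless compression** and **⚙️all-scenario deployment** solutions;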
@@ -58,7 +45,7 @@
5845

5946
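## Community
6047

61-
- Scan the QR code with WeChat and fill in the questionnaire to join the discussion group and claim your benefits
48+
- Scan the QR code with WeChat, fill in the questionnaire, and reply to the assistant with the keyword (NLP) to join the discussion group and claim your benefits
6249
- Talk in depth with many community developers and the official team.
6350
- A 10 GB NLP learning pack!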
6451

@@ -83,6 +70,14 @@ Taskflow provides a rich set of **📦out-of-the-box** industrial-grade pretrained NLP models, covering
8370

8471
![taskflow1](https://user-images.githubusercontent.com/11793384/159693816-fda35221-9751-43bb-b05c-7fc77571dd76.gif)
8572

73+
Taskflow now integrates a fun text-to-image application; try **Stable Diffusion** in three lines of code:
74+
```python
75+
from paddlenlp import Taskflow
76+
text_to_image = Taskflow("text_to_image", model="CompVis/stable-diffusion-v1-4")
77+
image_list = text_to_image('"In the morning light,Chinese ancient buildings in the mountains,Magnificent and fantastic John Howe landscape,lake,clouds,farm,Fairy tale,light effect,Dream,Greg Rutkowski,James Gurney,artstation"')
78+
```
79+
<img width="300" alt="image" src="https://user-images.githubusercontent.com/16698950/194882669-f7cc7c98-d63a-45f4-99c1-0514c6712368.png">
80+
8681
For more usage, see the [Taskflow documentation](./docs/model_zoo/taskflow.md)
8782

8883
### Comprehensive Chinese Model Library

README_en.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -81,7 +81,7 @@ For more usage please refer to [Taskflow Docs](./docs/model_zoo/taskflow.md).
8181

8282
#### 🀄 Comprehensive Chinese Transformer Models
8383

84-
We provide **45+** network architectures and over **500+** pretrained models. Not only includes all the SOTA model like ERNIE, PLATO and SKEP released by Baidu, but also integrates most of the high-quality Chinese pretrained model developed by other organizations. Use `AutoModel` API to **⚡SUPER FAST⚡** download pretrained mdoels of different architecture. We welcome all developers to contribute your Transformer models to PaddleNLP!
84+
We provide **45+** network architectures and over **500+** pretrained models. Not only includes all the SOTA model like ERNIE, PLATO and SKEP released by Baidu, but also integrates most of the high-quality Chinese pretrained model developed by other organizations. Use `AutoModel` API to **⚡SUPER FAST⚡** download pretrained models of different architecture. We welcome all developers to contribute your Transformer models to PaddleNLP!
8585

8686
```python
8787
from paddlenlp.transformers import *

applications/neural_search/recall/in_batch_negative/README.md

Lines changed: 3 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -204,9 +204,8 @@ python -u -m paddle.distributed.launch --gpus "0,1,2,3" \
204204
--hnsw_m 100 \
205205
--hnsw_ef 100 \
206206
--recall_num 50 \
207-
--similar_text_pair "recall/dev.csv" \
208-
--corpus_file "recall/corpus.csv" \
209-
--similar_text_pair "recall/dev.csv"
207+
--similar_text_pair_file "recall/dev.csv" \
208+
--corpus_file "recall/corpus.csv"
210209
```
211210

212211
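Parameter descriptions
@@ -228,9 +227,8 @@ python -u -m paddle.distributed.launch --gpus "0,1,2,3" \
228227
* `hnsw_m`: HNSW algorithm parameter; the default is fine
229228
* `hnsw_ef`: HNSW algorithm parameter; the default is fine
230229
* `recall_num`: number of similar texts recalled for each text
231-
* `similar_text_pair`: evaluation set made up of similar text pairs
230+
* `similar_text_pair_file`: evaluation set made up of similar text pairs
232231
* `corpus_file`: recall corpus data, corpus_file
233-
* `similar_text_pair`: evaluation set made up of similar text pairs, semantic_similar_pair.tsv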
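To make `recall_num` and the similar-text-pair evaluation set concrete, here is a minimal sketch of a Recall@N computation. It is not the repository's evaluation script: the function name `recall_at_k` and the in-memory dictionaries are assumptions for illustration, standing in for the contents of the evaluation file and the results returned by the HNSW index.

```python
from typing import Dict, List


def recall_at_k(text_pairs: Dict[str, str],
                recalled: Dict[str, List[str]],
                k: int) -> float:
    """Fraction of queries whose paired similar text is in the top-k recalled texts.

    text_pairs: query text -> its known similar text (the evaluation set).
    recalled:   query text -> recalled texts, best match first
                (at most recall_num entries per query).
    """
    if not text_pairs:
        return 0.0
    hits = sum(1 for query, target in text_pairs.items()
               if target in recalled.get(query, [])[:k])
    return hits / len(text_pairs)


# Hypothetical toy data: two queries, two recalled candidates each.
pairs = {"query a": "text x", "query b": "text y"}
results = {"query a": ["text x", "text z"], "query b": ["text z", "text y"]}
recall_1 = recall_at_k(pairs, results, k=1)
recall_2 = recall_at_k(pairs, results, k=2)
```

With this toy data, only "query a" finds its pair at rank 1, while both queries find it within the top 2, so a larger `recall_num` can only keep or raise the score.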
234232

235233
You can also use the bash script:
236234

applications/question_answering/faq_finance/README.md

Lines changed: 16 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -399,10 +399,24 @@ python milvus_ann_search.py --data_path data/qa_pair.csv \
399399

400400
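#### Paddle Serving Deployment
401401

402-
To install Paddle Serving, see the [Paddle Serving installation documentation](https://github.com/PaddlePaddle/Serving#installation). The dependencies must be installed on both the server and the client; once they are installed, proceed with the steps below.
402+
To install Paddle Serving, see the [Paddle Serving installation documentation](https://github.com/PaddlePaddle/Serving#installation). The dependencies must be installed on both the server and the client. Install the Paddle Serving dependencies with pip as follows: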
403403

404+
```
405+
pip install paddle-serving-client==0.8.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
406+
pip install paddle-serving-app==0.8.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
407+
408+
# For CPU deployment, only the CPU server is needed
409+
pip install paddle-serving-server==0.8.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
404410
405-
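First, export the generated static graph model to the Paddle Serving format with the following command:
411+
# For a GPU server, check your environment before choosing which command to run; the CUDA 10.2 package is recommended
412+
# CUDA10.2 + Cudnn7 + TensorRT6 (recommended)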
413+
pip install paddle-serving-server-gpu==0.8.3.post102 -i https://pypi.tuna.tsinghua.edu.cn/simple
414+
# CUDA10.1 + TensorRT6
415+
pip install paddle-serving-server-gpu==0.8.3.post101 -i https://pypi.tuna.tsinghua.edu.cn/simple
416+
# CUDA11.2 + TensorRT8
417+
pip install paddle-serving-server-gpu==0.8.3.post112 -i https://pypi.tuna.tsinghua.edu.cn/simple
418+
```
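The choice among the server packages above can be sketched as a small lookup. This helper is illustrative only and is not part of Paddle Serving: the function name `serving_server_requirement` and the dictionary layout are assumptions, while the version pins are copied from the pip commands in the diff.

```python
from typing import Optional

# Version pins copied from the pip commands in the README diff.
CPU_SERVER = "paddle-serving-server==0.8.3"
GPU_SERVERS = {
    "10.2": "paddle-serving-server-gpu==0.8.3.post102",  # recommended in the README
    "10.1": "paddle-serving-server-gpu==0.8.3.post101",
    "11.2": "paddle-serving-server-gpu==0.8.3.post112",
}


def serving_server_requirement(cuda_version: Optional[str] = None) -> str:
    """Return the pip requirement for the server package.

    cuda_version=None means CPU deployment; otherwise it must be one of
    the CUDA versions listed in the README ("10.1", "10.2", "11.2").
    """
    if cuda_version is None:
        return CPU_SERVER
    if cuda_version not in GPU_SERVERS:
        supported = ", ".join(sorted(GPU_SERVERS))
        raise ValueError(
            f"no pinned package for CUDA {cuda_version}; listed versions: {supported}")
    return GPU_SERVERS[cuda_version]


requirement = serving_server_requirement("10.2")
```

The client and app packages (`paddle-serving-client`, `paddle-serving-app`) are the same for CPU and GPU deployments, so only the server package needs this choice.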
419+
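For more detailed installation information, see this [link]((https://github.com/PaddlePaddle/Serving/blob/v0.9.0/doc/Install_Linux_Env_CN.md)). Once the dependencies are installed, proceed with the steps below. First, export the generated static graph model to the Paddle Serving format with the following command: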
406420

407421
```
408422
python export_to_serving.py \

applications/question_answering/faq_finance/requirements.txt

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,4 @@ paddlepaddle-gpu>=2.2.3
55
hnswlib>=0.5.2
66
numpy>=1.17.2
77
visualdl>=2.2.2
8-
paddle-serving-app>=0.7.0
9-
paddle-serving-client>=0.7.0
10-
paddle-serving-server-gpu>=0.7.0.post102
118
pybind11

applications/text_classification/hierarchical/deploy/paddle_serving/README.md

Lines changed: 24 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -153,20 +153,37 @@ I0727 06:50:34.993671 43126 naive_executor.cc:102] --- skip [linear_75.tmp_1],
153153
[OP Object] init success
154154
```
155155

156-
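#### Start the client test
156+
#### Start the RPC client test
157157
Note: disable any proxy when sending client requests, and change the server_url address to match your setup (the machine where the service is running)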
158158
```shell
159159
python rpc_client.py
160160
```
161161
The output is printed as follows:
162162
```
163-
text: 请问木竭胶囊能同高血压药、氨糖同时服吗?
164-
label: 3,37
163+
text: 消失的“外企光环”,5月份在华裁员900余人,香饽饽变“臭”了
164+
label: 组织关系,组织关系##裁员
165165
--------------------
166-
text: 低压100*高压140*头涨,想吃点降压药。谢谢!
167-
label: 0
166+
text: 卡车超载致使跨桥侧翻,没那么简单
167+
label: 灾害/意外,灾害/意外##坍/垮塌
168168
--------------------
169-
text: 脑穿通畸形易发人群有哪些
170-
label: 0,9
169+
text: 金属卡扣安装不到位,上海乐扣乐扣贸易有限公司将召回捣碎器1162件
170+
label: 产品行为,产品行为##召回
171+
--------------------
172+
```
173+
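#### Start the HTTP client test
174+
Note: disable any proxy when sending client requests, and change the server_url address to match your setup (the machine where the service is running)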
175+
```shell
176+
python http_client.py
177+
```
178+
The output is printed as follows:
179+
```
180+
text: 消失的“外企光环”,5月份在华裁员900余人,香饽饽变“臭”了
181+
label: 组织关系,组织关系##裁员
182+
--------------------
183+
text: 卡车超载致使跨桥侧翻,没那么简单
184+
label: 灾害/意外,灾害/意外##坍/垮塌
185+
--------------------
186+
text: 金属卡扣安装不到位,上海乐扣乐扣贸易有限公司将召回捣碎器1162件
187+
label: 产品行为,产品行为##召回
171188
--------------------
172189
```
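The sample output changed from raw class indices (for example `3,37`) to readable label names because the clients in this commit now map each index through a `label_list`. A minimal sketch of that mapping follows; the helper name `decode_labels` is an assumption, the list is only the first 11 entries of the clients' `label_list`, and the index string `"6,10"` is an illustrative input rather than a captured server response.

```python
from typing import List

# Excerpt: the first 11 entries of the label_list used by the clients in this commit.
LABEL_LIST_EXCERPT = [
    '交往', '交往##会见', '交往##感谢', '交往##探班', '交往##点赞', '交往##道歉',
    '产品行为', '产品行为##上映', '产品行为##下架', '产品行为##发布', '产品行为##召回',
]


def decode_labels(index_string: str, label_list: List[str]) -> str:
    """Turn a comma-separated index string into comma-separated label names."""
    return ','.join(label_list[int(i)] for i in index_string.split(','))


decoded = decode_labels("6,10", LABEL_LIST_EXCERPT)
print("label: ", decoded)
```

Each hierarchical prediction lists a parent label and its child (joined with `##`), which is why one text can carry several indices.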

applications/text_classification/hierarchical/deploy/paddle_serving/config.yml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,8 @@
11
#rpc port. rpc_port and http_port must not both be empty. When rpc_port is empty and http_port is not, rpc_port is automatically set to http_port+1
2-
rpc_port: 7688
2+
rpc_port: 18090
33

44
#http port. rpc_port and http_port must not both be empty. When rpc_port is available and http_port is empty, http_port is not generated automatically
5-
http_port: 9998
5+
http_port: 9878
66

77
#worker_num, the maximum concurrency.
88
#When build_dag_each_worker=True, the framework creates worker_num processes, and each process builds its own gRPC server and DAG
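The port rules stated in the config comments can be sketched as a small function. This is a reading of the comments only, not Paddle Serving's actual implementation; the name `resolve_ports` and the use of `None` for an empty setting are assumptions.

```python
from typing import Optional, Tuple


def resolve_ports(rpc_port: Optional[int],
                  http_port: Optional[int]) -> Tuple[int, Optional[int]]:
    """Apply the rules from the config.yml comments (None means 'empty').

    - rpc_port and http_port must not both be empty.
    - If rpc_port is empty and http_port is set, rpc_port becomes http_port + 1.
    - If rpc_port is set and http_port is empty, http_port stays empty.
    """
    if rpc_port is None and http_port is None:
        raise ValueError("rpc_port and http_port must not both be empty")
    if rpc_port is None:
        rpc_port = http_port + 1
    return rpc_port, http_port


# Values from this commit's config.yml.
ports = resolve_ports(18090, 9878)
```

With both ports set explicitly, as in this commit, neither value is derived: the RPC client connects to 18090 and the HTTP client to 9878.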
Lines changed: 67 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,67 @@
1+
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
2+
#
3+
# Licensed under the Apache License, Version 2.0 (the "License");
4+
# you may not use this file except in compliance with the License.
5+
# You may obtain a copy of the License at
6+
#
7+
# http://www.apache.org/licenses/LICENSE-2.0
8+
#
9+
# Unless required by applicable law or agreed to in writing, software
10+
# distributed under the License is distributed on an "AS IS" BASIS,
11+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
# See the License for the specific language governing permissions and
13+
# limitations under the License.
14+
import numpy as np
15+
from numpy import array
16+
import requests
17+
import json
18+
import sys
19+
20+
21+
class Runner(object):
22+
23+
    def __init__(
24+
        self,
25+
        server_url: str,
26+
    ):
27+
        self.server_url = server_url
28+
29+
    def Run(self, text, label_list):
30+
        sentence = np.array([t.encode('utf-8') for t in text], dtype=np.object_)
31+
        sentence = sentence.__repr__()
32+
        data = {"key": ["sentence"], "value": [sentence]}
33+
        data = json.dumps(data)
34+
35+
        ret = requests.post(url=self.server_url, data=data)
36+
        ret = ret.json()
37+
        for t, l in zip(text, eval(ret['value'][0])):
38+
            print("text: ", t)
39+
            label = ','.join([label_list[int(ll)] for ll in l.split(',')])
40+
            print("label: ", label)
41+
            print("--------------------")
42+
        return
43+
44+
45+
if __name__ == "__main__":
46+
    server_url = "http://127.0.0.1:9878/seq_cls/prediction"
47+
    runner = Runner(server_url)
48+
    text = [
49+
        "消失的“外企光环”,5月份在华裁员900余人,香饽饽变“臭”了?", "卡车超载致使跨桥侧翻,没那么简单",
50+
        "金属卡扣安装不到位,上海乐扣乐扣贸易有限公司将召回捣碎器1162件"
51+
    ]
52+
    label_list = [
53+
        '交往', '交往##会见', '交往##感谢', '交往##探班', '交往##点赞', '交往##道歉', '产品行为',
54+
        '产品行为##上映', '产品行为##下架', '产品行为##发布', '产品行为##召回', '产品行为##获奖', '人生',
55+
        '人生##产子/女', '人生##出轨', '人生##分手', '人生##失联', '人生##婚礼', '人生##庆生', '人生##怀孕',
56+
        '人生##死亡', '人生##求婚', '人生##离婚', '人生##结婚', '人生##订婚', '司法行为', '司法行为##举报',
57+
        '司法行为##入狱', '司法行为##开庭', '司法行为##拘捕', '司法行为##立案', '司法行为##约谈', '司法行为##罚款',
58+
        '司法行为##起诉', '灾害/意外', '灾害/意外##地震', '灾害/意外##坍/垮塌', '灾害/意外##坠机',
59+
        '灾害/意外##洪灾', '灾害/意外##爆炸', '灾害/意外##袭击', '灾害/意外##起火', '灾害/意外##车祸', '竞赛行为',
60+
        '竞赛行为##夺冠', '竞赛行为##晋级', '竞赛行为##禁赛', '竞赛行为##胜负', '竞赛行为##退役', '竞赛行为##退赛',
61+
        '组织关系', '组织关系##停职', '组织关系##加盟', '组织关系##裁员', '组织关系##解散', '组织关系##解约',
62+
        '组织关系##解雇', '组织关系##辞/离职', '组织关系##退出', '组织行为', '组织行为##开幕', '组织行为##游行',
63+
        '组织行为##罢工', '组织行为##闭幕', '财经/交易', '财经/交易##上市', '财经/交易##出售/收购',
64+
        '财经/交易##加息', '财经/交易##涨价', '财经/交易##涨停', '财经/交易##融资', '财经/交易##跌停',
65+
        '财经/交易##降价', '财经/交易##降息'
66+
    ]
67+
    runner.Run(text, label_list)
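The client above passes the server's response string to `eval`, which executes whatever expression the server sends back. As a possible hardening, and swapping in a different technique from the one the commit uses, the string could be parsed with `ast.literal_eval`. This sketch assumes the response value is a plain Python list literal of comma-separated index strings (for example `"['6,10', '34,36']"`); that format is not confirmed by the source, and `parse_prediction` is a hypothetical helper name.

```python
import ast
from typing import List


def parse_prediction(value: str) -> List[str]:
    """Parse a list literal of index strings without executing arbitrary code.

    Assumption: `value` looks like "['6,10', '34,36']". ast.literal_eval
    accepts only Python literals and raises ValueError or SyntaxError
    for anything else, such as function calls.
    """
    parsed = ast.literal_eval(value)
    if not isinstance(parsed, (list, tuple)):
        raise ValueError("expected a list of index strings")
    if not all(isinstance(item, str) for item in parsed):
        raise ValueError("expected every prediction to be a string")
    return list(parsed)


predictions = parse_prediction("['6,10', '34,36']")
```

If the server instead returns a NumPy-style repr such as `array([...])`, this parser rejects it, and the response would need a different safe decoding step.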

applications/text_classification/hierarchical/deploy/paddle_serving/rpc_client.py

Lines changed: 21 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -26,21 +26,37 @@ def __init__(
2626
        self.client = PipelineClient()
2727
        self.client.connect([server_url])
2828

29-
    def Run(self, data):
29+
    def Run(self, data, label_list):
3030
        data = np.array([x.encode('utf-8') for x in data], dtype=np.object_)
3131
        ret = self.client.predict(feed_dict={"sentence": data})
3232
        for d, l, in zip(data, eval(ret.value[0])):
3333
            print("text: ", d)
34-
            print("label: ", l)
34+
            label = ','.join([label_list[int(ll)] for ll in l.split(',')])
35+
            print("label: ", label)
3536
            print("--------------------")
3637
        return
3738

3839

3940
if __name__ == "__main__":
40-
    server_url = "127.0.0.1:7688"
41+
    server_url = "127.0.0.1:18090"
4142
    runner = Runner(server_url)
42-
    texts = [
43+
    text = [
4344
        "消失的“外企光环”,5月份在华裁员900余人,香饽饽变“臭”了?", "卡车超载致使跨桥侧翻,没那么简单",
4445
        "金属卡扣安装不到位,上海乐扣乐扣贸易有限公司将召回捣碎器1162件"
4546
    ]
46-
    runner.Run(texts)
47+
    label_list = [
48+
        '交往', '交往##会见', '交往##感谢', '交往##探班', '交往##点赞', '交往##道歉', '产品行为',
49+
        '产品行为##上映', '产品行为##下架', '产品行为##发布', '产品行为##召回', '产品行为##获奖', '人生',
50+
        '人生##产子/女', '人生##出轨', '人生##分手', '人生##失联', '人生##婚礼', '人生##庆生', '人生##怀孕',
51+
        '人生##死亡', '人生##求婚', '人生##离婚', '人生##结婚', '人生##订婚', '司法行为', '司法行为##举报',
52+
        '司法行为##入狱', '司法行为##开庭', '司法行为##拘捕', '司法行为##立案', '司法行为##约谈', '司法行为##罚款',
53+
        '司法行为##起诉', '灾害/意外', '灾害/意外##地震', '灾害/意外##坍/垮塌', '灾害/意外##坠机',
54+
        '灾害/意外##洪灾', '灾害/意外##爆炸', '灾害/意外##袭击', '灾害/意外##起火', '灾害/意外##车祸', '竞赛行为',
55+
        '竞赛行为##夺冠', '竞赛行为##晋级', '竞赛行为##禁赛', '竞赛行为##胜负', '竞赛行为##退役', '竞赛行为##退赛',
56+
        '组织关系', '组织关系##停职', '组织关系##加盟', '组织关系##裁员', '组织关系##解散', '组织关系##解约',
57+
        '组织关系##解雇', '组织关系##辞/离职', '组织关系##退出', '组织行为', '组织行为##开幕', '组织行为##游行',
58+
        '组织行为##罢工', '组织行为##闭幕', '财经/交易', '财经/交易##上市', '财经/交易##出售/收购',
59+
        '财经/交易##加息', '财经/交易##涨价', '财经/交易##涨停', '财经/交易##融资', '财经/交易##跌停',
60+
        '财经/交易##降价', '财经/交易##降息'
61+
    ]
62+
    runner.Run(text, label_list)

applications/text_classification/hierarchical/few-shot/infer.py

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -178,9 +178,12 @@ def preprocess(self, input_data: list):
178178
        text = [InputExample(text_a=x) for x in input_data]
179179
        inputs = [self._template.wrap_one_example(x) for x in text]
180180
        inputs = {
181-
            "input_ids": np.array([x["input_ids"] for x in inputs]),
182-
            "mask_ids": np.array([x["mask_ids"] for x in inputs]),
183-
            "soft_token_ids": np.array([x["soft_token_ids"] for x in inputs])
181+
            "input_ids":
182+
            np.array([x["input_ids"] for x in inputs], dtype="int64"),
183+
            "mask_ids":
184+
            np.array([x["mask_ids"] for x in inputs], dtype="int64"),
185+
            "soft_token_ids":
186+
            np.array([x["soft_token_ids"] for x in inputs], dtype="int64")
184187
        }
185188
        return inputs
186189
