
Commit a535904

Merge remote-tracking branch 'upstream/develop' into fix_model_download
2 parents f4a0340 + 82a303f commit a535904

File tree

30 files changed: +866 -81 lines changed
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+name: FastTokenizer
+
+on:
+  push:
+    paths:
+      - 'fast_tokenizer/*'
+  pull_request:
+    paths:
+      - 'fast_tokenizer/*'
+
+jobs:
+  fast_tokenizer_cpp:
+    name: fast_tokenizer_cpp
+    runs-on: ubuntu-22.04
+    permissions:
+      pull-requests: write
+      contents: read
+      id-token: write
+    steps:
+      - uses: actions/checkout@v3
+      - name: compile
+        working-directory: ./fast_tokenizer
+        run: make fast_tokenizer_cpp_compile
+      - name: test
+        working-directory: ./fast_tokenizer
+        run: make fast_tokenizer_cpp_test
+  fast_tokenizer_python38:
+    name: fast_tokenizer_python38
+    runs-on: ubuntu-22.04
+    permissions:
+      pull-requests: write
+      contents: read
+      id-token: write
+    steps:
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v1
+        with:
+          python-version: 3.8
+      - name: install
+        working-directory: ./fast_tokenizer
+        run: make fast_tokenizer_python_install
+      - name: compile
+        working-directory: ./fast_tokenizer
+        run: make fast_tokenizer_python_compile
+      - name: test
+        working-directory: ./fast_tokenizer
+        run: make fast_tokenizer_python_test
+

.readthedocs.yaml

Lines changed: 2 additions & 2 deletions
@@ -21,8 +21,8 @@ sphinx:
 python:
   version: 3.8
   install:
-    - requirements: docs/requirements.txt
     - requirements: requirements.txt
+    - requirements: docs/requirements.txt
     - method: setuptools
       path: .
-  system_packages: true
+  system_packages: true

applications/information_extraction/label_studio_text.md

Lines changed: 2 additions & 2 deletions
@@ -56,10 +56,10 @@ label-studio start
 <img src=https://user-images.githubusercontent.com/40840292/199661638-48a870eb-a1df-4db5-82b9-bc8e985f5190.png height=350 width=1200 />
 </div>

-- For **text classification and sentence-level sentiment classification** tasks, select ``Relation Extraction`.
+- For **text classification and sentence-level sentiment classification** tasks, select ``Text Classification``

 <div align="center">
-<img src=https://user-images.githubusercontent.com/40840292/199661638-48a870eb-a1df-4db5-82b9-bc8e985f5190.png height=350 width=1200 />
+<img src=https://user-images.githubusercontent.com/40840292/212617773-34534e68-4544-4b24-8f39-ae7f9573d397.png height=420 width=1200 />
 </div>

 - Add labels (this step can also be skipped and configured later under Setting/Labeling Interface)

applications/information_extraction/label_studio_text_en.md

Lines changed: 2 additions & 2 deletions
@@ -57,10 +57,10 @@ Click Create to start creating a new project, fill in the project name, descript
 <img src=https://user-images.githubusercontent.com/40840292/199661638-48a870eb-a1df-4db5-82b9-bc8e985f5190.png height=350 width=1200 />
 </div>

-- For **Text classification, Sentence-level sentiment classification** tasks please select ``Relation Extraction`.
+- For **Text classification, Sentence-level sentiment classification** tasks please select ``Text Classification``.

 <div align="center">
-<img src=https://user-images.githubusercontent.com/40840292/199661638-48a870eb-a1df-4db5-82b9-bc8e985f5190.png height=350 width=1200 />
+<img src=https://user-images.githubusercontent.com/40840292/212617773-34534e68-4544-4b24-8f39-ae7f9573d397.png height=420 width=1200 />
 </div>

 - Define labels

applications/zero_shot_text_classification/README.md

Lines changed: 12 additions & 6 deletions
@@ -1,3 +1,5 @@
+简体中文 | [English](README_en.md)
+
 # Zero-shot Text Classification

 **Table of Contents**
@@ -27,7 +29,7 @@
 **Highlights of the zero-shot text classification application:**

 - **Comprehensive scenario coverage 🎓:** Covers all mainstream text classification tasks and supports multi-task training, meeting developers' diverse needs for deploying text classification.
-- **Leading performance 🏃:** Built on the UTC model, which delivers outstanding classification results, as the training backbone, providing strong zero-shot and few-shot learning capability.
+- **Leading performance 🏃:** Built on the UTC model, which delivers outstanding classification results, as the training backbone, providing strong zero-shot and few-shot learning capability. The model ranks first on both [ZeroCLUE](https://www.cluebenchmarks.com/zeroclue.html) and [FewCLUE](https://www.cluebenchmarks.com/fewclue.html) (as of January 11, 2023).
 - **Simple and easy to use:** With Taskflow, three lines of code are enough for quick inference without any labeled data, and a single command starts text classification, making deployment easy and lowering the barrier to multi-task text classification.
 - **Efficient tuning ✊:** Developers can easily get started with data annotation and model training without any machine-learning background.
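The "simple and easy to use" bullet above mentions three-line usage through Taskflow. Below is a minimal sketch of that flow, reusing the schema and example sentence that appear later in this diff; it assumes paddlenlp is installed and that the zero-shot default model is used (no fine-tuned checkpoint required).

```python
# Minimal zero-shot classification via Taskflow, no labeled data needed.
# Schema and input text are taken from the examples later in this README diff.
from paddlenlp import Taskflow

schema = ["病情诊断", "治疗方案", "病因分析", "指标解读", "就医建议"]
cls = Taskflow("zero_shot_text_classification", schema=schema)
print(cls("中性粒细胞比率偏低"))
```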

@@ -114,7 +116,8 @@ python run_train.py \
     --disable_tqdm True \
     --metric_for_best_model macro_f1 \
     --load_best_model_at_end True \
-    --save_total_limit 1
+    --save_total_limit 1 \
+    --save_plm
 ```

 When running on GPU, you can specify the gpus argument for multi-GPU training:
@@ -143,7 +146,8 @@ python -u -m paddle.distributed.launch --gpus "0,1" run_train.py \
     --disable_tqdm True \
     --metric_for_best_model macro_f1 \
     --load_best_model_at_end True \
-    --save_total_limit 1
+    --save_total_limit 1 \
+    --save_plm
 ```

 Because the `--do_eval` flag is set in this example, evaluation runs automatically after training finishes.
@@ -156,7 +160,9 @@ python -u -m paddle.distributed.launch --gpus "0,1" run_train.py \
 * `seed`: global random seed, default 42.
 * `model_name_or_path`: pretrained model used for few-shot training. Default "utc-large".
 * `output_dir`: required; directory where the model is saved after training or compression; default `None`.
-* `dev_path`: path to the development set; default `None`.
+* `dataset_path`: directory containing the dataset files; default `./data/`.
+* `train_file`: training set file name; default `train.txt`.
+* `dev_file`: development set file name; default `dev.txt`.
 * `max_seq_len`: maximum sequence length; when the input, including the labels, exceeds this length the text is split automatically, while the label part is never split; default 512.
 * `per_device_train_batch_size`: batch size per GPU core/CPU for training, default 8.
 * `per_device_eval_batch_size`: batch size per GPU core/CPU for evaluation, default 8.
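As a quick illustration of how the new dataset arguments fit together, the sketch below assumes run_train.py simply joins `dataset_path` with the file-name arguments; the resolution logic shown here is an assumption, not something stated in this diff.

```python
# Illustration only: how dataset_path, train_file and dev_file are assumed
# to resolve to concrete paths under the new arguments.
import os

dataset_path = "./data/"
train_file = "train.txt"
dev_file = "dev.txt"

print(os.path.join(dataset_path, train_file))  # ./data/train.txt
print(os.path.join(dataset_path, dev_file))    # ./data/dev.txt
```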
@@ -204,7 +210,7 @@ python run_eval.py \
 >>> from pprint import pprint
 >>> from paddlenlp import Taskflow
 >>> schema = ["病情诊断", "治疗方案", "病因分析", "指标解读", "就医建议", "疾病表述", "后果表述", "注意事项", "功效作用", "医疗费用", "其他"]
->>> my_cls = Taskflow("zero_shot_text_classification", schema=schema, task_path='./checkpoint/model_best', precision="fp16")
+>>> my_cls = Taskflow("zero_shot_text_classification", schema=schema, task_path='./checkpoint/model_best/plm', precision="fp16")
 >>> pprint(my_cls("中性粒细胞比率偏低"))
 ```

@@ -221,7 +227,7 @@ from paddlenlp import SimpleServer, Taskflow
 schema = ["病情诊断", "治疗方案", "病因分析", "指标解读", "就医建议"]
 utc = Taskflow("zero_shot_text_classification",
                schema=schema,
-               task_path="../../checkpoint/model_best/",
+               task_path="../../checkpoint/model_best/plm",
                precision="fp32")
 app = SimpleServer()
 app.register_taskflow("taskflow/utc", utc)