
Commit 37a6860

[DOC] Add ernie-1.0-base-zh-cw benchmark results. (#3248)
1 parent 8fc38d6 commit 37a6860

5 files changed (+96 -16 lines)

examples/benchmark/clue/README.md

Lines changed: 41 additions & 2 deletions
@@ -70,7 +70,7 @@
 </tr> <tr>
 <td rowspan=3 align=center> 24L1024H </td>
 <td style="text-align:center">
-<span style="font-size:18px">ERNIE 1.0-Large-zh-CW</span>
+<span style="font-size:18px">ERNIE 1.0-Large-zh-cw</span>
 </td>
 <td style="text-align:center">
 <span style="font-size:18px"><b>79.03</b></span>
@@ -222,7 +222,7 @@
 </td>
 </tr>
 <tr>
-<td rowspan=8 align=center> 12L768H </td>
+<td rowspan=9 align=center> 12L768H </td>
 <td style="text-align:center">
 <span style="font-size:18px">
 <a href="https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_base_zh.pdparams">
@@ -264,6 +264,44 @@
 <span style="font-size:18px"><b>77.88</b></span>
 </td>
 </tr>
+<tr>
+<td style="text-align:center">
+<span style="font-size:18px">ERNIE 1.0-Base-zh-cw</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">76.47</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">76.07</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">57.86</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">59.91</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">83.41</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">79.58</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">89.91</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">83.42</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">72.88/90.78</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">84.68</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">76.98</span>
+</td>
+</tr>
 <tr>
 <td style="text-align:center">
 <span style="font-size:18px">ERNIE-Gram-zh</span>
@@ -1196,6 +1234,7 @@ AFQMC (semantic similarity), TNEWS (text classification), IFLYTEK (long-text classification
 | ERNIE 2.0-Large-zh | 1e-5,32 | 3e-5,64 | 3e-5,32 | 2e-5,32 | 1e-5,16 | 3e-5,32 | 1e-5,64 | 2e-5,24 | 2e-5,24 | 3e-5,32 |
 | HFL/RoBERTa-wwm-ext-large | 1e-5,32 | 3e-5,32 | 2e-5,32 | 1e-5,16 | 1e-5,16 | 2e-5,16 | 2e-5,16 | 3e-5,32 | 1e-5,24 | 2e-5,24 |
 | ERNIE 3.0-Base-zh | 3e-5,16 | 3e-5,32 | 5e-5,32 | 3e-5,32 | 2e-5,64 | 2e-5,16 | 2e-5,32 | 2e-5,24 | 3e-5,24 | 3e-5,32 |
+| ERNIE 1.0-Base-zh-cw | 2e-5,16 | 3e-5,32 | 5e-5,16 | 2e-5,16 | 3e-5,32 | 2e-5,16 | 2e-5,32 | 3e-5,24 | 2e-5,32 | 3e-5,24 |
 | ERNIE-Gram-zh | 1e-5,16 | 5e-5,16 | 5e-5,16 | 2e-5,32 | 2e-5,64 | 3e-5,16 | 3e-5,64 | 3e-5,32 | 2e-5,24 | 2e-5,24 |
 | ERNIE 2.0-Base-zh | 3e-5,64 | 3e-5,64 | 5e-5,16 | 5e-5,64 | 5e-5,32 | 5e-5,16 | 2e-5,16 | 2e-5,32 | 3e-5,24 | 3e-5,32 |
 | Langboat/Mengzi-Bert-Base | 3e-5,32 | 5e-5,32 | 5e-5,16 | 2e-5,16 | 2e-5,16 | 3e-5,8 | 1e-5,16 | 3e-5,24 | 3e-5,24 | 2e-5,32 |
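Each cell in the hyperparameter table above seems to pack two values as `learning_rate,batch_size`. The sketch below splits the new `ERNIE 1.0-Base-zh-cw` row into a per-task config. Two things are assumptions, because this table's header row is not part of the diff:

- the meaning of the two fields;
- the task order, which is copied from the CLUE result tables in this commit (AFQMC, TNEWS, IFLYTEK, CMNLI, OCNLI, CLUEWSC2020, CSL, CMRC, CHID, C3).

`parse_hparam_row` is a hypothetical helper, not a PaddleNLP API.

```python
from typing import Dict, Tuple

# Assumed column order; the diff does not show this table's header row.
TASKS = ["AFQMC", "TNEWS", "IFLYTEK", "CMNLI", "OCNLI",
         "CLUEWSC2020", "CSL", "CMRC", "CHID", "C3"]


def parse_hparam_row(row: str) -> Tuple[str, Dict[str, Tuple[float, int]]]:
    """Hypothetical helper: split one markdown table row into
    (model_name, {task: (learning_rate, batch_size)})."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    model, values = cells[0], cells[1:]
    if len(values) != len(TASKS):
        raise ValueError(f"expected {len(TASKS)} task cells, got {len(values)}")
    config = {}
    for task, cell in zip(TASKS, values):
        lr, bs = cell.split(",")  # assumed "learning_rate,batch_size"
        config[task] = (float(lr), int(bs))
    return model, config


row = ("| ERNIE 1.0-Base-zh-cw | 2e-5,16 | 3e-5,32 | 5e-5,16 | 2e-5,16 | "
       "3e-5,32 | 2e-5,16 | 2e-5,32 | 3e-5,24 | 2e-5,32 | 3e-5,24 |")
model, config = parse_hparam_row(row)
print(model, config["CMRC"])  # prints: ERNIE 1.0-Base-zh-cw (3e-05, 24)
```

The length check rejects a row with a missing or extra cell instead of silently pairing values with the wrong task.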

model_zoo/ernie-1.0/README.md

Lines changed: 4 additions & 4 deletions
@@ -484,24 +484,24 @@ python3 -u -m paddle.distributed.launch \
 
 We released two models, base and large, both of which achieved good pretraining results.
 
-- **ERNIE 1.0-Base-zh-CW** model:
+- **ERNIE 1.0-Base-zh-cw** model:
 Training on 400GB of CLUE and WuDao corpora in total, with batch_size 1024 for 4M (400w) steps, yields results similar to `ernie-3.0-base-zh`. The model parameters are open-sourced as `ernie-1.0-base-zh-cw` and can be loaded directly. The best hyperparameters were found with a grid search on the CLUE benchmark:
 
 Model&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Arch | CLUE AVG | AFQMC | TNEWS | IFLYTEK | CMNLI | OCNLI | CLUE WSC2020 | CSL | CMRC | CHID | C3
 -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
 Metrics |   |   | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Exact/F1| Acc| Acc | Acc
-ERNIE 1.0-Base-zh-CW | 12L768H | <b>76.44</b> | 76.04 | 58.02 | 60.87 | 83.56 | 78.61 | 89.14 | 84.00 | 72.26/90.40 | 84.73 | 77.15 |
+ERNIE 1.0-Base-zh-cw | 12L768H | <b>76.47</b> | 76.07 | 57.86 | 59.91 | 83.41 | 79.91 | 89.91 | <b>83.42</b> | 72.88/90.78 | <b>84.68</b> | 76.98 |
 ERNIE 2.0-Base-zh | 12L768H | 74.95 | 76.25 | 58.53 | 61.72 | 83.07 | 78.81 | 84.21 | 82.77 | 68.22/88.71 | 82.78 | 73.19
 ERNIE 1.0-Base-zh | 12L768H | 74.17 | 74.84 | 58.91 | 62.25 | 81.68 | 76.58 | 85.20 | 82.77 | 67.32/87.83 | 82.47 | 69.68
 -
-- **ERNIE 1.0-Large-zh-CW** model:
+- **ERNIE 1.0-Large-zh-cw** model:
 
 - In addition to the base model, we also trained and released a large model. It uses the same vocabulary as ernie-1.0, hence the name `ernie-1.0-large-zh-cw`. It was trained on open-source corpora with batch_size 512 for 4M steps; the SOP task was removed and only the MLM loss was kept:
 
 Model&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Arch | CLUE AVG | AFQMC | TNEWS | IFLYTEK | CMNLI | OCNLI | CLUE WSC2020 | CSL | CMRC | CHID | C3
 -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
 Metrics |   |   | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Exact/F1 | Acc| Acc
-ERNIE 1.0-Large-zh-CW| 24L1024H | <b>79.03</b> | 75.97 | 59.65 | 62.91 | 85.09 | 81.73| 93.09 | 84.53 | 74.22/91.88 | 88.57 | 84.54
+ERNIE 1.0-Large-zh-cw | 24L1024H | <b>79.03</b> | 75.97 | 59.65 | 62.91 | 85.09 | 81.73| 93.09 | 84.53 | 74.22/91.88 | 88.57 | 84.54
 ERNIE 3.0-Xbase-zh| 20L1024H | 78.71 | 76.85 | 59.89 | 62.41 | 84.76 | 82.51 | 89.80 | 84.47 | 75.49/92.67 | 86.36 | 84.59
 RoBERTa-wwm-ext-large | 24L1024H | 76.61 | 76.00 | 59.33 | 62.02 | 83.88 | 78.81 | 90.79 | 83.67 | 70.58/89.82 | 85.72 | 75.26
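The CLUE AVG column appears to be the unweighted mean of the ten task scores, with CMRC contributing only its Exact score. This reproduces both headline averages from the per-task numbers in this commit:

- 76.47 for `ERNIE 1.0-Base-zh-cw`;
- 79.03 for `ERNIE 1.0-Large-zh-cw`.

The sketch below is a consistency check under that assumption; `clue_avg` is a hypothetical helper, not part of PaddleNLP. The Base row in this README has an OCNLI cell of 79.91, while the other files report 79.58. With 79.91 the check gives 76.50 rather than 76.47.

```python
from statistics import mean

# Column order used by the CLUE result tables in this commit.
TASKS = ["AFQMC", "TNEWS", "IFLYTEK", "CMNLI", "OCNLI",
         "CLUEWSC2020", "CSL", "CMRC", "CHID", "C3"]


def clue_avg(scores: dict) -> float:
    """Hypothetical helper: unweighted mean over the ten CLUE tasks.

    CMRC is reported as "Exact/F1"; only the Exact part is averaged
    (an assumption that matches the published CLUE AVG values).
    """
    values = []
    for task in TASKS:
        s = scores[task]
        if isinstance(s, str):  # e.g. "72.88/90.78" -> 72.88
            s = float(s.split("/")[0])
        values.append(s)
    return round(mean(values), 2)


base_cw = {"AFQMC": 76.07, "TNEWS": 57.86, "IFLYTEK": 59.91, "CMNLI": 83.41,
           "OCNLI": 79.58, "CLUEWSC2020": 89.91, "CSL": 83.42,
           "CMRC": "72.88/90.78", "CHID": 84.68, "C3": 76.98}
large_cw = {"AFQMC": 75.97, "TNEWS": 59.65, "IFLYTEK": 62.91, "CMNLI": 85.09,
            "OCNLI": 81.73, "CLUEWSC2020": 93.09, "CSL": 84.53,
            "CMRC": "74.22/91.88", "CHID": 88.57, "C3": 84.54}

print(clue_avg(base_cw), clue_avg(large_cw))  # prints: 76.47 79.03
print(clue_avg({**base_cw, "OCNLI": 79.91}))  # prints: 76.5
```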

model_zoo/ernie-1.0/pretraining_introduction.md

Lines changed: 6 additions & 6 deletions
@@ -24,8 +24,8 @@ PaddleNLP is committed to open-source pretraining work, using the open-source Chinese corpora CLUE and WuDao
 - [3.4 Training data pipeline configuration](#data_pipe)
 - [3.5 Monitoring and evaluation](#观察评估)
 - [4. Training results](#release_models)
-- [4.1 ERNIE 1.0-Base-zh-CW model](#ernie-1.0-base-zh-cw)
-- [4.2 ERNIE 1.0-Large-zh-CW model](#ernie-1.0-large-zh-cw)
+- [4.1 ERNIE 1.0-Base-zh-cw model](#ernie-1.0-base-zh-cw)
+- [4.2 ERNIE 1.0-Large-zh-cw model](#ernie-1.0-large-zh-cw)
 * [5. References](#references)
 
 The overall workflow is shown in the figure below:
@@ -577,28 +577,28 @@ python3 -u -m paddle.distributed.launch \
 
 <a name="ernie-1.0-base-zh-cw"></a>
 
-### 4.1 ERNIE 1.0-Base-zh-CW model
+### 4.1 ERNIE 1.0-Base-zh-cw model
 
 Training on 400GB of CLUE and WuDao corpora in total, with batch_size 1024 for 4M (400w) steps, yields results similar to `ernie-3.0-base-zh`. The model parameters are open-sourced as `ernie-1.0-base-zh-cw` and can be loaded directly. The best hyperparameters were found with a grid search on the CLUE benchmark:
 
 Model&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Arch | CLUE AVG | AFQMC | TNEWS | IFLYTEK | CMNLI | OCNLI | CLUE WSC2020 | CSL | CMRC | CHID | C3
 -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
 Metrics |   |   | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Exact/F1| Acc| Acc
-ERNIE 1.0-Base-zh-CW | 12L768H | <b>76.44</b> | 76.04 | 58.02 | 60.87 | 83.56 | 78.61 | 89.14 | 84.00 | 72.26/90.40 | 84.73 | 77.15 |
+ERNIE 1.0-Base-zh-cw | 12L768H | <b>76.47</b> | 76.04 | 57.86 | 59.91 | <b>83.41</b> | 79.58 | 89.91 | 83.42 | 72.88/90.78 | <b>84.68</b> | 76.98 |
 ERNIE 2.0-Base-zh | 12L768H | 74.32 | 75.65 | 58.25 | 61.64 | 82.62 | 78.71 | 81.91 | 82.33 | 66.08/87.46 | 82.78 | 73.19
 ERNIE 1.0-Base-zh | 12L768H | 74.17 | 74.84 | 58.91 | 62.25 | 81.68 | 76.58 | 85.20 | 82.77 | 67.32/87.83 | 82.47 | 69.68
 
 
 <a name="ernie-1.0-large-zh-cw"> </a>
 
-### 4.2 ERNIE 1.0-Large-zh-CW model
+### 4.2 ERNIE 1.0-Large-zh-cw model
 
 In addition to the base model, we also trained a large model, named `ernie-1.0-large-zh-cw`. It was trained on open-source corpora with batch_size 512 for 4M steps; the SOP task was removed and only the MLM loss was kept. The best hyperparameters were found with a grid search on the CLUE benchmark:
 
 Model&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Arch | CLUE AVG | AFQMC | TNEWS | IFLYTEK | CMNLI | OCNLI | CLUE WSC2020 | CSL | CMRC | CHID | C3
 -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
 Metrics |   |   | Acc | Acc | Acc | Acc | Acc | Acc | Acc | Exact/F1 | Acc| Acc
-ERNIE 1.0-Large-zh-CW| 24L1024H | <b>79.03</b> | 75.97 | 59.65 | 62.91 | 85.09 | 81.73| 93.09 | 84.53 | 74.22/91.88 | 88.57 | 84.54
+ERNIE 1.0-Large-zh-cw| 24L1024H | <b>79.03</b> | 75.97 | 59.65 | 62.91 | 85.09 | 81.73| 93.09 | 84.53 | 74.22/91.88 | 88.57 | 84.54
 ERNIE 3.0-Xbase-zh| 20L1024H | 78.39 | 76.16 | 59.55 | 61.87 | 84.40 | 81.73 | 88.82 | 83.60 | 75.99/93.00 | 86.78 | 84.98
 RoBERTa-wwm-ext-large | 24L1024H | 76.61 | 76.00 | 59.33 | 62.02 | 83.88 | 78.81 | 90.79 | 83.67 | 70.58/89.82 | 85.72 | 75.26

model_zoo/ernie-1.0/run_pretrain.py

Lines changed: 5 additions & 2 deletions
@@ -541,8 +541,11 @@ def do_train(args):
 ctx_manager = contextlib.nullcontext() if sys.version_info >= (
     3, 7) else contextlib.suppress()
 
-if worker_num > 1 and (args.use_recompute
-                       or args.accumulate_steps > 1):
+if worker_num > 1 and (args.use_recompute or
+                       ((step + 1) % args.accumulate_steps != 0)):
+    # grad acc, no_sync when (step + 1) % args.accumulate_steps != 0:
+    # recompute, no_sync every where
+    # recompute + grad_acc, no_sync every where
     ctx_manager = model.no_sync()
 else:
     ctx_manager = contextlib.nullcontext() if sys.version_info >= (
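This hunk narrows when `model.no_sync()`, the data-parallel context manager that suspends gradient synchronization, is selected.

- Old condition: any `accumulate_steps > 1` chose `no_sync()` on every step, including the last micro-step of each accumulation cycle.
- New condition: gradient accumulation alone skips synchronization only on micro-steps where `(step + 1) % accumulate_steps != 0`.
- Recompute: as the added comments state, `use_recompute` still selects `no_sync()` on every step. How gradients are then synchronized lies outside this hunk.

The sketch below restates both conditions as plain predicates so their step-by-step schedules can be compared. The function names are hypothetical, and no Paddle API is called.

```python
def new_uses_no_sync(worker_num: int, use_recompute: bool, step: int,
                     accumulate_steps: int) -> bool:
    """Condition added in this commit: True means the step would run
    under model.no_sync(), i.e. without gradient synchronization."""
    return worker_num > 1 and (
        use_recompute or (step + 1) % accumulate_steps != 0)


def old_uses_no_sync(worker_num: int, use_recompute: bool, step: int,
                     accumulate_steps: int) -> bool:
    """Condition removed in this commit; it ignores the step index."""
    return worker_num > 1 and (use_recompute or accumulate_steps > 1)


def schedule(pred, steps: int = 8, **kwargs) -> list:
    """Evaluate a predicate for training steps 0 .. steps-1."""
    return [pred(step=s, **kwargs) for s in range(steps)]


cfg = dict(worker_num=2, use_recompute=False, accumulate_steps=4)
print(schedule(new_uses_no_sync, **cfg))
# prints: [True, True, True, False, True, True, True, False]
print(schedule(old_uses_no_sync, **cfg))
# prints: [True, True, True, True, True, True, True, True]
```

With `accumulate_steps=1`, `(step + 1) % 1` is always 0, so the new predicate selects `no_sync()` only when recompute is on. With a single worker, neither predicate selects it.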

model_zoo/ernie-3.0/README.md

Lines changed: 40 additions & 2 deletions
@@ -139,7 +139,7 @@ accuracy-latency chart on GPU with batch_size=32 and 1 at FP16 inference precision:
 <tr>
 <td rowspan=3 align=center> 24L1024H </td>
 <td style="text-align:center">
-<span style="font-size:18px">ERNIE 1.0-Large-CW</span>
+<span style="font-size:18px">ERNIE 1.0-Large-cw</span>
 </td>
 <td style="text-align:center">
 <span style="font-size:18px"><b>79.03</b></span>
@@ -291,7 +291,7 @@ accuracy-latency chart on GPU with batch_size=32 and 1 at FP16 inference precision:
 </td>
 </tr>
 <tr>
-<td rowspan=8 align=center> 12L768H </td>
+<td rowspan=9 align=center> 12L768H </td>
 <td style="text-align:center">
 <span style="font-size:18px">
 <a href="https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_base_zh.pdparams">
@@ -333,6 +333,44 @@ accuracy-latency chart on GPU with batch_size=32 and 1 at FP16 inference precision:
 <span style="font-size:18px"><b>77.88</b></span>
 </td>
 </tr>
+<tr>
+<td style="text-align:center">
+<span style="font-size:18px">ERNIE 1.0-Base-zh-cw</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">76.47</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">76.07</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">57.86</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">59.91</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">83.41</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">79.58</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">89.91</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">83.42</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">72.88/90.78</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">84.68</span>
+</td>
+<td style="text-align:center">
+<span style="font-size:18px">76.98</span>
+</td>
+</tr>
 <tr>
 <td style="text-align:center">
 <span style="font-size:18px">ERNIE-Gram-zh</span>
