Commit 5f5cb52

Add unified checkpoint training args doc (#7756)
1 parent a6968b7 commit 5f5cb52

1 file changed: +22 -0 lines changed

docs/trainer.md

Lines changed: 22 additions & 0 deletions
@@ -1,3 +1,4 @@
+trainer.md
 # PaddleNLP Trainer API

 PaddleNLP provides the Trainer training API, which wraps the common training configuration used during training, for example:
@@ -661,6 +662,27 @@ Trainer is a simple but feature-complete Paddle training and evaluation module, and
     The path to a folder with a valid checkpoint for your
     model. (default: None)

+--unified_checkpoint
+    Whether to unify the checkpoints of hybrid parallel training. (optional, default: False)
+
+--unified_checkpoint_config
+    Optimization options for Unified Checkpoint, passed as a str.
+    The following options are supported:
+      skip_save_model_weight: skip saving the model weights when master_weights exist.
+      master_weight_compatible: 1. load master_weights only when the optimizer needs them;
+                                2. if the checkpoint contains no master_weights, load the model weights as master_weights.
+      async_save: save checkpoints to disk asynchronously, so saving does not block training and training efficiency improves.
+      enable_all_options: enable all of the options above.
+
 --skip_memory_metrics
     Whether to skip the memory profiler check. (optional, default: True, i.e. skipped)
     Whether or not to skip adding of memory profiler reports
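
The option semantics documented in this diff can be illustrated with a small Python sketch. This is not PaddleNLP's implementation: `parse_unified_checkpoint_config` and `master_weights_source` are hypothetical helpers, and the whitespace delimiter between options is an assumption, since the doc only says the config is passed as a str. The option names themselves come from the added documentation.

```python
from typing import Set

# Option names as listed in the documentation added by this commit.
KNOWN_OPTIONS = frozenset(
    {"skip_save_model_weight", "master_weight_compatible", "async_save"}
)
ENABLE_ALL = "enable_all_options"


def parse_unified_checkpoint_config(config: str) -> Set[str]:
    """Hypothetical helper: turn the option string into a set of enabled options.

    Assumption: options are separated by whitespace. The doc does not
    state the delimiter.
    """
    options = set(config.split())
    unknown = options - KNOWN_OPTIONS - {ENABLE_ALL}
    if unknown:
        raise ValueError(f"unknown unified checkpoint option(s): {sorted(unknown)}")
    # enable_all_options switches on every individual option.
    if ENABLE_ALL in options:
        return set(KNOWN_OPTIONS)
    return options


def master_weights_source(
    optimizer_needs_master_weights: bool,
    checkpoint_has_master_weights: bool,
) -> str:
    """Hypothetical sketch of the two master_weight_compatible rules.

    Returns which tensors would be loaded as master weights:
    "skip", "master_weights", or "model_weights".
    """
    # Rule 1: load master_weights only when the optimizer needs them.
    if not optimizer_needs_master_weights:
        return "skip"
    if checkpoint_has_master_weights:
        return "master_weights"
    # Rule 2: no master_weights in the checkpoint, fall back to model weights.
    return "model_weights"
```

Under these assumptions, `parse_unified_checkpoint_config("enable_all_options")` yields the same set as listing all three options explicitly, and an empty string enables none of them.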
