paddlenlp/trainer/training_args.py
6 additions & 1 deletion
@@ -271,7 +271,7 @@ class TrainingArguments:
enable_stage1_broadcast_overlap, overlap stage1 V1 broadcast with next step forward computation. There are some constraints for the overlap, such as the logging_step should be bigger than 1 for broadcast overlap forward compute and no other sync could be called during the training for broadcast overlap.
enable_stage1_allgather_overlap, overlap stage1 V2 allgather with next step forward computation. There are some constraints for the overlap, such as the logging_step should be bigger than 1 for allgather overlap forward compute and no other sync could be called during the training for allgather overlap.
disable_stage1_reduce_avg, replace reduce_avg with original reduce_sum+scale in stage1, which can be used for accuracy verification.
- enable_release_graHEADds, reduce peak memory usage by releasing gradients after each iteration. The creation of gradients will be postponed until backward propagation of the next iteration.
+ enable_release_grads, reduce peak memory usage by releasing gradients after each iteration. The creation of gradients will be postponed until backward propagation of the next iteration.
recompute (`bool`, *optional*, defaults to `False`):
    Recompute the forward pass to calculate gradients. Used for saving memory.
    Only support for networks with transformer blocks.
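For context, the option strings above appear to be the space-separated tokens documented under `sharding_parallel_config`, and this hunk removes a stray merge-conflict `HEAD` fragment from the `enable_release_grads` name. A minimal, hypothetical sketch of how the option might be passed (the values below are illustrative and not taken from this PR):

from paddlenlp.trainer import TrainingArguments

# Hypothetical configuration, for illustration only.
# "enable_release_grads" asks sharding stage1 to release gradient storage
# after each iteration, per the docstring corrected above.
args = TrainingArguments(
    output_dir="./checkpoints",
    sharding="stage1",
    sharding_parallel_config="enable_release_grads",
)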
@@ -355,6 +355,8 @@ class TrainingArguments:
    Whether skip profile timer, timer will record time usage of forward/ backward/ step, etc.
distributed_dataloader (`bool`, *optional*):
    Whether to use distributed dataloader. Default is `False`.
+ release_grads (`bool`, *optional*):
+     Whether to release gradients during training. Default is `False`.
"""

output_dir: str = field(
@@ -832,6 +834,9 @@ class TrainingArguments:
    default=False,
    metadata={"help": "Enable MoE (Mixture of Experts) expert parallel training"},
)
+ release_grads: Optional[bool] = field(
+     default=False, metadata={"help": "Whether to release gradients during training. Default is `False`."}
+ )
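A short, hypothetical usage sketch for the new `release_grads` field (the parser call and paths are assumptions for illustration, not part of this PR):

from paddlenlp.trainer import PdArgumentParser, TrainingArguments

# Hypothetical CLI-style usage. The dataclass field is added by this PR;
# exposing it as "--release_grads" via PdArgumentParser follows the usual
# dataclass-to-argparse mapping and is assumed here.
parser = PdArgumentParser(TrainingArguments)
(training_args,) = parser.parse_args_into_dataclasses(
    args=["--output_dir", "./checkpoints", "--release_grads", "true"]
)
assert training_args.release_grads is True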