paddlenlp/trainer/training_args.py (7 additions, 2 deletions)
@@ -272,6 +272,7 @@ class TrainingArguments:
             enable_stage1_allgather_overlap, overlap the stage1 V2 allgather with the next step's forward computation. The overlap has some constraints: logging_steps must be greater than 1, and no other synchronization may be issued during training.
             disable_stage1_reduce_avg, replace reduce_avg with the original reduce_sum + scale in stage1; useful for accuracy verification.
             enable_release_grads, reduce peak memory usage by releasing gradients after each iteration. Gradient buffers are recreated during the backward pass of the next iteration.
+            enable_fuse_optimizer_states, fuse optimizer states into a single storage.
         recompute (`bool`, *optional*, defaults to `False`):
             Recompute the forward pass to calculate gradients. Used for saving memory.
             Only supported for networks with transformer blocks.
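Below is a minimal sketch (not part of this PR) of how these sharding options might be supplied through `TrainingArguments`. It assumes the flags documented above are passed as a space-separated string via the existing `sharding_parallel_config` field, as with other PaddleNLP stage1 options; the exact set of accepted flags depends on the installed PaddleNLP and Paddle versions, and in practice these are usually provided as CLI arguments under a distributed launch.

```python
# Sketch only: field names `sharding`, `sharding_parallel_config`, `recompute`,
# and `logging_steps` exist in paddlenlp.trainer.TrainingArguments; whether a
# given flag (e.g. enable_fuse_optimizer_states) is accepted depends on version.
from paddlenlp.trainer import TrainingArguments

args = TrainingArguments(
    output_dir="./checkpoints",
    sharding="stage1",
    # Space-separated stage1 options; enable_fuse_optimizer_states is the
    # option added in this diff.
    sharding_parallel_config="enable_release_grads enable_fuse_optimizer_states",
    recompute=True,      # recompute forward pass to save memory (transformer blocks only)
    logging_steps=10,    # must be > 1 if enable_stage1_allgather_overlap is used
)
```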