paddlenlp/trainer/training_args.py (12 additions, 7 deletions)
@@ -236,6 +236,7 @@ class TrainingArguments:
             Some additional config it highly affect the useage of sharding parallel, we provide some option to config it.
             following config is support:
             enable_stage1_tensor_fusion, fuse small tensors into big tensor chunks to accelerate communications, may increase memory occupation
+            enable_stage1_overlap, fuse small tensors into big tensor chunks to accelerate communications and do communication overlap with backward computation, may harm the backward speed
         recompute (`bool`, *optional*, defaults to `False`):
             Recompute the forward pass to calculate gradients. Used for saving memory.
             Only support for networks with transformer blocks.
@@ -541,7 +542,8 @@ class TrainingArguments:
                 "help": (
                     "Some additional config it highly affect the useage of sharding parallel, we provide some option to config it."
                     "following config is support: \n"
-                    "enable_stage1_tensor_fusion, fuse small tensors into big tensor chunks to accelerate communications, may increase memory occupation"
+                    "enable_stage1_tensor_fusion, fuse small tensors into big tensor chunks to accelerate communications, may increase memory occupation\n"
+                    "enable_stage1_overlap, fuse small tensors into big tensor chunks to accelerate communications and do communication overlap with backward computation, may harm the backward speed"