
Commit eb06992 ("add tip")

1 parent: 74c3b55

File tree: 1 file changed (+4, −1 lines)


docs/source/reducing_memory_usage.md

Lines changed: 4 additions & 1 deletion
@@ -141,7 +141,7 @@ To enable activation offloading in your SFT training configuration:
 ```python
 from trl import SFTConfig

-training_args = SFTConfig(..., activation_offloading=True)
+training_args = SFTConfig(..., activation_offloading=True, activation_checkpointing=True)
 ```

 </hfoption>
@@ -151,13 +151,16 @@ training_args = SFTConfig(..., activation_offloading=True)

 When using activation offloading with pre-initialized models that use Liger kernels, you must disable Liger cross entropy due to compatibility issues. The issue occurs specifically with `use_liger_kernel=True` because Liger cross entropy performs in-place operations which conflict with activation offloading. The default setting (`use_liger_kernel=False`) may work properly.

+We recommend using activation offloading together with activation checkpointing, in which case only checkpointed tensors will be offloaded.
+
 ```python
 # When using activation offloading with a model that uses Liger kernels:
 from trl import SFTConfig

 training_args = SFTConfig(
     activation_offloading=True,
     use_liger_kernel=False,  # Disable Liger cross entropy
+    activation_checkpointing=True,
     # Other parameters...
 )
 ```
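Taken together, the settings this commit documents correspond to a configuration like the following. This is a minimal sketch, assuming a TRL version that exposes `activation_offloading` and `activation_checkpointing` on `SFTConfig` (as the edited doc page implies); the `output_dir` and batch-size values are purely illustrative and not part of the commit:

```python
from trl import SFTConfig

# Memory-saving configuration sketch. The three flags below are the
# ones discussed in this commit; output_dir and the batch size are
# hypothetical placeholders.
training_args = SFTConfig(
    output_dir="./sft-output",        # illustrative path
    per_device_train_batch_size=1,    # illustrative value
    activation_offloading=True,       # move saved activations to CPU
    activation_checkpointing=True,    # recompute activations in backward;
                                      # only checkpointed tensors get offloaded
    use_liger_kernel=False,           # required with offloading: Liger cross
                                      # entropy performs in-place operations
)
```

The design intent, per the tip added here: checkpointing shrinks the set of tensors kept alive for the backward pass, so offloading then only has to ship the checkpointed subset to CPU, reducing transfer overhead.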
