Status: Open
Labels: 🏋 SFT (Related to SFT), 📚 documentation (Improvements or additions to documentation)
Description
Hi!
To my knowledge, there are two data collators available that can handle both (1) packed examples and (2) padding-free attention: `DataCollatorWithFlattening` from transformers, and the custom `DataCollatorForLanguageModeling` in `sft_trainer.py`:
trl/trl/trainer/sft_trainer.py (line 109 in 686cd35):
class DataCollatorForLanguageModeling(DataCollatorMixin):
Based on the documentation, `DataCollatorWithFlattening` has one nice feature: it prevents the last token of a packed example from predicting the first token of the next example. Beyond that, I'm not sure what the differences are from an initial reading of the code.
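To illustrate the boundary-masking behavior mentioned above, here is a minimal, hypothetical sketch of what a "flattening" collator does (this is not the actual transformers implementation, just the idea): examples are concatenated into one padding-free row, position ids restart at each example, and the first label of every example is set to -100 so the last token of one packed example never receives a training signal for predicting the first token of the next.

```python
# Hypothetical sketch of a flattening collator (not the real
# transformers.DataCollatorWithFlattening code).
def flatten_collate(examples):
    input_ids, position_ids, labels = [], [], []
    for ex in examples:
        ids = ex["input_ids"]
        input_ids += ids
        # Positions restart per example, which padding-free / flash-attention
        # kernels can use to keep attention within each example.
        position_ids += list(range(len(ids)))
        # Mask the boundary token with -100 so the previous example's last
        # token is never trained to predict this example's first token.
        labels += [-100] + ids[1:]
    return {
        "input_ids": [input_ids],
        "position_ids": [position_ids],
        "labels": [labels],
    }

batch = flatten_collate([{"input_ids": [1, 2, 3]}, {"input_ids": [4, 5]}])
# batch["input_ids"]    → [[1, 2, 3, 4, 5]]
# batch["position_ids"] → [[0, 1, 2, 0, 1]]
# batch["labels"]       → [[-100, 2, 3, -100, 5]]
```

The masked labels at positions 0 and 3 are what give the "last token of example A never predicts the first token of example B" property.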