Which collator to use for padding-free packing: WithFlattening or Default LanguageModelling? #3692

@jiosephlee

Hi!

As far as I can tell, there are two data collators that can handle both (1) packed examples and (2) padding-free attention:

WithFlattening: https://github.com/huggingface/transformers/blob/67ddc82fbc7e52c6f42a395b4a6d278c55b77a39/src/transformers/data/data_collator.py#L1973

The custom DataCollatorForLanguageModeling in sft_trainer.py:

class DataCollatorForLanguageModeling(DataCollatorMixin):

Based on the documentation, WithFlattening has the nice feature of preventing the last token of one packed example from predicting the first token of the next. Beyond that, I'm not sure what the differences are from an initial reading of the code.
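To make the boundary behaviour concrete, here is a minimal pure-Python sketch of the flattening idea. It is an illustration only, not the transformers or sft_trainer.py code: the function names flatten_examples and trained_pairs are hypothetical, and it assumes the usual causal-LM convention that the loss shifts labels by one position and skips labels equal to -100.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def flatten_examples(examples):
    """Concatenate tokenized examples into one padding-free sequence.

    Hypothetical helper for illustration; `examples` is a list of token-id lists.
    """
    input_ids, labels, position_ids = [], [], []
    for ids in examples:
        input_ids += ids
        # Mask the first label of every example: after the one-position shift,
        # that label is the target of the previous example's last token.
        labels += [IGNORE_INDEX] + ids[1:]
        # Restart positions at 0 so example boundaries can be recovered.
        position_ids += list(range(len(ids)))
    return {"input_ids": input_ids, "labels": labels, "position_ids": position_ids}


def trained_pairs(batch):
    """List the (context token, target token) pairs that contribute to the loss."""
    shifted = zip(batch["input_ids"][:-1], batch["labels"][1:])
    return [(ctx, tgt) for ctx, tgt in shifted if tgt != IGNORE_INDEX]


batch = flatten_examples([[5, 6, 7], [8, 9]])
pairs = trained_pairs(batch)
```

With the two toy examples above, the pair (7, 8) that crosses the example boundary is absent from pairs, while every within-example pair is kept.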

Labels: 🏋 SFT (Related to SFT), 📚 documentation (Improvements or additions to documentation)
