🐇 [Research] Layer Skip SFT #3111

Merged: 7 commits merged on Mar 24, 2025
Conversation

ariG23498 (Collaborator)
We fine-tuned unsloth/Llama-3.2-3B with Layer Skip SFT on the https://huggingface.co/datasets/WillHeld/top_v2 dataset. You can find the fine-tuned model here.
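For reference, below is a minimal sketch of what such an SFT run could look like with trl's SFTTrainer. The hyperparameters, output path, and dataset field names are assumptions for illustration, and the LayerSkip-specific training components (layer dropout and the early-exit loss) are only noted in comments rather than implemented; the research project's own scripts remain the authoritative version.

```python
# Minimal sketch, not the research project's actual script: plain SFT on TOPv2.
# The LayerSkip-specific pieces (layer dropout schedule, early-exit loss) are
# intentionally omitted and would need to be layered on top of this setup.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("WillHeld/top_v2", split="train")

def to_text(example):
    # Assumed schema: an input utterance paired with its semantic parse.
    # Adjust the field names if the dataset's actual columns differ.
    return {"text": f"{example['utterance']}\n{example['semantic_parse']}"}

dataset = dataset.map(to_text)

training_args = SFTConfig(
    output_dir="llama-3.2-3b-layerskip-sft",  # hypothetical output path
    per_device_train_batch_size=4,            # assumed hyperparameters
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    num_train_epochs=1,
    logging_steps=10,
)

trainer = SFTTrainer(
    model="unsloth/Llama-3.2-3B",
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```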

Benchmark Results:

Running the throughput benchmark script, we get the following:

[------ Generation Speeds -------]
                     |  generation
16 threads: ----------------------
      no layer skip  |    522.1   
      layer skip 1   |    568.9   
      layer skip 2   |    353.0   
      layer skip 3   |    188.3   
      layer skip 4   |    170.6   
      layer skip 5   |    186.7   
      layer skip 6   |    203.2   
      layer skip 7   |    219.0   
      layer skip 8   |    235.3   
      layer skip 9   |    251.1   
      layer skip 10  |    260.7   
      layer skip 11  |    276.1   
      layer skip 12  |    291.9   
      layer skip 13  |    307.8   
      layer skip 14  |    323.1   
      layer skip 15  |    338.9   

Times are in milliseconds (ms).

With layer skip 4 we see roughly a 67% reduction in generation latency relative to the baseline: (522.1 - 170.6) / 522.1 ≈ 0.67.
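For context, the table above looks like torch.utils.benchmark output, so here is a hedged sketch of how such a comparison could be produced. It assumes a recent transformers version that supports LayerSkip-style self-speculative decoding via the assistant_early_exit argument to generate(); the checkpoint path, prompt, and generation length are placeholders, and the research project's actual benchmark script may differ.

```python
# Sketch only: times full generation against early-exit self-speculative decoding
# for several exit layers, producing a comparison table like the one above.
import torch
from torch.utils import benchmark
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/layer-skip-sft-checkpoint"  # hypothetical fine-tuned model path
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)

# Placeholder TOPv2-style prompt and generation settings.
inputs = tokenizer("Set an alarm for 7 am tomorrow", return_tensors="pt").to(model.device)
gen_kwargs = {"max_new_tokens": 64, "do_sample": False}

cases = [("no layer skip", {})] + [
    (f"layer skip {k}", {"assistant_early_exit": k}) for k in range(1, 16)
]

results = []
for sub_label, extra in cases:
    timer = benchmark.Timer(
        stmt="model.generate(**inputs, **gen_kwargs, **extra)",
        globals={"model": model, "inputs": inputs, "gen_kwargs": gen_kwargs, "extra": extra},
        num_threads=16,
        label="Generation Speeds",
        sub_label=sub_label,
        description="generation",
    )
    results.append(timer.blocked_autorange(min_run_time=2))

benchmark.Compare(results).print()
```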

We would love for the trl team to give us feedback on the technique. Any help or guidance would be appreciated.

CC: @mostafaelhoushi

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@mostafaelhoushi (Contributor) left a comment

Thanks Aritra! I just added some minor comments.

@qgallouedec (Member) left a comment

Nice, thanks for adding this to the research projects!

@ariG23498 (Collaborator, Author)

Hi @qgallouedec, are we missing something? I am not sure whether the CI errors are caused by these changes; should I look into them further?

@qgallouedec changed the title from "[Research] Layer Skip SFT" to "🐇 [Research] Layer Skip SFT" on Mar 24, 2025
@qgallouedec (Member)

Merging, sorry for the delay

@qgallouedec merged commit bfe2075 into huggingface:main on Mar 24, 2025
8 of 13 checks passed
toslali-ibm pushed a commit to toslali-ibm/trl that referenced this pull request Mar 25, 2025
kashif pushed a commit to kashif/trl that referenced this pull request Mar 28, 2025
yxliu-TAMU pushed a commit to mincheolseong/ECEN743-GRPO-Project-Proposal that referenced this pull request Apr 20, 2025