🐇 [Research] Layer Skip SFT #3111
Thanks Aritra! I just added some minor comments.
examples/research_projects/layer_skip/scripts/benchmark_layer_skip.py
Co-authored-by: Mostafa Elhoushi <[email protected]>
Nice, thanks for adding in the research projects!
Co-authored-by: Quentin Gallouédec <[email protected]>
Hi @qgallouedec, are we missing something? I am not sure whether the CI errors are caused by these changes. Should I look into it further?
Merging, sorry for the delay.
Co-authored-by: Mostafa Elhoushi <[email protected]> Co-authored-by: Quentin Gallouédec <[email protected]> Co-authored-by: Quentin Gallouédec <[email protected]>
We used unsloth/Llama-3.2-3B and applied Layer Skip SFT to it with the https://huggingface.co/datasets/WillHeld/top_v2 dataset. You can find the fine-tuned model here.
Benchmark Results:
Running the throughput benchmark script we get the following:
With layer number 4, we get around a 67% reduction in generation latency.
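To put the 67% figure in terms of speedup, here is a minimal sketch of the arithmetic. The two timings are hypothetical values chosen only to reproduce a reduction of about 67%; they are not the numbers from the benchmark run, and the helper names are illustrative rather than part of the benchmark script.

```python
def latency_reduction(baseline_s: float, layer_skip_s: float) -> float:
    """Fraction of generation latency removed relative to the baseline."""
    if baseline_s <= 0:
        raise ValueError("baseline latency must be positive")
    return 1.0 - layer_skip_s / baseline_s


def speedup(baseline_s: float, layer_skip_s: float) -> float:
    """How many times faster generation is compared with the baseline."""
    if layer_skip_s <= 0:
        raise ValueError("layer-skip latency must be positive")
    return baseline_s / layer_skip_s


# Hypothetical timings (seconds per generation), NOT measured results.
baseline_s = 3.00
layer_skip_s = 0.99

print(f"latency reduction: {latency_reduction(baseline_s, layer_skip_s):.0%}")
print(f"speedup: {speedup(baseline_s, layer_skip_s):.2f}x")
```

Under these assumed timings, a 67% latency reduction corresponds to roughly a 3x speedup, since speedup = 1 / (1 - reduction).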
We would love for the TRL team to give us feedback on the technique. Any help or guidance would be appreciated.
CC: @mostafaelhoushi