Add bitsandbytes optimizer support #133
Conversation
For whenever I'm back on my computer; commit created from my phone.
Since this tunes the LayerNorm params as well, it is not faithful to the original paper (https://arxiv.org/pdf/2106.10199.pdf) and shouldn't carry the same name, as it's different.
Will add an option somewhere
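For reference, a minimal sketch of the distinction being discussed, assuming a standard PyTorch module where biases and LayerNorms can be identified by module type and parameter name (illustrative only, not the PR's actual code):

```python
import torch.nn as nn

def set_trainable(model: nn.Module, tune_layernorm: bool = False) -> None:
    """Freeze everything except bias terms (BitFit-style, per the paper above);
    optionally also unfreeze full LayerNorm parameters, which is what this PR does."""
    for module in model.modules():
        for name, param in module.named_parameters(recurse=False):
            if tune_layernorm and isinstance(module, nn.LayerNorm):
                # Diverges from BitFit: trains LayerNorm weight as well as bias.
                param.requires_grad = True
            else:
                # Faithful to BitFit: only bias parameters remain trainable.
                param.requires_grad = name.endswith("bias")
```

An option flag like `tune_layernorm` above is one way the suggested "option somewhere" could look; the actual parameter name in the PR may differ.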
Hey @Crumb! Just getting back to this. I think it's best to separate the 3 new features into their own PRs to avoid unwieldy commit histories (if there's an issue in one feature we'd have to revert the other features as well). Let's make this a PR focused on adding the bitsandbytes optimizer. Hope that's alright with you 🙏 (Also, no worries about the many commits! We squash them before merging)
Let's make the bitsandbytes PR.
Details on latest changes: wandb report "trlx: Add bitsandbytes optimizer support" (https://wandb.ai/carperai/trlx/reports/trlx-Add-bitsandbytes-optimizer-support-133--VmlldzozMjU5OTQx)
Looks good to me, let's merge.
Hey @jon-tow, your wandb report (https://wandb.ai/carperai/trlx/reports/trlx-Add-bitsandbytes-optimizer-support-133--VmlldzozMjU5OTQx) doesn't render for me for some reason (there is no data in the charts, as if the run set is empty). Is it some of wandb's shenanigans again, or just me? 😴
It was rendering yesterday. I think wandb is just being weird.
@reciprocated those were
@jon-tow that link gives a 404 for me 😓; you have to share a "magic" link instead.
@jon-tow Thanks 🙏
Uses the bitsandbytes Adam optimizer instead of torch's, adds very simple gradient accumulation, fine-tunes only biases/LayerNorms (tested; works very well and is very fast), and makes it easier to use different precisions. (Sorry for the absolute mass of commits, I was making them on github.com.)
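For context, a minimal sketch of the first two pieces described above, assuming hypothetical `model`, `dataloader`, and `compute_loss` objects (not the PR's actual implementation; hyperparameters are placeholders):

```python
import bitsandbytes as bnb

# 8-bit Adam from bitsandbytes in place of torch.optim.Adam.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4, betas=(0.9, 0.95))

# Very simple gradient accumulation: step the optimizer every N micro-batches.
accumulate_steps = 4
optimizer.zero_grad()
for step, batch in enumerate(dataloader):
    # Scale the loss so gradients average over the accumulated micro-batches.
    loss = compute_loss(model, batch) / accumulate_steps
    loss.backward()
    if (step + 1) % accumulate_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

The bias/LayerNorm-only fine-tuning piece is sketched earlier in the thread, alongside the comment about the BitFit paper.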