
Conversation

@aicrumb (Contributor) commented Dec 12, 2022

Uses the bitsandbytes Adam optimizer instead of torch's, adds very simple gradient accumulation, supports finetuning only biases/layernorms (tested; works very well and is very fast), and makes it easier to use different precisions. (Sorry for the absolute mass of commits, I was making them on github.com.)
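
A minimal sketch of the optimizer swap and bias/layernorm-only finetuning described above; the model name, learning rate, and parameter-name heuristic are illustrative assumptions, not trlx's actual configuration:

```python
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model

# Finetune only biases and layernorm parameters; freeze everything else.
# The name matching below is a heuristic for GPT-2-style parameter names.
for name, param in model.named_parameters():
    lowered = name.lower()
    param.requires_grad = (
        "bias" in lowered or "ln_" in lowered or "layernorm" in lowered or "layer_norm" in lowered
    )

trainable_params = [p for p in model.parameters() if p.requires_grad]

# bitsandbytes 8-bit Adam in place of torch.optim.Adam.
optimizer = bnb.optim.Adam8bit(trainable_params, lr=1e-4)
```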

@jon-tow (Collaborator) commented Dec 16, 2022

Hey @Crumb! Just getting back to this. I think it's best to separate the 3 new features into their own PRs to avoid unwieldy commit histories (if there's an issue in one feature we'd have to revert the other features as well).

Let's make this a PR focused on adding the bitsandbytes optimizer support. I believe there might be some more thought that needs to go into adding grad accumulation; it's possible accelerate handles this in subtly different ways for the deepspeed integration (which we rely on heavily), and it should be properly tested.
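
For reference, a minimal sketch of accelerate's built-in gradient accumulation with a toy model; whether this composes cleanly with the deepspeed integration is exactly the open question above:

```python
import torch
from accelerate import Accelerator
from torch.utils.data import DataLoader, TensorDataset

# Toy setup; the point is the accumulation context manager, not the model.
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataloader = DataLoader(TensorDataset(torch.randn(64, 8), torch.randn(64, 1)), batch_size=8)

# accelerate accumulates gradients over the given number of steps and only
# syncs/steps at the accumulation boundary.
accelerator = Accelerator(gradient_accumulation_steps=4)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    with accelerator.accumulate(model):
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```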

Hope that's alright with you 🙏

(Also, no worries about the many commits! We squash them before merging)

@LouisCastricato (Contributor) commented:

Let's make the bitsandbytes PR.

@jon-tow jon-tow added this to the v0.4.0 milestone Jan 2, 2023
@jon-tow jon-tow changed the title bnb adam, grad accumulation, dtype param Add bitsandbytes optimizer support Jan 2, 2023
@jon-tow (Collaborator) commented Jan 4, 2023

Details on latest changes

  • We opt not to convert nn.Embedding layers into bnb.nn.StableEmbedding layers, since trlx is essentially a fine-tuning library. According to Tim Dettmers:

    "StableEmbedding layer is only required if the model was pretrained with the StableEmbedding layer."

    See relevant discussion here.

    • Instead, we take the more general approach of forcing nn.Embedding layers into 32-bit precision, following advice from Tim Dettmers (see the sketch after this list):

      "for pretrained models, the best would be to use the 32-bit optimized embedding layer (bnb.nn.Embedding) and no layer norm if the pretrained model was not trained with a [StableEmbedding] layer norm."

      Again, relevant discussion here.
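
One way bitsandbytes supports this is through its GlobalOptimManager, which keeps 32-bit optimizer state for selected parameters while everything else uses 8-bit Adam. The snippet below is a minimal sketch of that approach (placeholder model and learning rate), not necessarily the exact mechanism used in this PR:

```python
import bitsandbytes as bnb
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model

# Request 32-bit optimizer state for all embedding weights before creating
# the optimizer; the remaining parameters keep 8-bit Adam state.
manager = bnb.optim.GlobalOptimManager.get_instance()
for module in model.modules():
    if isinstance(module, nn.Embedding):
        manager.register_module_override(module, "weight", {"optim_bits": 32})

optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4)
```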

Reports

  • In practice, we see ~13% memory savings for GPT-J when trained with 8 unfrozen layers. Note that large memory savings occur only when a large portion of the model is unfrozen (i.e. num_layers_unfrozen is a large fraction of the total number of layers).
    • There seems to be a slight penalty for using 8-bit optimizers, but nothing significantly divergent; this can possibly be overcome with improved hyperparameter tuning.
    • Relevant wandb report: here

CC @LouisCastricato

@LouisCastricato (Contributor) commented Jan 4, 2023

Looks good to me, let's merge.

@jon-tow jon-tow merged commit 17df88e into CarperAI:main Jan 4, 2023
@maxreciprocate (Collaborator) commented:

hey @jon-tow, your wandb report (https://wandb.ai/carperai/trlx/reports/trlx-Add-bitsandbytes-optimizer-support-133--VmlldzozMjU5OTQx) doesn't render for me for some reason (there's no data in the charts, as if the run set is empty). Is it some wandb shenanigans again, or just me 😴

@LouisCastricato (Contributor) commented:

It was rendering yesterday. I think wandb is just being weird.

@jon-tow (Collaborator) commented Jan 4, 2023

@reciprocated those were jon-tow shenanigans 😅 Yesterday I discovered that deleting a run also deletes its data from reports (which makes sense in hindsight). I re-ran them from my personal wandb account; try this link: https://wandb.ai/jon-tow/trlx/reports/trlx-Add-bitsandbytes-optimizer-support-133--VmlldzozMjY1MzI1

@maxreciprocate (Collaborator) commented:

@jon-tow that link gives a 404 for me 😓 you have to share a "magic" link instead

@maxreciprocate (Collaborator) commented:

@jon-tow Thanks 🙏
