support bitsandbytes 8-bit and FP4 quantized models #7445

Merged: 8 commits into vllm-project:main on Aug 29, 2024

Conversation

@chenqianfzh (Contributor) commented Aug 12, 2024:

This PR does the following:

  1. support quantized bitsandbytes 8-bit models, such as meta-llama/Llama-Guard-3-8B-INT8
  2. support quantized bitsandbytes 4-bit FP4 models, such as PrunaAI/Einstein-v6.1-Llama3-8B-bnb-4bit-smashed
  3. Add comments about enforcing eager-mode in bnb quantization, as I identified it is a bug in the underlying dependency package of bitsandbytes.

FIX #6756
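
For context, a minimal, hypothetical usage sketch (not taken from this PR's diff): it loads the pre-quantized 8-bit checkpoint named in item 1 through vLLM's offline LLM API, mirroring the engine arguments used in the test snippet further down; the prompt and sampling settings are made up for illustration.

from vllm import LLM, SamplingParams

# Hypothetical sketch: load a checkpoint that is already quantized to 8-bit
# with bitsandbytes (see item 1 above). Both the quantization backend and the
# load format are set to "bitsandbytes".
llm = LLM(model="meta-llama/Llama-Guard-3-8B-INT8",
          quantization="bitsandbytes",
          load_format="bitsandbytes",
          enforce_eager=True)  # eager mode, per the note in item 3 above

params = SamplingParams(temperature=0.0, max_tokens=32)
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)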

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build in the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@mgoin (Member) left a comment:

This is great work and refactoring, I appreciate it! I need to do another pass since it's a bit dense, so it would be helpful if you could document more of the config arguments and _apply_8bit_weight.

@chenqianfzh (Contributor, Author) replied:

> This is great work and refactoring, I appreciate it! I need to do another pass since it's a bit dense, so it would be helpful if you could document more of the config arguments and _apply_8bit_weight.

Thanks. Will do.

Review thread on the test code:

with vllm_runner(model_name,
                 quantization='bitsandbytes',
                 load_format='bitsandbytes',
                 enforce_eager=True) as llm:

@jeejeelee (Collaborator) commented:

bitsandbytes-foundation/bitsandbytes#1330 has been merged. Regarding these tests, can we now use cudagraph?

@chenqianfzh (Contributor, Author) replied:

Hi @jeejeelee, thanks for your fix in bnb.

However, the latest release of the bnb package came out three weeks ago and does not yet include your fix.

I will update the code after the next bnb release is out.

A Member replied:

Agreed, I think it is worth doing the package upgrade in another PR

@jeejeelee (Collaborator) replied:

It seems this PR introduces a new BNB kernel. I'm not sure whether the previous modifications to BNB support this kernel. What I mean is, perhaps we should verify this first (by building BNB from source). If the kernel still isn't supported, we may need to keep refining the relevant BNB code.

If you're not available, I can verify it next week.

@chenqianfzh (Contributor, Author) replied on Aug 24, 2024:

@jeejeelee I tried the new bnb kernel with your fix, running the above tests (and some others) in graph mode, and it worked perfectly! :-)
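
A hedged sketch of what that graph-mode check might look like, assuming a bitsandbytes build from source that already contains bitsandbytes-foundation/bitsandbytes#1330; model_name and example_prompts are placeholders in the style of the test snippet above, and generate_greedy stands in for whatever generation helper the test fixture provides:

# Hypothetical variant of the test above: with a source build of bitsandbytes
# that contains the cudagraph fix, enforce_eager can be dropped so the test
# runs with CUDA graph capture (the default).
with vllm_runner(model_name,
                 quantization='bitsandbytes',
                 load_format='bitsandbytes') as llm:
    outputs = llm.generate_greedy(example_prompts, max_tokens=32)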

@mgoin (Member) left a comment:

Thanks for the improvements, this looks good to me

@mgoin added the ready label on Aug 23, 2024.

@chenqianfzh (Contributor, Author) commented:

@mgoin The test errors seem unrelated to my change. What shall I do?

@mgoin (Member) commented on Aug 26, 2024:

@chenqianfzh could you please merge with the latest main? Recent PRs don't seem to be failing, so I wouldn't expect test errors.

@mgoin (Member) commented on Aug 27, 2024:

I'm not sure what the issue is; I was also manually retrying tests in Buildkite. I will run the tests locally.

@chenqianfzh (Contributor, Author) replied:

> I'm not sure what the issue is; I was also manually retrying tests in Buildkite. I will run the tests locally.

My local tests always pass.

However, I suspect it is related to GPU memory not being released correctly. I am trying out a fix with a fake PR now.

@chenqianfzh (Contributor, Author) commented:

@mgoin I see that all the checks have passed now, could you help merge it? Thanks.

@mgoin merged commit 4664cea into vllm-project:main on Aug 29, 2024 (45 checks passed).
@chenqianfzh deleted the bnb-8bit branch on Aug 30, 2024.

@jvlinsta commented on Sep 5, 2024:

Does this also mean we can use bitsandbytes with tensor-parallel-size > 1?

@chenqianfzh (Contributor, Author) replied:

> Does this also mean we can use bitsandbytes with tensor-parallel-size > 1?

No, not yet.

I am working on TP with bnb now. It will be out in a different PR.

@jvlinsta commented:

Hi @chenqianfzh thanks for that! Where does that PR live, so I can keep following up on it? ^^

@jvlinsta commented:

Is it here? bytedance-iaas@e8d5453

@chenqianfzh (Contributor, Author) replied:

> Is it here? bd-iaas-us@e8d5453

Yep. #8434 is the PR for the bnb TP work.

@molereddy commented:

@chenqianfzh it doesn't seem that the vLLM BNB documentation has been updated to reflect that 8-bit quantization is now available; following the documentation, the default BNB quantization is 4-bit.
The usage of 4-bit vs. 8-bit is unclear to me from this PR. Can you clarify how to use 8-bit BNB quantization?
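
As a hedged illustration of the distinction being asked about (based only on this PR's description, not the official docs): the assumption here is that a checkpoint already quantized to 8-bit is picked up as 8-bit from its own quantization config, while in-flight quantization of an unquantized checkpoint stays on the default 4-bit path; the second model name below is just an example.

from vllm import LLM

# Assumed behavior, per this PR's description: a checkpoint that was already
# quantized to 8-bit with bitsandbytes loads on the 8-bit path.
llm_8bit = LLM(model="meta-llama/Llama-Guard-3-8B-INT8",
               quantization="bitsandbytes",
               load_format="bitsandbytes")

# In-flight quantization of an unquantized checkpoint (example model name)
# stays on the default 4-bit path described in the docs.
llm_4bit = LLM(model="huggyllama/llama-7b",
               quantization="bitsandbytes",
               load_format="bitsandbytes")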

Successfully merging this pull request may close the following issue:

  • [Bug]: Unable to run meta-llama/Llama-Guard-3-8B-INT8 (#6756)