support bitsandbytes 8-bit and FP4 quantized models #7445
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
This is great work and refactoring, appreciate it! I need to do another pass through as it's a bit dense, so it would be helpful if you could document more of the config arguments and _apply_8bit_weight.
Thanks. Will do.
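For readers following along: below is a minimal, illustrative sketch (not the PR's actual implementation) of how an 8-bit apply function can dispatch to bitsandbytes' LLM.int8() matmul. The function and parameter names (apply_8bit_weight, qweight, scales) are placeholders, not identifiers from this PR.

import torch
import bitsandbytes as bnb
from bitsandbytes import MatmulLtState

def apply_8bit_weight(x: torch.Tensor,
                      qweight: torch.Tensor,  # int8 weight (placeholder name)
                      scales: torch.Tensor,   # per-channel scales / SCB (placeholder name)
                      bias=None) -> torch.Tensor:
    # Pack the quantized weight and its scales into a MatmulLtState so that
    # bitsandbytes takes the int8 matmul path instead of a fp16 weight path.
    state = MatmulLtState()
    state.CB = qweight
    state.SCB = scales
    state.has_fp16_weights = False
    state.is_training = False
    return bnb.matmul(x, qweight, state=state, bias=bias)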
with vllm_runner(model_name,
                 quantization='bitsandbytes',
                 load_format='bitsandbytes',
-                enforce_eager=True) as llm:
+                enforce_eager=True,
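For reference, the configuration exercised by this test can also be used for offline inference. A minimal sketch, assuming a checkpoint supported by the bitsandbytes loader (the model name is only a placeholder):

from vllm import LLM, SamplingParams

# Placeholder model; any checkpoint the bitsandbytes loader supports works,
# either pre-quantized (8-bit or 4-bit) or quantized in-flight.
llm = LLM(model="facebook/opt-125m",
          quantization="bitsandbytes",
          load_format="bitsandbytes",
          enforce_eager=True)  # eager mode was needed before the bnb CUDA-graph fix

outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(temperature=0.0, max_tokens=32))
print(outputs[0].outputs[0].text)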
bitsandbytes-foundation/bitsandbytes#1330 has been merged. Regarding these tests, can we now use cudagraph?
Hi @jeejeelee, thanks for your fix in bnb.
However, the latest release of the bnb package came out three weeks ago and does not include your fix yet.
I will update the code after the next bnb release is out.
Agreed, I think it is worth doing the package upgrade in another PR
It seems that this PR introduces a new BNB kernel. I'm not sure whether the previous modifications to BNB support this kernel. What I mean is, perhaps we should verify this first (by building BNB from source). If the kernel is still not supported, we may need to continue refining the relevant BNB code.
If you're not available, I can verify it next week.
@jeejeelee
I tried the new bnb kernel with your fix and ran the above tests (and some others) in CUDA-graph mode; it worked perfectly! :-)
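(For concreteness: once a bitsandbytes release containing the fix is required, the test above could presumably drop eager mode. A sketch, reusing the same fixture:)

with vllm_runner(model_name,
                 quantization='bitsandbytes',
                 load_format='bitsandbytes',
                 enforce_eager=False) as llm:
    ...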
Thanks for the improvements, this looks good to me
@mgoin The test errors seem unrelated to my change. What shall I do?
@chenqianfzh could you please merge with the latest main? Recent PRs don't seem to be failing, so I wouldn't expect test errors.
I'm not sure what the issue is; I was also manually retrying tests in Buildkite. I will run the tests locally.
My local tests always pass. However, I suspect it is related to GPU memory not being released correctly. I am trying my fix in a throwaway test PR now.
@mgoin All the checks have passed now; could you help merge it? Thanks.
Does this also mean we can use bitsandbytes with tensor-parallel-size > 1?
No, not yet. I am working on TP with bnb now. It will come in a different PR.
Hi @chenqianfzh, thanks for that! Where does that PR live, so I can keep following it? ^^
Is it here? bytedance-iaas@e8d5453
Yep, #8434 is the PR for bnb TP.
@chenqianfzh it doesn't seem that the vLLM BNB documentation has been updated to reflect that 8-bit quantization is now available. Following the documentation, the default BNB quantization is 4-bit.
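Until the docs catch up, here is a minimal sketch of producing an 8-bit bitsandbytes checkpoint with transformers and then pointing vLLM at it; the model name and output path are only placeholders.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize a model to 8-bit with bitsandbytes and save the quantized checkpoint.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m",
                                             quantization_config=bnb_config,
                                             torch_dtype=torch.float16,
                                             device_map="auto")
model.save_pretrained("opt-125m-bnb-8bit")

# vLLM can then load the pre-quantized checkpoint, e.g.:
#   LLM(model="opt-125m-bnb-8bit",
#       quantization="bitsandbytes",
#       load_format="bitsandbytes")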
This PR does the following:
FIX #6756