[Hardware][CPU] vLLM int8 quantization enablement for ARM CPU #14129
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger a full CI run by default; only fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add a ready label to the PR or enable auto-merge. 🚀
CC: @mgoin @tlrmchlsmth
Hi @mgoin, could you please review my PR?
Hi @mgoin, @tlrmchlsmth, could you please review this PR?
CC: @mgoin
@mgoin, kindly review this PR.
@mgoin,
Thanks for the ping, and apologies for the delay. I'll take your word for it that you've tested the models and they work. In the future, it would be great if we could set up CI or publish results in the PR showing what has been tested.
Description
This PR enables support for vLLM INT8 quantized models on the AARCH64 architecture by enabling the ARM path for CPU inference of INT8 quantized models.
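For reference, here is a minimal usage sketch of offline inference with an INT8 (W8A8) checkpoint on an aarch64 CPU build of vLLM. The model path is a placeholder, and the checkpoint is assumed to have been produced by a compressed-tensors-style quantization flow; this is an illustration, not code from the PR itself:

```python
# Hypothetical sketch: offline inference with an INT8-quantized model on an
# ARM CPU build of vLLM. "path/to/int8-w8a8-model" is a placeholder path.
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/int8-w8a8-model", dtype="bfloat16")
params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```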
ARM Compatibility:
Modified the build scripts and configuration files to ensure compatibility with ARM processors.
Checklist
Code changes have been tested on ARM devices (Graviton3).
Modifications
Note: ACL (Arm Compute Library) kernels will not run for INT8 because vLLM defaults to a per-channel quantization strategy, and ACL does not support per-channel quantization.
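To illustrate the distinction (this is a toy example, not vLLM or ACL code): per-channel INT8 quantization keeps one scale per output channel of the weight matrix, whereas a per-tensor scheme uses a single scale for the whole matrix, which is the form ACL's INT8 kernels expect. A minimal sketch using symmetric quantization:

```python
# Illustrative sketch of per-tensor vs. per-channel symmetric INT8 weight
# quantization; a toy example, not code from this PR.
import torch

w = torch.randn(4, 8)  # weight matrix: [out_channels, in_channels]

# Per-tensor: a single scale for the whole matrix.
scale_tensor = w.abs().max() / 127.0
w_q_per_tensor = torch.clamp((w / scale_tensor).round(), -128, 127).to(torch.int8)

# Per-channel: one scale per output channel (vLLM's default INT8 strategy),
# which is why the ACL kernels are bypassed in the default case.
scale_channel = w.abs().amax(dim=1, keepdim=True) / 127.0
w_q_per_channel = torch.clamp((w / scale_channel).round(), -128, 127).to(torch.int8)
```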