
[Hardware][CPU] Vllm int8 quantization enablement for ARM CPU #14129


Merged: 2 commits into vllm-project:main from vllm_int8_AARCH64_Enablement on Jul 10, 2025

Conversation

@nishith-fujitsu (Contributor) commented Mar 3, 2025

Description
This PR enables vLLM support for INT8 quantized models on the AArch64 architecture by enabling the ARM path for CPU inference of INT8 quantized models.

ARM Compatibility:
Modified the build scripts and configuration files to ensure compatibility with ARM processors.

Checklist

Code changes have been tested on ARM devices (Graviton3).

Modifications

  1. In the dnnl_helper file, a memory-tag check has been added for AArch64 CPUs to select the best-performing kernel.
  2. Added a flag to build vLLM with ACL; it is off by default. The flag builds the oneDNN kernels with ACL, which can be utilized by the CPU quantization kernels. The ACL library has to be built beforehand, and its path set in the ACL_ROOT_DIR environment variable.
  3. Added NEON intrinsics in cpu_types_arm.hpp for the structs required by the INT8 quantized kernels, enabling vLLM on ARM (see the sketch after this list).
  4. Added the flags required to enable the INT8 kernels in quant.hpp and torch_binding.cpp.
  5. Updated the oneDNN version to 3.8.1, as the INT8 matmul kernel implementation for AArch64/ARM machines is available in oneDNN 3.8.1 and later.
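
As a rough illustration of item 3, here is a minimal sketch of the NEON int8-to-fp32 widening pattern that such INT8 wrappers build on. This is not the PR's actual cpu_types_arm.hpp code; the function and variable names are hypothetical.

```cpp
// Minimal sketch, assuming AArch64 with NEON (compile with g++/clang++ on
// an ARM machine). NOT the PR's code: it only illustrates the int8 -> fp32
// widening pattern that INT8 dequantization kernels rely on.
#include <arm_neon.h>
#include <cstdint>

// Dequantize 16 int8 values to fp32 using a single per-tensor scale.
inline void dequant16_s8_to_f32(const int8_t* in, float* out, float scale) {
  const int8x16_t q = vld1q_s8(in);                // load 16 int8 lanes
  const int16x8_t lo = vmovl_s8(vget_low_s8(q));   // widen lanes 0..7 to i16
  const int16x8_t hi = vmovl_s8(vget_high_s8(q));  // widen lanes 8..15 to i16
  const float32x4_t s = vdupq_n_f32(scale);        // broadcast the scale
  const int32x4_t w[4] = {
      vmovl_s16(vget_low_s16(lo)), vmovl_s16(vget_high_s16(lo)),
      vmovl_s16(vget_low_s16(hi)), vmovl_s16(vget_high_s16(hi))};
  for (int i = 0; i < 4; ++i)                      // i32 -> f32, apply scale
    vst1q_f32(out + 4 * i, vmulq_f32(vcvtq_f32_s32(w[i]), s));
}
```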

Note: The ACL kernels will not run for INT8 because vLLM defaults to a per-channel quantization strategy, and ACL does not support per-channel quantization.
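
To make the per-channel vs. per-tensor distinction concrete, here is a hedged scalar sketch (hypothetical helper, not vLLM's API): per-tensor quantization shares one scale across the whole weight matrix, while per-channel keeps one scale per output channel, which is the mode ACL lacks.

```cpp
// Scalar sketch of the two INT8 dequantization strategies discussed above.
// Hypothetical helper, not vLLM's API: per-tensor uses one scale for all
// weights; per-channel (vLLM's default) uses one scale per output channel.
#include <cstddef>
#include <cstdint>

// Dequantize a row of n int8 weights. If per_channel is true, scales points
// to n per-channel scales; otherwise scales[0] is the single tensor scale.
void dequant_row(const int8_t* w, float* out, std::size_t n,
                 const float* scales, bool per_channel) {
  for (std::size_t j = 0; j < n; ++j)
    out[j] = static_cast<float>(w[j]) * scales[per_channel ? j : 0];
}
```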


github-actions bot commented Mar 3, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the ci/build label Mar 3, 2025
@nishith-fujitsu nishith-fujitsu force-pushed the vllm_int8_AARCH64_Enablement branch 2 times, most recently from fe4714f to 84f660e on March 4, 2025 05:44
@akote123 commented Mar 4, 2025

CC: @mgoin @tlrmchlsmth

@nishith-fujitsu nishith-fujitsu force-pushed the vllm_int8_AARCH64_Enablement branch 3 times, most recently from 8687635 to b9f210f on March 4, 2025 06:56
@nishith-fujitsu nishith-fujitsu force-pushed the vllm_int8_AARCH64_Enablement branch from 5012240 to 992cac7 on March 11, 2025 07:59
@nishith-fujitsu nishith-fujitsu force-pushed the vllm_int8_AARCH64_Enablement branch from 992cac7 to f690372 on March 19, 2025 06:00
@nishith-fujitsu (Contributor, Author)

Hi @mgoin, could you please review my PR?
Thank you.

@nishith-fujitsu nishith-fujitsu changed the title [Feature] Vllm int8 quantization enablement for ARM CPUs [Hardware][CPU] Vllm int8 quantization enablement for ARM CPUs Mar 20, 2025
@abhijain1204fujitsu

Hi @mgoin, @tlrmchlsmth, could you please help review this PR?

@akote123

CC: @mgoin

@nishith-fujitsu nishith-fujitsu force-pushed the vllm_int8_AARCH64_Enablement branch from 941e161 to 00306fb on June 11, 2025 06:59
@abhijain1204fujitsu

@mgoin, kindly help review the PR.

@nishith-fujitsu nishith-fujitsu force-pushed the vllm_int8_AARCH64_Enablement branch from 00306fb to 537dd46 on July 8, 2025 05:54
@nishith-fujitsu nishith-fujitsu force-pushed the vllm_int8_AARCH64_Enablement branch from ddac255 to e2802af on July 8, 2025 06:29
@akote123 commented Jul 9, 2025

@mgoin, could you please help review the PR?

@mgoin (Member) left a comment

Thanks for the ping, and apologies for the delay. I'll take your word that you've tested the models and they work. In the future, it would be great if we could set up CI or publish results in the PR for what has been tested.

@mgoin mgoin enabled auto-merge (squash) July 9, 2025 22:13
@github-actions github-actions bot added the ready label Jul 9, 2025
@mgoin mgoin added the quantization label and removed the documentation, frontend, speculative-decoding, ready, ci/build, v1, and multi-modality labels Jul 9, 2025
@mergify mergify bot added the ci/build label Jul 9, 2025
@mgoin mgoin added the cpu label Jul 9, 2025
@mgoin mgoin merged commit c7753a9 into vllm-project:main Jul 10, 2025
107 checks passed
Chen-zexi pushed a commit to Chen-zexi/vllm that referenced this pull request Jul 13, 2025
patrickvonplaten pushed a commit to patrickvonplaten/vllm that referenced this pull request Jul 15, 2025
LyrisZhong pushed a commit to LyrisZhong/vllm that referenced this pull request Jul 23, 2025
avigny pushed a commit to avigny/vllm that referenced this pull request Jul 31, 2025
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
Labels: ci/build, cpu, quantization
5 participants