[GGUF] support Qwen3 architecture #2273
Conversation
Pull Request Overview
This PR adds support for the Qwen3 architecture by updating model configuration, weight loading, and head-splitting logic. Key changes include:
- Adding Qwen3 to the list of supported architectures in gguf_modeling.cpp.
- Updating the head_size configuration to use the key_length metadata when available, and loading the attn_q_norm/attn_k_norm weights in gguf.cpp.
- Introducing a new split_heads helper in building_blocks.cpp that applies RMS normalization for Qwen3, and updating multi_head_attention to use it (see the sketch below).
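For context, Qwen3 normalizes each query and key head with RMS norm over the head dimension, using a learned weight of length head_dim. A minimal plain-C++ sketch of that math follows; the PR builds the equivalent OpenVINO subgraph inside split_heads, so every name below is illustrative rather than the repo's actual API.

```cpp
// Conceptual sketch of the per-head RMS normalization Qwen3 applies to Q and K
// after the heads are split. Plain C++ over flat buffers, for illustration only.
#include <cmath>
#include <cstddef>
#include <vector>

// Normalize each head's head_dim-sized slice in place:
//   x_i <- x_i / sqrt(mean(x^2) + eps) * w_i
// `weight` has head_dim elements and is shared across heads and tokens.
void rms_norm_per_head(std::vector<float>& x,            // [num_tokens * num_heads * head_dim]
                       const std::vector<float>& weight, // [head_dim]
                       std::size_t head_dim,
                       float eps) {
    for (std::size_t offset = 0; offset + head_dim <= x.size(); offset += head_dim) {
        float sum_sq = 0.0f;
        for (std::size_t i = 0; i < head_dim; ++i)
            sum_sq += x[offset + i] * x[offset + i];
        const float inv_rms = 1.0f / std::sqrt(sum_sq / head_dim + eps);
        for (std::size_t i = 0; i < head_dim; ++i)
            x[offset + i] *= inv_rms * weight[i];
    }
}
```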
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/cpp/src/gguf_utils/gguf_modeling.cpp | Expanded architecture support by including Qwen3 in the model creation logic. |
| src/cpp/src/gguf_utils/gguf.cpp | Adjusted the head_size calculation and added loading of Qwen3-specific normalization weights (see the sketch below). |
| src/cpp/src/gguf_utils/building_blocks.cpp | Added a new split_heads helper with RMS-norm handling and updated multi_head_attention to use it. |
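To make the head_size change above concrete, here is a hedged sketch of the fallback logic, assuming the GGUF metadata has been parsed into a plain map. The key names follow the usual GGUF convention (e.g. qwen3.attention.key_length), but the helper and the map type are hypothetical, not the repo's actual API.

```cpp
// Illustrative sketch of the head_size resolution described in the table above.
#include <cstdint>
#include <map>
#include <string>

std::uint32_t resolve_head_size(const std::map<std::string, std::uint32_t>& metadata,
                                const std::string& arch) {
    // Prefer the explicit per-head key length when the model provides it
    // (Qwen3 GGUF files do); otherwise fall back to the classic derivation.
    auto it = metadata.find(arch + ".attention.key_length");
    if (it != metadata.end())
        return it->second;
    return metadata.at(arch + ".embedding_length") /
           metadata.at(arch + ".attention.head_count");
}
```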
Comments suppressed due to low confidence (1)
src/cpp/src/gguf_utils/building_blocks.cpp:357
- The key used for v-split normalization ("self_attn.v_norm") does not appear to be loaded anywhere, unlike the Qwen3 q_norm and k_norm weights. Please confirm whether the absence of a v_norm weight is intentional or whether the weight loading logic should be updated accordingly.
auto v_split = split_heads(value, num_heads_kv, head_dim, rms_norm_eps, key_name + ".self_attn.v_norm", consts);
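One plausible reading is that split_heads treats the norm weight as optional and skips normalization when the named constant was never loaded, which would make the unused v_norm key a harmless no-op. A sketch of that lookup pattern, under exactly that assumption and with hypothetical types and names:

```cpp
// Sketch of an "optional norm" lookup: apply the per-head RMS norm only when
// the named weight was actually loaded. Whether split_heads really behaves
// this way is exactly what the review comment asks to confirm.
#include <map>
#include <string>
#include <vector>

using ConstMap = std::map<std::string, std::vector<float>>;  // hypothetical

bool maybe_norm_weight(const ConstMap& consts,
                       const std::string& key,
                       const std::vector<float>** out) {
    auto it = consts.find(key);  // e.g. "...self_attn.v_norm" is simply absent
    if (it == consts.end())
        return false;            // no weight loaded -> caller skips the norm
    *out = &it->second;
    return true;
}
```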
build_jenkins
Please add a test case for Qwen3, e.g. Qwen/Qwen3-0.6B-GGUF.
It seems that Qwen3 GGUF is not supported by transformers yet: ggml.py
Please add a separate test without comparison against HF results. Just compare with some hard-coded expected result for the input prompt, and add a comment that this is a temporary testing solution until transformers starts to support Qwen3 in GGUF format.
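A sketch of what such a hard-coded-expectation check could look like, written in C++ for illustration (the repo's tests live in Python). It assumes LLMPipeline accepts a .gguf path directly; the prompt and the EXPECTED string are placeholders, not real reference output.

```cpp
// Temporary testing approach until transformers can load Qwen3 GGUF,
// at which point the check can be replaced with an HF comparison.
#include <cassert>
#include <string>
#include "openvino/genai/llm_pipeline.hpp"

int main() {
    ov::genai::LLMPipeline pipe("Qwen3-0.6B-f16.gguf", "CPU");
    ov::genai::GenerationConfig config;
    config.max_new_tokens = 16;
    config.do_sample = false;  // greedy decoding, so the output is deterministic
    const std::string out = pipe.generate("Why is the Sun yellow?", config);
    const std::string EXPECTED = "<hard-coded reference output goes here>";
    assert(out == EXPECTED);
    return 0;
}
```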
Revert to default value
build_jenkins
Fix qwen3 test case
build_jenkins
Qwen3 GGUF-related test added:
CI test run & passed: https://github.com/openvinotoolkit/openvino.genai/actions/runs/15546210687/job/43773618720?pr=2273#step:8:151
build_jenkins
24493e9
**Detail:**
1. Load the attn_q_norm and attn_k_norm weights for the Qwen3 architecture (sketch below).
2. Add rms_norm in split_heads after the Q/K head split for the Qwen3 architecture.

**Validation Scope:**
- [Qwen3-0.6B-f16](https://huggingface.co/ggml-org/Qwen3-0.6B-GGUF/blob/main/Qwen3-0.6B-f16.gguf) (CPU/GPU)
- [Qwen3-0.6B-Q8_0](https://huggingface.co/Qwen/Qwen3-0.6B-GGUF/blob/main/Qwen3-0.6B-Q8_0.gguf) (CPU*)
- [Qwen3-4B-Q4_K_M](https://huggingface.co/Qwen/Qwen3-4B-GGUF/blob/main/Qwen3-4B-Q4_K_M.gguf) (CPU/GPU)

*Q8_0 has an accuracy issue on GPU: CVS-166108

Co-authored-by: Xiake Sun <[email protected]>
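As a companion to point 1 above, a hedged sketch of fetching the per-layer norm tensors by their conventional GGUF names (blk.&lt;i&gt;.attn_q_norm.weight and blk.&lt;i&gt;.attn_k_norm.weight); the tensor-map type and helper are hypothetical stand-ins for what gguf.cpp does.

```cpp
// Sketch of the weight-loading side: for each transformer block, fetch the
// Qwen3 Q/K norm tensors by their conventional GGUF names.
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using TensorMap = std::map<std::string, std::vector<float>>;  // hypothetical

void load_qk_norms(const TensorMap& gguf_tensors,
                   std::uint32_t num_layers,
                   TensorMap& out) {
    for (std::uint32_t i = 0; i < num_layers; ++i) {
        const std::string prefix = "blk." + std::to_string(i) + ".";
        // Conventional GGUF names for Qwen3's per-head norm weights.
        for (const char* name : {"attn_q_norm.weight", "attn_k_norm.weight"})
            out[prefix + name] = gguf_tensors.at(prefix + name);
    }
}
```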