
add qwen 3 support #118


Merged: 4 commits into main from q3_check on May 13, 2025

Conversation


@saum7800 saum7800 commented May 13, 2025

Qwen 3 works now!

  • Update vLLM and Unsloth.
  • Update patches, specifically the get_lora_tokenizer one and the wake_up() call.
  • Add an example with Qwen 3. It's important to set thinking to true or false properly (see the sketch below)!

Have not tested the MoE models yet, but that can be a future PR (if they don't already start working).

Worked with Python 3.12; not tested with 3.10/3.11 yet.
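For context, roughly how the thinking toggle works for Qwen 3 (a minimal sketch, not the exact example added in this PR; the model name and prompt are placeholders): the Qwen3 chat template accepts an `enable_thinking` flag through `apply_chat_template`.

```python
# Minimal sketch: toggling Qwen3 "thinking" mode through the chat template.
# Assumes the Hugging Face tokenizer for a Qwen3 checkpoint; names are illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
messages = [{"role": "user", "content": "What is 17 * 23?"}]

# enable_thinking=True: the prompt asks the model to emit a <think>...</think>
# block before its final answer.
with_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# enable_thinking=False: the template closes the think block up front, so the
# model answers directly.
without_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
print(with_thinking != without_thinking)  # the two prompts differ only around the think block
```

When serving through an OpenAI-compatible vLLM endpoint, the same flag can typically be passed per request via `extra_body={"chat_template_kwargs": {"enable_thinking": False}}`, though that depends on the vLLM version in use.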

@saum7800 saum7800 requested review from bradhilton and corbt May 13, 2025 08:29
pyproject.toml Outdated
@@ -54,8 +54,8 @@ dev-dependencies = [
 ]
 override-dependencies = [
 "bitsandbytes; sys_platform == 'linux'",
-"unsloth==2025.3.19 ; sys_platform == 'linux'",
-"unsloth-zoo==2025.3.17 ; sys_platform == 'linux'",
+"unsloth==2025.5.1 ; sys_platform == 'linux'",
Contributor

Can we remove these from override-dependencies and just keep them in dependencies now?

@corbt corbt left a comment

Nice! You've confirmed training appears to converge?

@bradhilton bradhilton left a comment

Looks good. My main concern is that when I previously tested vLLM v0.8.3, inference was notably slower, at least for LoRA or quantized models. Maybe they've since addressed that, but it's something I'd be worried about.

@corbt corbt commented May 13, 2025

This is a fair concern: in our internal testing of vLLM 0.8.4, LoRA inference was 4x slower than non-LoRA. Seems like there may have been a regression recently.

@bradhilton

I wonder if it is possible to upgrade Unsloth to support Qwen3 without upgrading vLLM

@saum7800 saum7800 commented May 13, 2025

vLLM does need to be >=0.8.5 for Qwen3, I believe: https://qwen.readthedocs.io/en/latest/deployment/vllm.html. Will still test. Currently running quick benchmark tests with ART to see how much slower generation is.
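A tiny guard along these lines (purely illustrative, not part of this PR) can make that requirement explicit at runtime; it assumes `packaging` is available, which it normally is alongside vLLM:

```python
# Fail fast if the installed vLLM is too old for Qwen3 (per the >=0.8.5 claim above).
from importlib.metadata import version
from packaging.version import Version

installed = version("vllm")
if Version(installed) < Version("0.8.5"):
    raise RuntimeError(f"Qwen3 support needs vllm>=0.8.5, found {installed}; please upgrade.")
```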

@corbt, yeah, checked convergence, which looks good!
[Screenshot: training convergence, 2025-05-13]

@saum7800

Inference benchmark results v0.8.5:
Iterations: 10
Concurrency: 5
Total time: 52.77 s
Min iteration: 5.18 s
Max iteration: 6.06 s
Avg iteration: 5.28 s
Std dev iter: 0.28 s
Avg per req: 1.06 s/request

Inference benchmark results v0.7.3:
Iterations: 10
Concurrency: 5
Total time: 49.77 s
Min iteration: 4.93 s
Max iteration: 5.02 s
Avg iteration: 4.98 s
Std dev iter: 0.02 s
Avg per req: 1.00 s/request

Each run is 10 iterations, where each iteration is 5 concurrent requests, each with 200 output tokens. You're right @bradhilton, it's definitely slower, but not unusably slower imo. In general, we will have to keep moving with vLLM and Unsloth versions, so I am in favour of going ahead. Thoughts, @corbt and @bradhilton?
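For reference, a rough sketch of this kind of benchmark loop (the actual ART benchmark script may differ; the endpoint URL, model name, and prompt are placeholders):

```python
# Rough benchmark sketch: `iterations` rounds, each firing `concurrency` chat
# requests at an OpenAI-compatible vLLM endpoint and timing the round.
import asyncio
import statistics
import time

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def one_request() -> None:
    await client.chat.completions.create(
        model="Qwen/Qwen3-8B",
        messages=[{"role": "user", "content": "Write a short story about a robot."}],
        max_tokens=200,  # 200 output tokens per request, as in the run above
    )

async def benchmark(iterations: int = 10, concurrency: int = 5) -> None:
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        await asyncio.gather(*(one_request() for _ in range(concurrency)))
        durations.append(time.perf_counter() - start)
    print(f"Total time: {sum(durations):.2f} s")
    print(f"Avg iteration: {statistics.mean(durations):.2f} s")
    print(f"Std dev iter: {statistics.stdev(durations):.2f} s")
    print(f"Avg per req: {sum(durations) / (iterations * concurrency):.2f} s/request")

asyncio.run(benchmark())
```

The same harness covers the larger run discussed below by raising `concurrency` to 800 and `max_tokens` (and the prompt length) to roughly 1,000 tokens.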

@bradhilton bradhilton commented May 13, 2025

I don't know if this is the best example to benchmark because the requests are exceptionally small. When we get a chance, I'd like to see benchmarks for Temporal Clue or a similarly inference-intensive task.

@saum7800

What's the ballpark input tokens, output tokens, and concurrency for Temporal Clue, would you say? Shouldn't be too hard to update the benchmark script.

@bradhilton

800 concurrent requests with about 1,000 input and 1,000 output tokens, Qwen2.5 16B, on an H100.

@saum7800

We're looking good; in fact, better in the high-concurrency, high input/output-token run.

Inference benchmark results v0.8.5:
Iterations: 1
Concurrency: 800
Total time: 245.20 s
Avg iteration: 245.20 s
Avg per req: 0.31 s/request
Avg prompt tokens: 1538.00
Avg completion tokens: 813.50

Inference benchmark results v0.7.3:
Iterations: 1
Concurrency: 800
Total time: 274.60 s
Avg iteration: 274.60 s
Avg per req: 0.34 s/request
Avg prompt tokens: 1538.00
Avg completion tokens: 794.20

Seems to be on par. We can merge right now... we can always revert if we see massive performance losses. Will wait for your opinion before merging, @corbt @bradhilton.

@bradhilton

Okay, awesome. LGTM.

@bradhilton bradhilton self-requested a review May 13, 2025 23:44
@saum7800 saum7800 merged commit 76199b6 into main May 13, 2025
1 check passed
@saum7800 saum7800 deleted the q3_check branch May 13, 2025 23:48