[model] support Qwen3-235B-A22B-Instruct-250718 #5033
Conversation
Summary of Changes
Code Review
This pull request adds support for the Qwen3-235B-A22B-Instruct-250718 model by updating the model registration and documentation. The core changes are correct. However, I have a suggestion for improvement:
- Code Organization: In `swift/llm/model/model/qwen.py`, the new model is added in a separate `ModelGroup`. It would be more consistent to add it to the existing `ModelGroup` for `qwen3_moe` models.
Addressing this point will improve the maintainability and clarity of the codebase.
`swift/llm/model/model/qwen.py` (Outdated)
ModelGroup([
    Model('Qwen/Qwen3-235B-A22B-Instruct-250718', 'Qwen/Qwen3-235B-A22B-Instruct-250718'),
]),
For better code organization and maintainability, it would be clearer to add the new model `Qwen/Qwen3-235B-A22B-Instruct-250718` to the existing `ModelGroup` for `qwen3_moe` models, rather than creating a new `ModelGroup` for a single model. You can add it under the `# instruct` comment in the first `ModelGroup`, as sketched below.
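A rough illustration of the suggested arrangement (the neighboring entry shown here is an illustrative placeholder, not necessarily the actual contents of the `qwen3_moe` group in `qwen.py`):

```python
# In the existing qwen3_moe ModelGroup in swift/llm/model/model/qwen.py
ModelGroup([
    # instruct
    Model('Qwen/Qwen3-30B-A3B', 'Qwen/Qwen3-30B-A3B'),  # illustrative existing entry
    Model('Qwen/Qwen3-235B-A22B-Instruct-250718', 'Qwen/Qwen3-235B-A22B-Instruct-250718'),  # new model
]),
```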
Model Finetuning
https://github.com/modelscope/ms-swift/blob/main/examples/train/megatron/lora/qwen3_235b.sh
We demonstrate self-cognition finetuning of Qwen3-235B-A22B-Instruct-2507 using Megatron & LoRA, integrated with ms-swift. You will need 8 GPUs with 80GiB of memory each.
Before starting the finetuning process, please ensure your environment is properly set up.
For instructions on installing Megatron-related dependencies, please refer to the Megatron-SWIFT training documentation (Docker images are also available):
https://swift.readthedocs.io/en/latest/Instruction/Megatron-SWIFT-Training.html
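As a minimal setup sketch, assuming a pip-based install (the exact Megatron dependencies and versions are governed by the linked documentation, not this PR):

```shell
# Install ms-swift; Megatron-LM and related dependencies (e.g. TransformerEngine)
# should be set up following the Megatron-SWIFT documentation linked above,
# or use one of the prebuilt Docker images it mentions.
pip install ms-swift -U
```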
The finetuning dataset should be prepared in the following format (the "system" field is optional). You can specify it in the training script using `--dataset <dataset_path>`.
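A minimal example of what that JSONL format looks like, following ms-swift's standard `messages` schema (the content shown is illustrative; the second sample omits the optional system message):

```json
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Who are you?"}, {"role": "assistant", "content": "I am swift-robot, a model finetuned with ms-swift."}]}
{"messages": [{"role": "user", "content": "What can you do?"}, {"role": "assistant", "content": "I can answer questions and help with a variety of tasks."}]}
```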
Training memory usage:

Training log:

If you need to run it on 8 GPUs with 80GiB of memory each, you can use a configuration along the following lines:
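Below is a sketch of such a launch script, based on flag conventions from the Megatron-SWIFT documentation. The specific flags and values (parallelism sizes, batch sizes, learning rate, paths) are assumptions for illustration and may vary across ms-swift versions; the linked `qwen3_235b.sh` is the authoritative reference.

```shell
# Illustrative 8 x 80GiB launch; flag names follow Megatron-SWIFT conventions,
# but the values here are assumptions -- defer to the linked qwen3_235b.sh.
# This assumes the HF checkpoint has already been converted to mcore format,
# as described in the Megatron-SWIFT training documentation.
PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
NPROC_PER_NODE=8 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
megatron sft \
    --load Qwen3-235B-A22B-Instruct-2507-mcore \
    --dataset <dataset_path> \
    --train_type lora \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --expert_model_parallel_size 8 \
    --moe_grouped_gemm true \
    --micro_batch_size 1 \
    --global_batch_size 16 \
    --recompute_granularity full \
    --max_epochs 1 \
    --finetune true \
    --lr 1e-4 \
    --max_length 2048 \
    --save megatron_output/Qwen3-235B-A22B-Instruct-2507
```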