Releases · modelscope/ms-swift

07 Aug 07:05

Jintao-Huang

v3.7.0

eefc843

v3.7.0 Latest


中文版

新特性

GRPO：
a. 支持GSPO算法，在GRPO训练中使用参数--importance_sampling_level sequence，参考文档：https://swift.readthedocs.io/zh-cn/latest/Instruction/GRPO/AdvancedResearch/GSPO.html
b. GRPO server mode 支持多机 rollout，支持传入多个 vllm_server_host/port，参考脚本：https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo/multi_node/server_multi_node.sh
c. GRPO rollout 兼容 GYM 环境规范（感谢开发者Mouse的贡献），参考文档 https://swift.readthedocs.io/zh-cn/latest/Instruction/GRPO/DeveloperGuide/GYM%E7%8E%AF%E5%A2%83%E8%AE%AD%E7%BB%83.html
d. GRPO 支持 entropy_mask 来过滤低熵token损失计算，同时logger支持记录熵值动态，参考文档https://swift.readthedocs.io/zh-cn/latest/Instruction/GRPO/AdvancedResearch/entropy_mask.html
e. 支持多轮算法DeepEyes训练，文档参考：https://swift.readthedocs.io/zh-cn/latest/Instruction/GRPO/AdvancedResearch/deepeyes.html
f. GRPO 支持--truncation_strategy delete，删除输入长度超过max_length的数据，并重新采样。
Megatron-SWIFT：
a. 支持LoRA训练，现支持CPT/SFT/DPO，显著加速MoE训练速度。
- 文档参考：https://swift.readthedocs.io/zh-cn/latest/Instruction/Megatron-SWIFT%E8%AE%AD%E7%BB%83.html#lora
- 训练脚本：https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/lora
b. 支持loss scale，方便Agent训练，训练脚本参考：https://github.com/modelscope/ms-swift/blob/main/examples/train/megatron/lora/loss_scale.sh
c. 默认megatron-core版本升级至0.13。
d. 支持bshd格式，方便自定义attention_mask。
e. 日志优化：新增GPU占用、剩余训练时间等信息打印，并输出logging.jsonl存储训练日志。
f. 模型加载与转换速度优化，并增加模型加载进度条。
训练：
a. 支持Flash-Attention-3（含Megatron-SWIFT），训练脚本参考：https://github.com/modelscope/ms-swift/tree/main/examples/train/flash_attention_3
b. 新增--new_special_tokens参数，方便新增特殊tokens。训练脚本参考: https://github.com/modelscope/ms-swift/tree/main/examples/train/new_special_tokens
c. 新增--cached_dataset参数，支持CPT/SFT的离线tokenize。训练脚本参考：https://github.com/modelscope/ms-swift/tree/main/examples/export/cached_dataset
d. 序列Packing模块重构。加速Packing速度，并对多模态packing的磁盘存储问题优化。
e. 支持Qwen2.5-VL混合模态数据（即单条数据中含多种模态） + deepspeed训练。
f. 多模态模型训练支持 loss_scale。
g. rope_scaling 支持传入字典，此外支持设置 max_model_len 对 rope_scaling 的 factor 自动调整。
h. 支持DeepSpeed-AutoTP（该技术不支持LoRA）。
i. 多模态Packing兼容 transformers>=4.53；序列并行兼容 transformers>=4.52。
j. resume_only_model默认将进行数据跳过，并使用ignore_data_skip参数进行控制。
k. MoE模型训练支持 router_aux_loss_coef 参数。
l. template新增max_length裁剪保护机制，不对图像/视频等tokens进行裁剪。
m. tuner_backend unsloth 支持moe模型、device_map和DDP。
n. embedding训练支持liger_kernel。
RLHF：
a. 支持MPO训练，训练脚本参考：https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/mpo.sh
b. 多模态DPO支持了拒绝图片输入，在数据集中加入rejected_images列。
推理部署：
a. 支持embedding系列模型的推理部署，包括pt/vllm/sglang的infer_backend。部署脚本参考：https://github.com/modelscope/ms-swift/tree/main/examples/deploy/embedding
b. InferEngine支持return_details参数，以输出prompt_token_ids和token_ids。
c. vLLM推理引擎兼容更多多模态模型：ovis2, glm4_1v, keye-vl, kimi-vl, glm4v, phi4-multimodal, llama4。
d. vLLM参数重构，参数名前加入vllm_前缀。GRPO模块复用vLLM参数。
导出：
a. QLoRA支持Merge-LoRA，脚本参考：https://github.com/modelscope/ms-swift/tree/main/examples/train/qlora
b. 支持MoE/多模态模型的FP8/BNB量化，脚本参考：https://github.com/modelscope/ms-swift/tree/main/examples/export/quantize

新模型

纯文本模型：
a. Qwen/Qwen3-235B-A22B-[Instruct/Thinking]-2507, Qwen/Qwen3-Coder-480B-A35B-Instruct, Qwen/Qwen3-4B-[Instruct/Thinking]-2507系列（含Megatron-SWIFT），训练脚本参考：#5033
b. openai-mirror/gpt-oss-20b系列，最佳实践参考：#5277
c. ZhipuAI/GLM-4.5系列（含Megatron-SWIFT），训练脚本参考：https://github.com/modelscope/ms-swift/blob/main/examples/train/megatron/lora/glm4_5_106b.sh
d. Hunyuan-7B-Instruct系列，最佳实践参考：#5236
e. mistralai/Devstral-Small-2505
多模态模型：
a. OpenBMB/MiniCPM-V-4，训练脚本参考：https://github.com/modelscope/ms-swift/blob/main/examples/models/minicpmv/train.sh

English Version

New Features

GRPO
a. Added support for the GSPO algorithm. Use --importance_sampling_level sequence during GRPO training. Docs: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/GSPO.html
b. GRPO “server mode” now supports multi-node rollout; pass in multiple vllm_server_host/port. Example script: https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo/multi_node/server_multi_node.sh
c. GRPO rollout is now GYM-compatible (thanks to contributor Mouse). Docs: https://swift.readthedocs.io/en/latest/Instruction/GRPO/DeveloperGuide/gym_env.html
d. Added entropy_mask for filtering low-entropy tokens during loss computation, and the logger now tracks entropy dynamics. Docs: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/entropy_mask.html
e. Added support for the multi-round DeepEyes algorithm. Docs: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/deepeyes.html
f. GRPO supports --truncation_strategy delete: remove samples whose input length exceeds max_length and resample.
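As a rough illustration of what --importance_sampling_level sequence changes in item a, the sketch below contrasts per-token importance ratios (the usual GRPO form) with a single length-normalized ratio per sequence (the form described for GSPO). The function names and the toy log-probabilities are hypothetical, and this is not ms-swift's implementation.

```python
import math


def token_level_ratios(logp_new, logp_old):
    """One importance ratio per token: exp(logp_new - logp_old)."""
    return [math.exp(n - o) for n, o in zip(logp_new, logp_old)]


def sequence_level_ratio(logp_new, logp_old):
    """One ratio per sequence: exp of the mean per-token log-ratio."""
    diffs = [n - o for n, o in zip(logp_new, logp_old)]
    return math.exp(sum(diffs) / len(diffs))


# Toy per-token log-probabilities under the old and new policy (made up).
logp_old = [-1.0, -2.0, -0.5]
logp_new = [-0.9, -2.3, -0.5]

token_ratios = token_level_ratios(logp_new, logp_old)
seq_ratio = sequence_level_ratio(logp_new, logp_old)
```

The sequence-level ratio is the geometric mean of the token-level ratios, so a single outlier token moves it less than it would move that token's own ratio.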
Megatron-SWIFT
a. Added LoRA training, currently for CPT/SFT/DPO; this significantly accelerates MoE training.
- Docs: https://swift.readthedocs.io/en/latest/Instruction/Megatron-SWIFT-Training.html#lora-training
- Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/lora
b. Added loss-scaling to simplify Agent training. Script: https://github.com/modelscope/ms-swift/blob/main/examples/train/megatron/lora/loss_scale.sh
c. Default megatron-core upgraded to 0.13.
d. Added bshd tensor format to facilitate custom attention_mask.
e. Logging improvements: prints GPU memory, estimated remaining time, and writes logging.jsonl.
f. Faster model loading & conversion plus a progress bar.
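The loss-scaling feature in item b can be pictured as a per-token weight applied to the training loss, so that some tokens count more than others. The following is a minimal sketch under that assumption; the function name, the weights, and the normalization by total weight are illustrative choices, not the behavior of the linked loss_scale.sh script.

```python
def weighted_token_loss(token_losses, loss_scales):
    """Weighted mean of per-token losses; tokens with scale 0 are ignored."""
    if len(token_losses) != len(loss_scales):
        raise ValueError("token_losses and loss_scales must have the same length")
    total_weight = sum(loss_scales)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(loss * scale for loss, scale in zip(token_losses, loss_scales))
    return weighted_sum / total_weight


# Hypothetical values: the first token is masked out, the third is up-weighted.
token_losses = [0.5, 1.0, 2.0, 4.0]
loss_scales = [0.0, 1.0, 2.0, 1.0]
loss = weighted_token_loss(token_losses, loss_scales)
```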
Training
a. Added Flash-Attention-3 support (including Megatron-SWIFT). Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/flash_attention_3
b. New --new_special_tokens flag for adding special tokens. Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/new_special_tokens
c. New --cached_dataset flag for offline tokenization in CPT/SFT. Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/export/cached_dataset
d. Re-implemented the sequence-packing module for faster packing and better multimodal disk I/O.
e. Qwen2.5-VL now supports training on mixed-modality data (multiple modalities in a single sample) with DeepSpeed.
f. Multimodal training now supports loss-scaling.
g. rope_scaling now accepts a dict; max_model_len can auto-adjust the scaling factor.
h. Added DeepSpeed-AutoTP (not compatible with LoRA).
i. Multimodal packing is compatible with transformers ≥ 4.53; sequence parallelism with transformers ≥ 4.52.
j. With resume_only_model, data skipping is enabled by default; control via ignore_data_skip.
k. MoE training supports router_aux_loss_coef.
l. Templates now include a max_length truncation safeguard: image/video tokens are never truncated.
m. tuner_backend unsloth now supports MoE models, device_map, and DDP.
n. Embedding training supports liger_kernel.
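For item g, one plausible way to derive a rope_scaling factor from max_model_len is to take the ratio between the requested length and the model's original context length, rounded up. The sketch below assumes exactly that rule; the helper name and its arguments are hypothetical, and the actual adjustment in ms-swift may differ.

```python
import math


def adjust_rope_scaling(rope_scaling, max_model_len, original_max_len):
    """Return a copy of rope_scaling whose factor covers max_model_len.

    Assumed rule: factor = ceil(max_model_len / original_max_len), at least 1.
    """
    adjusted = dict(rope_scaling)
    adjusted["factor"] = max(float(math.ceil(max_model_len / original_max_len)), 1.0)
    return adjusted


# A rope_scaling dict in the Hugging Face config style.
rope_scaling = {"rope_type": "yarn", "factor": 1.0}
adjusted = adjust_rope_scaling(rope_scaling, max_model_len=131072, original_max_len=32768)
```

The helper copies the dict rather than mutating it, so the original configuration stays available for comparison.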
RLHF
a. Added MPO training. Script: https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/mpo.sh
b. Multimodal DPO now supports rejected image inputs: add a rejected_images column to the dataset.
Inference & Deployment
a. Added deployment for embedding models across pt/vllm/sglang back-ends. Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/deploy/embedding
b. InferEngine supports return_details to output prompt_token_ids and token_ids.
c. vLLM back-end now supports more multimodal models: ovis2, glm4_1v, keye-vl, kimi-vl, glm4v, phi4-multimodal, llama4.
d. vLLM arguments refactored: all start with the vllm_ prefix. GRPO module reuses the same options.
Export
a. QLoRA now supports Merge-LoRA. Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/qlora
b. Added FP8 / BNB quantization for MoE and multimodal models. Scripts: https://github.com/modelscope/ms-swift/tree/main/examples/export/quantize

New Models

Text-only
a. Qwen/Qwen3-235B-A22B-[Instruct/Thinking]-2507, Qwen/Qwen3-Coder-480B-A35B-Instruct, and Qwen/Qwen3-4B-[Instruct/Thinking]-2507 (Megatron-SWIFT supported). Training script: #5033
b. openai-mirror/gpt-oss-20b family. Best-practice: #5277
c. ZhipuAI/GLM-4.5 family (Megatron-SWIFT supported). Training script: https://github.com/modelscope/ms-swift/blob/main/examples/train/megatron/lora/glm4_5_106b.sh
d. Hunyuan-7B-Instruct family. Best-practice: #5236
e. mistralai/Devstral-Small-2505
Multimodal
a. OpenBMB/MiniCPM-V-4. Training script: https://github.com/modelscope/ms-swift/blob/main/examples/models/minicpmv/train.sh

What's Changed

[grpo] fix server arg check by @hjh0119 in #4865
[SP] clean up imports by @hjh0119 in #4878
fix loss_scale sp by @tastelikefeet in #4880
fix seq_cls generation_config by @Jintao-Huang in #4882
optimize imports by @tastelikefeet in #4883
[model] fix qwen eos_token by @Jintao-Huang in #4888
Fix: Correct training hang for Keye-VL on DeepSpeed with mixed data by @0russwest0 in #4889
[megatron] support LoRA & support loss_scale by @Jintao-Huang in #4812
update framework.txt by @Jintao-Huang in #4896
[megatron] fix pp mla by @Jintao-Huang in https://gi...

Contributors

mungg, firefighter-eric, and 14 other contributors


02 Aug 06:35

Jintao-Huang

v3.6.4

b5b61b6

Patch release v3.6.4

Full Changelog: v3.6.3...v3.6.4


29 Jul 06:24

Jintao-Huang

v3.6.3

340bf41

Patch release v3.6.3

Full Changelog: v3.6.2...v3.6.3


18 Jul 08:18

Jintao-Huang

v3.6.2

2fed8f7

Patch release v3.6.2

Full Changelog: v3.6.1...v3.6.2


11 Jul 02:14

Jintao-Huang

v3.6.1

8e77ce9

Patch release v3.6.1

Full Changelog: v3.6.0...v3.6.1


08 Jul 03:35

Jintao-Huang

v3.6.0

39bfc8a

v3.6.0

中文版

新特性

Megatron-SWIFT：
a. 支持更多的 MoE 模型结构，包括：DeepseekV3ForCausalLM、Dots1ForCausalLM 和 Ernie4_5_MoeForCausalLM。训练脚本参考：https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/moe
b. 支持更多的 Dense 模型结构，包括：MiMoForCausalLM、InternLM3ForCausalLM 和 Ernie4_5_ForCausalLM。训练脚本参考：https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/dense
c. 支持 DPO 训练。训练脚本参考：https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/rlhf/dpo
d. 支持 FP8 训练。
e. 支持更多 rope scaling 类型，包括：default、linear、yarn、dynamic、longrope、llama3 等。
f. --test_convert_precision参数优化，方便测试 mcore 与 huggingface 模型权重转换精度。
GRPO：
a. GRPO 多轮训练重构，支持使用 AsyncEngine 加速多轮推理，参考文档：https://swift.readthedocs.io/zh-cn/latest/Instruction/GRPO/DeveloperGuide/%E5%A4%9A%E8%BD%AE%E8%AE%AD%E7%BB%83.html
b. offload_model 参数额外对参考模型进行卸载。
c. 优化 sleep_level 和 offload_model 参数下的显存管理。
d. reward_funcs 增加了 trainer_state 入参，方便获取当前训练步数和总步数。
训练：
a. 支持 reranker 训练，训练脚本参考：https://github.com/modelscope/ms-swift/tree/main/examples/train/reranker
b. CPT/SFT/DPO/GRPO 纯文本大模型训练支持 ring-attention 切分序列长度，降低显存占用。训练脚本参考：https://github.com/modelscope/ms-swift/tree/main/examples/train/long_text/ring_attention
c. channel loss 在CPT/SFT训练时，兼容 padding_free 与 packing。感谢招商银行技术团队的贡献。
d. remove_unused_columns 参数优化。设置为 False，则将额外数据集传递至 Trainer 内，方便自定义损失函数。
e. split_dataset_ratio参数默认值从0.01修改为0，默认不再进行验证集切分，需要手动设置--split_dataset_ratio或者--val_dataset。
f. 多模态模型 packing/padding_free 损失对齐问题修复。详见此PR：#4838
g. swanlab 支持训练完成后的飞书通知回调。
RLHF：
a. 纯文本/多模态模型支持 GKD 训练，部分场景下支持 padding_free 和 packing，训练脚本如下：
i. 大模型：https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/gkd.sh
ii. 多模态大模型：https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/rlhf/gkd.sh
b. reward model 训练支持 margin 参数，参考文档：https://swift.readthedocs.io/zh-cn/latest/Instruction/%E4%BA%BA%E7%B1%BB%E5%AF%B9%E9%BD%90.html#rm
全链路：
a. 支持使用 SGLang 推理引擎对 ms-swift 推理/部署/评测/ui模块进行加速，设置--infer_backend sglang即可。推理脚本参考：https://github.com/modelscope/ms-swift/tree/main/examples/infer/sglang
b. 支持 FP8 量化，量化脚本参考：https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/fp8.sh
Web-UI：
a. 支持 SFT/RLHF/GRPO 在不同 Tab 页面训练，支持保存训练命令行。
b. Web-UI 界面支持数据采样。

新模型

多模态模型：
a. ZhipuAI/GLM-4.1V-9B-Thinking系列
b. Kwai-Keye/Keye-VL-8B-Preview
c. moonshotai/Kimi-VL-A3B-Thinking-2506
d. google/gemma-3n-E2B-it系列
纯文本模型：
a. PaddlePaddle/ERNIE-4.5-21B-A3B-PT系列
b. rednote-hilab/dots.llm1.inst系列
c. Tencent-Hunyuan/Hunyuan-A13B-Instruct
d. MiniMax/MiniMax-M1-80k系列（推理）
e. moonshotai/Kimi-Dev-72B
f. cognitivecomputations/DeepSeek-R1-0528-AWQ

English Version

New Features

Megatron-SWIFT:
a. Support for more MoE model architectures, including: DeepseekV3ForCausalLM, Dots1ForCausalLM, and Ernie4_5_MoeForCausalLM. Training script reference: https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/moe
b. Support for more Dense model architectures, including: MiMoForCausalLM, InternLM3ForCausalLM, and Ernie4_5_ForCausalLM. Training script reference: https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/dense
c. DPO training supported. Training script reference: https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/rlhf/dpo
d. FP8 training supported.
e. More rope scaling types supported, including: default, linear, yarn, dynamic, longrope, llama3, etc.
f. --test_convert_precision parameter optimized for easier testing of weight conversion precision between mcore and huggingface models.
GRPO:
a. GRPO multi-turn training refactored, supporting accelerated multi-turn inference with AsyncEngine. Documentation: https://swift.readthedocs.io/zh-cn/latest/Instruction/GRPO/DeveloperGuide/%E5%A4%9A%E8%BD%AE%E8%AE%AD%E7%BB%83.html
b. The offload_model parameter now also offloads the reference model.
c. Optimized GPU memory management under sleep_level and offload_model parameters.
d. Added trainer_state as an input parameter to reward_funcs, making it easier to obtain the current and total training steps.
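Item d makes it possible to write rewards that depend on training progress. The sketch below shows the idea with a stand-in state object exposing global_step and max_steps (the two counters carried by a transformers TrainerState). The reward function, its signature, and the max_chars threshold are hypothetical and do not reproduce ms-swift's reward plugin interface.

```python
from dataclasses import dataclass


@dataclass
class TrainerStateStub:
    """Stand-in exposing only the two step counters used below."""
    global_step: int
    max_steps: int


def progress_scaled_length_reward(completions, trainer_state, max_chars=200):
    """Penalize over-long completions, with a penalty that grows during training."""
    progress = trainer_state.global_step / max(trainer_state.max_steps, 1)
    rewards = []
    for text in completions:
        # Fraction by which the completion exceeds max_chars, capped at 1.
        overflow = min(max(len(text) - max_chars, 0) / max_chars, 1.0)
        rewards.append(-progress * overflow)
    return rewards


state = TrainerStateStub(global_step=50, max_steps=100)
rewards = progress_scaled_length_reward(["short", "x" * 300], state)
```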
Training:
a. Reranker training supported. Training script reference: https://github.com/modelscope/ms-swift/tree/main/examples/train/reranker
b. CPT/SFT/DPO/GRPO pure-text large model training supports ring-attention sequence length partitioning, reducing memory usage. Training script reference: https://github.com/modelscope/ms-swift/tree/main/examples/train/long_text/ring_attention
c. Channel loss in CPT/SFT training is compatible with padding_free and packing. Thanks to the technical team at China Merchants Bank for their contribution.
d. Optimized remove_unused_columns parameter. When set to False, extra dataset columns are passed to the Trainer for custom loss functions.
e. The default value for split_dataset_ratio changed from 0.01 to 0, so the validation set is not split by default. You now need to manually set --split_dataset_ratio or --val_dataset.
f. Fixed loss alignment issue between packing/padding_free for multimodal models. For details, see this PR: #4838
g. Swanlab now supports Feishu (Lark Suite) notification callback after training is completed.
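The effect of the default change in item e can be seen in a small sketch of a ratio-based split. The helper below is hypothetical (its name, seed, and shuffling strategy are assumptions) and only illustrates that a ratio of 0 yields an empty validation set, whereas the previous default of 0.01 held out 1% of the rows.

```python
import random


def split_dataset(rows, split_dataset_ratio, seed=42):
    """Shuffle rows and hold out a fraction of them as a validation set."""
    if not 0 <= split_dataset_ratio < 1:
        raise ValueError("split_dataset_ratio must be in [0, 1)")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    val_size = int(len(shuffled) * split_dataset_ratio)
    return shuffled[val_size:], shuffled[:val_size]


rows = list(range(1000))
train_new, val_new = split_dataset(rows, 0)       # new default: no validation split
train_old, val_old = split_dataset(rows, 0.01)    # old default: 1% held out
```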
RLHF:
a. Pure-text and multimodal models support GKD training, with some scenarios supporting padding_free and packing. Training scripts:
i. Large models: https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/gkd.sh
ii. Multimodal large models: https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/rlhf/gkd.sh
b. Reward model training now supports the margin parameter. Documentation: https://swift.readthedocs.io/zh-cn/latest/Instruction/%E4%BA%BA%E7%B1%BB%E5%AF%B9%E9%BD%90.html#rm
Full Pipeline:
a. SGLang inference engine can be used to accelerate ms-swift inference/deployment/evaluation/ui modules, by setting --infer_backend sglang. Inference script reference: https://github.com/modelscope/ms-swift/tree/main/examples/infer/sglang
b. FP8 quantization supported. Quantization script reference: https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/fp8.sh
Web-UI:
a. SFT, RLHF, and GRPO training are available on separate tabs, and the training command line can be saved.
b. Web-UI interface supports data sampling.

New Models

Multimodal Models:
a. ZhipuAI/GLM-4.1V-9B-Thinking series
b. Kwai-Keye/Keye-VL-8B-Preview
c. moonshotai/Kimi-VL-A3B-Thinking-2506
d. google/gemma-3n-E2B-it series
Pure Text Models:
a. PaddlePaddle/ERNIE-4.5-21B-A3B-PT series
b. rednote-hilab/dots.llm1.inst series
c. Tencent-Hunyuan/Hunyuan-A13B-Instruct
d. MiniMax/MiniMax-M1-80k series (inference)
e. moonshotai/Kimi-Dev-72B
f. cognitivecomputations/DeepSeek-R1-0528-AWQ

What's Changed

fix emb script and docs by @tastelikefeet in #4521
[grpo] update doc about move_model_batches by @hjh0119 in #4523
fix LoraModel by @Jintao-Huang in #4536
support cognitivecomputations/DeepSeek-R1-0528-AWQ by @Jintao-Huang in #4537
fix: handle INFONCE_HARD_NEGATIVES as integer if provided by @dlutwy in #4545
fix qwen3 embedding saving by @tastelikefeet in #4548
[megatron/dpo] fix megatron packing_cache & update DPOTrainer by @Jintao-Huang in #4556
[megatron] support DPO by @Jintao-Huang in #4193
support dots1 by @Jintao-Huang in #4560
[grpo] support offloading reference model by @hjh0119 in #4554
[grpo] fix the pickle data collator by @hjh0119 in #4562
[dataset] fix toolbench (local) by @Jintao-Huang in #4563
[Bug]Fix ulysses train steps, embedding negative sample length by @tastelikefeet in #4565
fix args.json by @Jintao-Huang in #4566
[model] fix ovis gradient_checkpointing vit no_grad by @Jintao-Huang in #4571
[megatron] Fix megatron all_reduce warning by @Jintao-Huang in #4568
[grpo] remove data collator to top-level to avoid pickle error in spawn mode by @hjh0119 in #4582
[grpo] model weight synchronization before first turn rollout with async generation by @hjh0119 in #4584
[megatron] support more rope_scaling & support deepseek-r1-qwen3-8b/internlm3/mimo-7b by @Jintao-Huang in #4576
[grpo] restore num_generations check by @hjh0119 in #4590
fix gc_kwargs by @Jintao-Huang in #4591
Fix UI llm_train by @slin000111 in #4592
[mirror] update swift mirror by @Jintao-Huang in #4601
[megatron] compat megatron-core main branch by @Jintao-Huang in https://github.com/modelscope/ms-swift...

Contributors

sosofun, hrz394943230, and 11 other contributors


27 Jun 05:12

Jintao-Huang

v3.5.3

87dba55

Patch release v3.5.3

Full Changelog: v3.5.2...v3.5.3


20 Jun 14:49

Jintao-Huang

v3.5.2

49e0415

Patch release v3.5.2

Full Changelog: v3.5.1...v3.5.2


13 Jun 14:24

Jintao-Huang

v3.5.1

f38305b

Patch release v3.5.1

Full Changelog: v3.5.0...v3.5.1


08 Jun 16:51

Jintao-Huang

v3.5.0

cb64cb7

v3.5.0

中文版

新特性

GRPO：
a. 代码重构，使用参数vllm_mode指定。参数说明详见参考文档：https://swift.readthedocs.io/zh-cn/latest/Instruction/GRPO.html#id1:~:text=vllm_mode%20server%20%E5%8F%82%E6%95%B0,colocate%20mode%20%E7%94%9F%E6%95%88%E3%80%82
b. GRPO长文本优化，支持ulysses序列并行，显著降低长文本训练显存占用，训练脚本参考：https://github.com/modelscope/ms-swift/blob/main/examples/train/long_text/sequence_parallel_grpo.sh
c. 新增sync_ref_model参数，支持训练中同步参考模型权重。
d. 支持 liger kernel loss，使用参数 use_liger_kernel，降低显存占用。
e. External mode 支持 move_model_batches，降低zero3同步权重时的显存峰值。
f. 集成 INTELLECT-2 的 Two-Sided Clipping 算法，使用参数 delta。
g. 支持奖励函数返回 None，适用于多任务训练，参考文档：https://swift.readthedocs.io/zh-cn/latest/Instruction/GRPO.html#id7
h. Internal mode 支持 vllm_server_base_url，传入外部 vLLM 服务器url。
i. 插件拓展：支持 QwenLong-L1 奖励模型插件。
j. 新增 steps_per_generation/generation_batch_size 参数，支持自定义采样批量大小。
k. Web-UI支持GRPO训练。
l. 以下参数将在 v3.6 移除：tensor_parallel_size / vllm_device / vllm_max_num_seqs / num_infer_workers。
训练：
a. CPT/SFT/DPO/GRPO 支持 padding free。通过将批次数据展平避免数据填充（padding），显著降低显存并加速训练。训练脚本参考：https://github.com/modelscope/ms-swift/tree/main/examples/train/padding_free
b. 多模态训练增强。支持使用 vit_lr 和 aligner_lr 参数独立控制 ViT 和 Aligner 模块的学习率。支持通过 vit_gradient_checkpointing 参数单独控制 vit 模块的 gradient checkpointing，性能基准测试参考：https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/vit_gradient_checkpointing.sh
c. CPT/SFT支持使用 channel loss 对不同 channel 数据集分别统计损失值。感谢招商银行技术团队的贡献。
d. CPT/SFT/DPO支持 use_logits_to_keep参数，降低显存占用，提升训练速度。
e. Qwen2.5-VL/Omni 支持传入图像目录进行视频训练。
推理部署：
a. swift infer批处理优化，新增 write_batch_size 参数，用于控制批处理推理结果写入result_path的间隔。
b. vllm 推理引擎默认使用 V1 engine，并支持TP和DP结合的推理模式，脚本参考：https://github.com/modelscope/ms-swift/blob/main/examples/infer/vllm/dp_tp.sh
Megatron-SWIFT：
a. 非流式数据集支持通过 max_epochs 自动计算 train_iters。
b. 提供 extra_megatron_kwargs 参数，支持未写入ms-swift的megatron参数传入。

新模型

Qwen/Qwen3-Embedding-0.6B系列，训练脚本参考：https://github.com/modelscope/ms-swift/blob/main/examples/train/embedding/train_emb.sh
deepseek-ai/DeepSeek-R1-0528-Qwen3-8B系列，最佳实践参考https://mp.weixin.qq.com/s/-hhfGiiGTqXUybwPH525gw
iic/QwenLong-L1-32B
XiaomiMiMo/MiMo-7B-RL-0530、XiaomiMiMo/MiMo-VL-7B-SFT系列
OpenBMB/MiniCPM4-0.5B系列

English Version

New Features

GRPO:
a. Code refactored; the rollout mode is now selected via the vllm_mode parameter. For details, refer to the documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO.html#arguments-and-execution-script:~:text=vllm_mode%20server%20parameter,in%20colocate%20mode.
b. GRPO long-text optimization with Ulysses sequence parallelism, significantly reducing GPU memory usage during long-text training. Training script: https://github.com/modelscope/ms-swift/blob/main/examples/train/long_text/sequence_parallel_grpo.sh
c. Added sync_ref_model parameter to synchronize reference model weights during training.
d. Supports Liger Kernel Loss via use_liger_kernel parameter, reducing GPU memory consumption.
e. External mode supports move_model_batches to lower peak GPU memory during ZeRO-3 weight synchronization.
f. Integrated INTELLECT-2’s Two-Sided Clipping algorithm using the delta parameter.
g. Supports reward functions returning None, applicable for multi-task training. For details, refer to the documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO.html#multi-task-training
h. Internal mode supports vllm_server_base_url for passing external vLLM server URLs.
i. Plugin extension: Added QwenLong-L1 reward model plugin.
j. Added steps_per_generation and generation_batch_size parameters for customizing sampling batch size.
k. Web-UI supports GRPO training.
l. The following parameters will be removed in v3.6: tensor_parallel_size, vllm_device, vllm_max_num_seqs, num_infer_workers.
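The two-sided clipping in item f adds an upper bound on the importance ratio, which matters when the advantage is negative and the ratio is large. The sketch below shows a per-token clipped term with an optional delta; the function name and the default epsilon are illustrative, and this is a simplified reading of the technique rather than ms-swift's loss code.

```python
def clipped_objective(ratio, advantage, epsilon=0.2, delta=None):
    """Clipped policy-gradient term, optionally capping the ratio at delta."""
    clipped_ratio = max(min(ratio, 1 + epsilon), 1 - epsilon)
    # With delta set, the "unclipped" branch can no longer grow without bound.
    bounded_ratio = ratio if delta is None else min(ratio, delta)
    return min(bounded_ratio * advantage, clipped_ratio * advantage)


# A large ratio with a negative advantage: the standard term is unbounded,
# the two-sided variant is capped at delta * advantage.
one_sided = clipped_objective(ratio=10.0, advantage=-1.0)
two_sided = clipped_objective(ratio=10.0, advantage=-1.0, delta=4.0)
```

For the bound to have any effect, delta should be larger than 1 + epsilon; with a positive advantage the result is the same as the standard clipped term.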
Training:
a. CPT/SFT/DPO/GRPO support padding-free training. By flattening batch data to avoid padding, GPU memory usage is reduced and training speed is improved. Script: https://github.com/modelscope/ms-swift/tree/main/examples/train/padding_free
b. Multimodal training enhancements: Supports separate learning rates for ViT and Aligner modules via vit_lr and aligner_lr parameters. Added vit_gradient_checkpointing to independently control gradient checkpointing for ViT modules. Benchmark: https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/vit_gradient_checkpointing.sh
c. CPT/SFT support channel_loss to separately calculate loss for different channel datasets. Thanks to the contributions from the technical team at China Merchants Bank.
d. CPT/SFT/DPO support use_logits_to_keep to reduce GPU memory usage and accelerate training.
e. Qwen2.5-VL/Omni support video training by passing image directories.
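The padding-free idea in item a can be sketched as follows: instead of padding every sequence to the longest one, the batch is concatenated into a single row, with position IDs restarting per sequence and cumulative lengths marking the boundaries. The helper and variable names here are hypothetical; the actual ms-swift collator is not shown in these notes.

```python
from itertools import accumulate


def flatten_batch(sequences):
    """Concatenate sequences and record their boundaries instead of padding."""
    input_ids = [token for seq in sequences for token in seq]
    # Positions restart at 0 for every sequence.
    position_ids = [pos for seq in sequences for pos in range(len(seq))]
    # Cumulative sequence lengths: sequence i spans cu_seqlens[i]:cu_seqlens[i + 1].
    cu_seqlens = [0] + list(accumulate(len(seq) for seq in sequences))
    return input_ids, position_ids, cu_seqlens


batch = [[11, 12, 13], [21], [31, 32]]
input_ids, position_ids, cu_seqlens = flatten_batch(batch)

# Token slots a padded batch would need, for comparison with len(input_ids).
padded_tokens = len(batch) * max(len(seq) for seq in batch)
```

In this toy batch the flattened form holds 6 tokens where a padded batch would allocate 9 slots.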
Inference & Deployment:
a. Optimized swift infer batching with new write_batch_size parameter to control inference result write intervals to result_path.
b. vLLM inference engine now defaults to V1 engine and supports hybrid Tensor Parallelism (TP) and Data Parallelism (DP). Script: https://github.com/modelscope/ms-swift/blob/main/examples/infer/vllm/dp_tp.sh
Megatron-SWIFT:
a. For non-streaming datasets, train_iters can be computed automatically from max_epochs.
b. Added extra_megatron_kwargs for passing Megatron parameters that ms-swift does not yet expose.
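The relationship between max_epochs and train_iters in item a is simple arithmetic once the dataset size is known, which is why it applies only to non-streaming datasets. The sketch below assumes that an incomplete final batch is dropped; the function name, the rounding rule, and the example numbers are assumptions rather than the exact formula used by Megatron-SWIFT.

```python
def compute_train_iters(num_samples, max_epochs, global_batch_size):
    """Number of optimizer steps needed to see the dataset max_epochs times."""
    if global_batch_size <= 0:
        raise ValueError("global_batch_size must be positive")
    # Assumption: samples that do not fill a complete batch are dropped.
    return (num_samples * max_epochs) // global_batch_size


train_iters = compute_train_iters(num_samples=10000, max_epochs=3, global_batch_size=16)
```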

New Models

Qwen/Qwen3-Embedding-0.6B series. Training script reference: https://github.com/modelscope/ms-swift/blob/main/examples/train/embedding/train_emb.sh
deepseek-ai/DeepSeek-R1-0528-Qwen3-8B series. Best practices: https://mp.weixin.qq.com/s/-hhfGiiGTqXUybwPH525gw
iic/QwenLong-L1-32B
XiaomiMiMo/MiMo-7B-RL-0530 & XiaomiMiMo/MiMo-VL-7B-SFT series
OpenBMB/MiniCPM4-0.5B series

What's Changed

[grpo] code refactor by @hjh0119 in #4097
support yarn by @tastelikefeet in #4197
fix ppo init model by @hjh0119 in #4199
fix ppo reward model by @hjh0119 in #4200
[doc] remove vllm version warning in grpo by @hjh0119 in #4204
[grpo] fix colocate + tp by @hjh0119 in #4209
Refactor packing by @Jintao-Huang in #4207
[grpo] set system in inputs by @hjh0119 in #4214
fix mm packing by @Jintao-Huang in #4217
fix packing multi_node by @Jintao-Huang in #4222
fix get reward model by @hjh0119 in #4225
fix val_dataset_shuffle by @Jintao-Huang in #4226
fix task type judgement in rlhf by @hjh0119 in #4228
fix eval extral args by @Yunnglin in #4227
fix loss_scale by @Jintao-Huang in #4229
update docs by @Jintao-Huang in #4235
[rlhf] prepare_model for ref_model & reduce peak memory in dpo by @hjh0119 in #4232
fix qwen2_5_vl VIDEO_TOTAL_PIXELS by @Jintao-Huang in #4236
Support super long length sft by @tastelikefeet in #4237
compat transformers 4.52 by @Jintao-Huang in #4238
update liger_kernel docs by @Jintao-Huang in #4241
[grpo] support synchronizing ref model by @hjh0119 in #4242
optimize packing io by @Jintao-Huang in #4244
fix register_post_encode_hook by @Jintao-Huang in #4247
compat megatron-core 0.11 by @Jintao-Huang in #4250
fix qwen2_5_omni by @Jintao-Huang in #4253
fix readme by @Jintao-Huang in #4256
[grpo] set v1 engine as default in external rollout by @hjh0119 in #4258
fix ddp_timeout by @Jintao-Huang in #4259
Add tqdm by @Jintao-Huang in #4260
Fix is_master by @Jintao-Huang in #4262
fix ppo zero3 by @Jintao-Huang in #4263
test link valid by @Jintao-Huang in #4265
update docs & fix quant by @Jintao-Huang in #4268
[grpo] fix external mode&multi turn by @hjh0119 in #4255
fix ulysses eval by @tastelikefeet in #4271
support IndexedDataset shard by @Jintao-Huang in #4269
Support vit_lr aligner_lr by @Jintao-Huang in #4273
support padding_free CPT/SFT by @Jintao-Huang in #4274
[grpo] fix num of reward_model > 1 by @hjh0119 in #4287
fix n > 1 with vLLM V1 Engine by @hjh0119 in #4295
update load_args by @Jintao-Huang in #4296
update swift image by @Jintao-Huang in #4309
Fix ulysses pending by @tastelikefeet in https://github...

Contributors

liuyanyi, wizyoung, and 7 other contributors
