
Commit 5a53333: support Xverse-MoE model (modelscope#668)
Parent: d6a93a9

File tree: 6 files changed, 53 additions and 5 deletions

README.md (3 additions, 2 deletions)
@@ -39,6 +39,7 @@ To facilitate use by users unfamiliar with deep learning, we provide a Gradio we
 Additionally, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.
 
 ## 🎉 News
+- 2024.04.08: Support the fine-tuning and inference of the XVERSE-MoE-A4.2B model, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/xverse_moe_a4_2b/lora/sft.sh) to start training!
 - 2024.04.04: Support **QLoRA+FSDP** to train a 70B model with two 24G memory GPUs, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/llama2_70b_chat/qlora_fsdp/sft.sh) to train.
 - 🔥2024.04.03: Support the **Qwen1.5-32B** series: Qwen1.5-32B, Qwen1.5-32B-Chat, Qwen1.5-32B-Chat-GPTQ-Int4. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/qwen1half_32b_chat/lora_mp/sft.sh) to start training!
 - 🔥2024.04.02: Support the fine-tuning and inference of the Mengzi3-13B-Base model, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/mengzi3_13b_base/lora_ddp_ds/sft.sh) to start training!
@@ -373,11 +374,11 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 
 | Model Type | Model Introduction | Language | Model Size | Model Category |
 |------------|--------------------|----------|------------|-----------------|
-| Qwen<br>Qwen1.5 | [Tongyi Qwen 1.0 and 1.5 series models](https://github.com/QwenLM) | Chinese<br>English | 0.5B-72B<br>including quantized versions | base model<br>chat model |
+| Qwen<br>Qwen1.5 | [Tongyi Qwen 1.0 and 1.5 series models](https://github.com/QwenLM) | Chinese<br>English | 0.5B-72B<br>including quantized versions | base model<br>chat model<br>MoE model |
 | ChatGLM2<br>ChatGLM3<br>Codegeex2 | [Zhipu ChatGLM series models](https://github.com/THUDM) | Chinese<br>English | 6B | base model<br>chat model<br>code model |
 | Baichuan/Baichuan2 | [Baichuan 1 and Baichuan 2](https://github.com/baichuan-inc) | Chinese<br>English | 7B-13B<br>including quantized versions | base model<br>chat model |
 | Yuan2 | [Langchao Yuan series models](https://github.com/IEIT-Yuan) | Chinese<br>English | 2B-102B | instruct model |
-| XVerse | [XVerse series models](https://github.com/xverse-ai) | Chinese<br>English | 7B-65B | base model<br>chat model<br>long text model |
+| XVerse | [XVerse series models](https://github.com/xverse-ai) | Chinese<br>English | 7B-65B | base model<br>chat model<br>long text model<br>MoE model |
 | LLaMA2 | [LLaMA2 series models](https://github.com/facebookresearch/llama) | English | 7B-70B<br>including quantized versions | base model<br>chat model |
 | Mistral<br>Mixtral | [Mistral series models](https://github.com/mistralai/mistral-src) | English | 7B | base model<br>instruct model<br>MoE model |
 | YI | [01AI's YI series models](https://github.com/01-ai) | Chinese<br>English | 6B-34B | base model<br>chat model<br>long text model |

README_CN.md (3 additions, 2 deletions)
@@ -40,6 +40,7 @@ SWIFT supports the training, inference,
 In addition, we are also expanding capabilities for other modalities; currently we support full-parameter training and LoRA training for AnimateDiff.
 
 ## 🎉 News
+- 2024.04.08: Support the inference and fine-tuning of the XVERSE-MoE-A4.2B model, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/xverse_moe_a4_2b/lora/sft.sh) to start training!
 - 2024.04.04: Support **QLoRA+FSDP** to train a 70B model with two 24G GPUs, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/llama2_70b_chat/qlora_fsdp/sft.sh) to start training.
 - 🔥2024.04.03: Support the **Qwen1.5-32B** series: Qwen1.5-32B, Qwen1.5-32B-Chat, Qwen1.5-32B-Chat-GPTQ-Int4. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/qwen1half_32b_chat/lora_mp/sft.sh) to start training!
 - 🔥2024.04.02: Support the inference and fine-tuning of the Mengzi3-13B-Base model, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/mengzi3_13b_base/lora_ddp_ds/sft.sh) to start training!
@@ -372,11 +373,11 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 
 | Model Type | Model Introduction | Language | Model Size | Model Category |
 |------------|--------------------|----------|------------|-----------------|
-| Qwen<br>Qwen1.5 | [Tongyi Qwen 1.0 and 1.5 series models](https://github.com/QwenLM) | Chinese<br>English | 0.5B-72B<br>including quantized versions | base model<br>chat model |
+| Qwen<br>Qwen1.5 | [Tongyi Qwen 1.0 and 1.5 series models](https://github.com/QwenLM) | Chinese<br>English | 0.5B-72B<br>including quantized versions | base model<br>chat model<br>MoE model |
 | ChatGLM2<br>ChatGLM3<br>Codegeex2 | [Zhipu ChatGLM series models](https://github.com/THUDM/) | Chinese<br>English | 6B | base model<br>chat model<br>code model |
 | Baichuan<br>Baichuan2 | [Baichuan 1 and Baichuan 2](https://github.com/baichuan-inc) | Chinese<br>English | 7B-13B<br>including quantized versions | base model<br>chat model |
 | Yuan2 | [Langchao Yuan series models](https://github.com/IEIT-Yuan) | Chinese<br>English | 2B-102B | instruct model |
-| XVerse | [XVerse series models](https://github.com/xverse-ai) | Chinese<br>English | 7B-65B | base model<br>chat model<br>long text model |
+| XVerse | [XVerse series models](https://github.com/xverse-ai) | Chinese<br>English | 7B-65B | base model<br>chat model<br>long text model<br>MoE model |
 | LLaMA2 | [LLaMA2 series models](https://github.com/facebookresearch/llama) | English | 7B-70B<br>including quantized versions | base model<br>chat model |
 | Mistral<br>Mixtral | [Mistral series models](https://github.com/mistralai/mistral-src) | English | 7B | base model<br>instruct model<br>MoE model |
 | YI | [01AI's YI series models](https://github.com/01-ai) | Chinese<br>English | 6B-34B | base model<br>chat model<br>long text model |

docs/source/LLM/支持的模型和数据集.md (2 additions, 1 deletion)
@@ -174,6 +174,7 @@
 |xverse-65b-v2|[xverse/XVERSE-65B-2](https://modelscope.cn/models/xverse/XVERSE-65B-2/summary)|q_proj, k_proj, v_proj|default-generation|&#x2718;|&#x2718;||-|
 |xverse-65b-chat|[xverse/XVERSE-65B-Chat](https://modelscope.cn/models/xverse/XVERSE-65B-Chat/summary)|q_proj, k_proj, v_proj|xverse|&#x2718;|&#x2718;||-|
 |xverse-13b-256k|[xverse/XVERSE-13B-256K](https://modelscope.cn/models/xverse/XVERSE-13B-256K/summary)|q_proj, k_proj, v_proj|default-generation|&#x2718;|&#x2718;||-|
+|xverse-moe-a4_2b|[xverse/XVERSE-MoE-A4.2B](https://modelscope.cn/models/xverse/XVERSE-MoE-A4.2B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2718;||-|
 |orion-14b|[OrionStarAI/Orion-14B-Base](https://modelscope.cn/models/OrionStarAI/Orion-14B-Base/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2718;||-|
 |orion-14b-chat|[OrionStarAI/Orion-14B-Chat](https://modelscope.cn/models/OrionStarAI/Orion-14B-Chat/summary)|q_proj, k_proj, v_proj|orion|&#x2714;|&#x2718;||-|
 |bluelm-7b|[vivo-ai/BlueLM-7B-Base](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2718;|&#x2718;||-|
@@ -209,7 +210,7 @@
 |grok-1|[colossalai/grok-1-pytorch](https://modelscope.cn/models/colossalai/grok-1-pytorch/summary)|q_proj, k_proj, v_proj|default-generation|&#x2718;|&#x2718;||-|
 |dbrx-instruct|[AI-ModelScope/dbrx-instruct](https://modelscope.cn/models/AI-ModelScope/dbrx-instruct/summary)|attn.Wqkv|dbrx|&#x2714;|&#x2714;|transformers>=4.36|-|
 |dbrx-base|[AI-ModelScope/dbrx-base](https://modelscope.cn/models/AI-ModelScope/dbrx-base/summary)|attn.Wqkv|dbrx|&#x2714;|&#x2714;|transformers>=4.36|-|
-|mengzi3-13b-base|[langboat/Mengzi3-13B-Base](https://modelscope.cn/models/langboat/Mengzi3-13B-Base/summary)|q_proj, k_proj, v_proj|mengzi|&#x2714;|&#x2714;||-|
+|mengzi3-13b-base|[langboat/Mengzi3-13B-Base](https://modelscope.cn/models/langboat/Mengzi3-13B-Base/summary)|q_proj, k_proj, v_proj|mengzi|&#x2718;|&#x2718;||-|
 
 
 ## Datasets
examples/pytorch/llm/scripts/xverse_moe_a4_2b/lora/infer.sh (new file, 13 additions)

@@ -0,0 +1,13 @@
+# Experimental environment: A100
+# 60GB GPU memory
+
+CUDA_VISIBLE_DEVICES=0 \
+swift infer \
+    --ckpt_dir "output/xverse-moe-a4_2b/vx-xxx/checkpoint-xxx" \
+    --load_dataset_config true \
+    --max_new_tokens 2048 \
+    --temperature 0.7 \
+    --top_p 0.7 \
+    --repetition_penalty 1. \
+    --do_sample true \
+    --merge_lora false \
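
The same inference run can also be launched from Python. Below is a minimal sketch, assuming ms-swift's documented `infer_main`/`InferArguments` entry points; the checkpoint path is the same placeholder used in the script:

```python
# A minimal sketch, assuming ms-swift's documented Python entry points
# (infer_main / InferArguments); the ckpt_dir is a placeholder, as in
# the shell script above.
from swift.llm import InferArguments, infer_main

infer_main(
    InferArguments(
        ckpt_dir='output/xverse-moe-a4_2b/vx-xxx/checkpoint-xxx',  # placeholder
        load_dataset_config=True,  # reuse the validation split from training
        max_new_tokens=2048,
        temperature=0.7,
        top_p=0.7,
        repetition_penalty=1.,
        do_sample=True,
        merge_lora=False))  # load the LoRA adapter alongside the base weights
```

With `merge_lora=False`, the LoRA weights are applied on top of the base model at load time rather than merged into it.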
examples/pytorch/llm/scripts/xverse_moe_a4_2b/lora/sft.sh (new file, 29 additions)

@@ -0,0 +1,29 @@
+# Experimental environment: A100
+# 66GB GPU memory
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+    --model_type xverse-moe-a4_2b \
+    --sft_type lora \
+    --tuner_backend swift \
+    --dtype fp16 \
+    --dataset dureader-robust-zh \
+    --train_dataset_sample -1 \
+    --num_train_epochs 1 \
+    --max_length 1024 \
+    --check_dataset_strategy warning \
+    --lora_dtype fp16 \
+    --lora_rank 8 \
+    --lora_alpha 32 \
+    --lora_dropout_p 0.05 \
+    --lora_target_modules DEFAULT \
+    --gradient_checkpointing true \
+    --batch_size 1 \
+    --weight_decay 0.1 \
+    --learning_rate 1e-4 \
+    --gradient_accumulation_steps 16 \
+    --max_grad_norm 0.5 \
+    --warmup_ratio 0.03 \
+    --eval_steps 100 \
+    --save_steps 100 \
+    --save_total_limit 2 \
+    --logging_steps 10 \
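
The equivalent fine-tuning job can be started from Python as well. A minimal sketch, assuming ms-swift's documented `sft_main`/`SftArguments` entry points and mirroring the key hyperparameters of the script above:

```python
# A minimal sketch, assuming ms-swift's documented sft_main / SftArguments
# entry points; hyperparameters mirror the shell script above.
from swift.llm import SftArguments, sft_main

output = sft_main(
    SftArguments(
        model_type='xverse-moe-a4_2b',
        sft_type='lora',
        dtype='fp16',
        dataset=['dureader-robust-zh'],
        num_train_epochs=1,
        max_length=1024,
        lora_rank=8,
        lora_alpha=32,
        learning_rate=1e-4,
        gradient_accumulation_steps=16))
print(output['best_model_checkpoint'])  # path of the best LoRA checkpoint
```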

swift/llm/utils/model.py (3 additions, 0 deletions)
@@ -228,6 +228,7 @@ class ModelType:
     xverse_65b_v2 = 'xverse-65b-v2'
     xverse_65b_chat = 'xverse-65b-chat'
     xverse_13b_256k = 'xverse-13b-256k'
+    xverse_moe_a4_2b = 'xverse-moe-a4_2b'
     # orion
     orion_14b = 'orion-14b'
     orion_14b_chat = 'orion-14b-chat'

@@ -428,6 +429,8 @@ def _register_model(
                 LoRATM.llama2, TemplateType.xverse)
 @register_model(ModelType.xverse_7b, 'xverse/XVERSE-7B', LoRATM.llama2,
                 TemplateType.default_generation)
+@register_model(ModelType.xverse_moe_a4_2b, 'xverse/XVERSE-MoE-A4.2B',
+                LoRATM.llama2, TemplateType.default_generation)
 @register_model(
     ModelType.baichuan_13b_chat,
     'baichuan-inc/Baichuan-13B-Chat',
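
This registration ties the new model type to its ModelScope model id ('xverse/XVERSE-MoE-A4.2B'), its LoRA target modules (LoRATM.llama2, i.e. q_proj, k_proj, v_proj) and its chat template (default-generation). A minimal sketch of exercising it through swift's Python API, following the usage pattern in ms-swift's README (the query string is illustrative):

```python
# A minimal sketch of loading the newly registered model type through
# swift's Python API; names follow ms-swift's documented usage.
import torch
from swift.llm import (ModelType, get_default_template_type,
                       get_model_tokenizer, get_template, inference)

model_type = ModelType.xverse_moe_a4_2b
template_type = get_default_template_type(model_type)  # 'default-generation'

# Resolves 'xverse/XVERSE-MoE-A4.2B' on ModelScope via the new registration.
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': 'auto'})
template = get_template(template_type, tokenizer)

response, history = inference(model, tokenizer, template,
                              'Where is the capital of Zhejiang?')
print(response)
```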
