
Commit 0e06cfd

support Qwen1.5-32b models (modelscope#655)
1 parent c8db740 commit 0e06cfd

6 files changed: 78 additions, 1 deletion

README.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -39,6 +39,7 @@ To facilitate use by users unfamiliar with deep learning, we provide a Gradio we
 Additionally, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.
 
 ## 🎉 News
+- 🔥2024.04.03: Support **Qwen1.5-32B** series: Qwen1.5-32B, Qwen1.5-32B-Chat, Qwen1.5-32B-Chat-GPTQ-Int4. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/qwen1half_32b_chat/lora_mp/sft.sh) to start training!
 - 🔥2024.04.02: Support the fine-tuning and inference of Mengzi3-13B-Base model, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/mengzi3_13b_base/lora_ddp_ds/sft.sh) to start training!
 - 🔥2024.04.01: Support **dbrx** series: dbrx-base and dbrx-instruct, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/dbrx-instruct/lora_mp/sft.sh) to start training!
 - 🔥2024.03.29: Support **Qwen1.5-MoE** series: Qwen1.5-MoE-A2.7B, Qwen1.5-MoE-A2.7B-Chat, Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4.
```

README_CN.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -40,6 +40,7 @@ SWIFT supports nearly **200 LLMs and MLLMs** (multimodal large models) for training, inference,
 In addition, we are expanding capabilities for other modalities; currently we support full-parameter training and LoRA training for AnimateDiff.
 
 ## 🎉 News
+- 🔥2024.04.03: Support the **Qwen1.5-32B** series: Qwen1.5-32B, Qwen1.5-32B-Chat, Qwen1.5-32B-Chat-GPTQ-Int4. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/qwen1half_32b_chat/lora_mp/sft.sh) to start training!
 - 🔥2024.04.02: Support inference and fine-tuning of the Mengzi3-13B-Base model; use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/mengzi3_13b_base/lora_ddp_ds/sft.sh) to start training!
 - 🔥2024.04.01: Support the **dbrx** series, dbrx-base and dbrx-instruct; use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/dbrx-instruct/lora_mp/sft.sh) to start training!
 - 🔥2024.03.29: Support the **Qwen1.5-MoE** series: Qwen1.5-MoE-A2.7B, Qwen1.5-MoE-A2.7B-Chat, Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4.
```

docs/source/LLM/支持的模型和数据集.md

Lines changed: 4 additions & 1 deletion
```diff
@@ -35,20 +35,23 @@
 |qwen1half-4b|[qwen/Qwen1.5-4B](https://modelscope.cn/models/qwen/Qwen1.5-4B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.37|-|
 |qwen1half-7b|[qwen/Qwen1.5-7B](https://modelscope.cn/models/qwen/Qwen1.5-7B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.37|-|
 |qwen1half-14b|[qwen/Qwen1.5-14B](https://modelscope.cn/models/qwen/Qwen1.5-14B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.37|-|
+|qwen1half-32b|[qwen/Qwen1.5-32B](https://modelscope.cn/models/qwen/Qwen1.5-32B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.37|-|
 |qwen1half-72b|[qwen/Qwen1.5-72B](https://modelscope.cn/models/qwen/Qwen1.5-72B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.37|-|
 |qwen1half-moe-a2_7b|[qwen/Qwen1.5-MoE-A2.7B](https://modelscope.cn/models/qwen/Qwen1.5-MoE-A2.7B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.37|-|
 |qwen1half-0_5b-chat|[qwen/Qwen1.5-0.5B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|transformers>=4.37|-|
 |qwen1half-1_8b-chat|[qwen/Qwen1.5-1.8B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-1.8B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|transformers>=4.37|-|
 |qwen1half-4b-chat|[qwen/Qwen1.5-4B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-4B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|transformers>=4.37|-|
 |qwen1half-7b-chat|[qwen/Qwen1.5-7B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-7B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|transformers>=4.37|-|
 |qwen1half-14b-chat|[qwen/Qwen1.5-14B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-14B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|transformers>=4.37|-|
+|qwen1half-32b-chat|[qwen/Qwen1.5-32B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-32B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|transformers>=4.37|-|
 |qwen1half-72b-chat|[qwen/Qwen1.5-72B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-72B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|transformers>=4.37|-|
 |qwen1half-moe-a2_7b-chat|[qwen/Qwen1.5-MoE-A2.7B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-MoE-A2.7B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|transformers>=4.37|-|
 |qwen1half-0_5b-chat-int4|[qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|auto_gptq>=0.5, transformers>=4.37|-|
 |qwen1half-1_8b-chat-int4|[qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|auto_gptq>=0.5, transformers>=4.37|-|
 |qwen1half-4b-chat-int4|[qwen/Qwen1.5-4B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-4B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|auto_gptq>=0.5, transformers>=4.37|-|
 |qwen1half-7b-chat-int4|[qwen/Qwen1.5-7B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-7B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|auto_gptq>=0.5, transformers>=4.37|-|
 |qwen1half-14b-chat-int4|[qwen/Qwen1.5-14B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-14B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|auto_gptq>=0.5, transformers>=4.37|-|
+|qwen1half-32b-chat-int4|[qwen/Qwen1.5-32B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-32B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|auto_gptq>=0.5, transformers>=4.37|-|
 |qwen1half-72b-chat-int4|[qwen/Qwen1.5-72B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-72B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|auto_gptq>=0.5, transformers>=4.37|-|
 |qwen1half-0_5b-chat-int8|[qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|
 |qwen1half-1_8b-chat-int8|[qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|
```
```diff
@@ -206,7 +209,7 @@
 |grok-1|[colossalai/grok-1-pytorch](https://modelscope.cn/models/colossalai/grok-1-pytorch/summary)|q_proj, k_proj, v_proj|default-generation|✘|✘||-|
 |dbrx-instruct|[AI-ModelScope/dbrx-instruct](https://modelscope.cn/models/AI-ModelScope/dbrx-instruct/summary)|attn.Wqkv|dbrx|✔|✔|transformers>=4.36|-|
 |dbrx-base|[AI-ModelScope/dbrx-base](https://modelscope.cn/models/AI-ModelScope/dbrx-base/summary)|attn.Wqkv|dbrx|✔|✔|transformers>=4.36|-|
-|mengzi3-13b-base|[langboat/Mengzi3-13B-Base](https://modelscope.cn/models/langboat/Mengzi3-13B-Base/summary)|q_proj, k_proj, v_proj|mengzi|✘|✘||-|
+|mengzi3-13b-base|[langboat/Mengzi3-13B-Base](https://modelscope.cn/models/langboat/Mengzi3-13B-Base/summary)|q_proj, k_proj, v_proj|mengzi|✔|✔||-|
 
 
 ## Datasets
```
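Each row added to this table mirrors one `register_model(...)` call added to `swift/llm/utils/model.py`. The hypothetical helper below (`parse_row` and `ModelRow` are illustrative names, not part of swift) splits a row back into typed fields. The column order is an assumption, because the table header lies outside this hunk; it is inferred from the registration arguments: model type, ModelScope model ID, default LoRA target modules, default template, flash-attention support, vLLM support, requirements and tags.

```python
import re
from dataclasses import dataclass

# Assumed column order (the header row is not part of this diff hunk).
COLUMNS = ("model_type", "model_link", "lora_targets", "template",
           "flash_attn", "vllm", "requires", "tags")
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")  # matches "[model_id](url)"


@dataclass
class ModelRow:
    model_type: str
    model_id: str
    url: str
    lora_target_modules: list[str]
    template: str
    support_flash_attn: bool
    support_vllm: bool
    requires: list[str]
    tags: str


def parse_row(line: str) -> ModelRow:
    """Split one '|a|b|...|' markdown table row into typed fields."""
    body = line.strip().removeprefix("|").removesuffix("|")
    cells = [c.strip() for c in body.split("|")]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    raw = dict(zip(COLUMNS, cells))
    link = LINK_RE.fullmatch(raw["model_link"])
    if link is None:
        raise ValueError(f"unexpected model link cell: {raw['model_link']!r}")
    return ModelRow(
        model_type=raw["model_type"],
        model_id=link.group(1),
        url=link.group(2),
        lora_target_modules=[m.strip() for m in raw["lora_targets"].split(",")],
        template=raw["template"],
        support_flash_attn=raw["flash_attn"] == "✔",  # the docs use ✔ / ✘
        support_vllm=raw["vllm"] == "✔",
        requires=[r.strip() for r in raw["requires"].split(",") if r.strip()],
        tags=raw["tags"],
    )
```

Parsing the new `qwen1half-32b-chat` row this way yields `model_id == "qwen/Qwen1.5-32B-Chat"`, the targets `["q_proj", "k_proj", "v_proj"]` and the `qwen` template, matching the arguments passed to its registration. An empty requirements cell, as in the `grok-1` row, becomes an empty list.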
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@

```shell
# Experimental environment: V100, A10, 3090
# 66GB GPU memory
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --ckpt_dir "output/qwen1half-32b-chat/vx-xxx/checkpoint-xxx" \
    --load_dataset_config true \
    --use_flash_attn true \
    --max_new_tokens 2048 \
    --temperature 0.1 \
    --top_p 0.7 \
    --repetition_penalty 1. \
    --do_sample true \
    --merge_lora false \
```
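The infer script samples (`--do_sample true`) with `--temperature 0.1` and `--top_p 0.7`, while `--repetition_penalty 1.` applies no penalty. The sketch below shows the general temperature plus nucleus (top-p) sampling technique in plain Python so the effect of those two values is visible. It is an illustration only, not the code path swift or transformers actually runs, and `nucleus_candidates` and `sample` are hypothetical names.

```python
import math
import random


def nucleus_candidates(logits, temperature, top_p):
    """Return (token_id, prob) pairs kept by temperature + top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 when sampling")
    scaled = [x / temperature for x in logits]  # low temperature sharpens
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    ranked = sorted(((i, e / total) for i, e in enumerate(exps)),
                    key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for token_id, p in ranked:
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:  # smallest prefix whose mass reaches top_p
            break
    mass = sum(p for _, p in kept)
    return [(t, p / mass) for t, p in kept]  # renormalise the nucleus


def sample(logits, temperature=0.1, top_p=0.7, rng=random):
    """Draw one token id from the filtered distribution."""
    ids, weights = zip(*nucleus_candidates(logits, temperature, top_p))
    return rng.choices(ids, weights=weights, k=1)[0]
```

For logits such as `[2.0, 1.0, 0.5]`, temperature 0.1 leaves the top token with almost all of the probability mass, so the nucleus collapses to that single token and sampling behaves almost greedily; at temperature 1.0 the same logits keep two candidates.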
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@

```shell
# Experimental environment: A100
# 2*40GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
    --model_type qwen1half-32b-chat \
    --sft_type lora \
    --tuner_backend swift \
    --dtype AUTO \
    --output_dir output \
    --dataset ms-bench-mini \
    --train_dataset_sample 5000 \
    --num_train_epochs 2 \
    --max_length 2048 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules DEFAULT \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn true \
```
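A quick sanity check of what the training flags above imply, written as plain arithmetic. The assumptions are stated loudly: if this 30-line file is the `lora_mp/sft.sh` script linked from the README, the directory name suggests the two GPUs hold one model split across them (one data-parallel process), not two replicas; all 5,000 sampled rows are treated as training data, ignoring any rows held out for evaluation; and steps per epoch roughly follow the Hugging Face `Trainer` convention of floor-dividing batches by the accumulation steps. Treat the results as approximate.

```python
import math

# Values copied from the sft script above.
train_samples = 5000            # --train_dataset_sample
num_train_epochs = 2            # --num_train_epochs
batch_size = 1                  # --batch_size (per process)
grad_accum = 16                 # --gradient_accumulation_steps
warmup_ratio = 0.03             # --warmup_ratio
lora_rank, lora_alpha = 8, 32   # --lora_rank, --lora_alpha
num_gpus = 2                    # CUDA_VISIBLE_DEVICES=0,1

# Assumption: model parallelism, so only one data-parallel process.
data_parallel_ranks = 1

effective_batch = batch_size * grad_accum * data_parallel_ranks
batches_per_epoch = math.ceil(train_samples / batch_size)
steps_per_epoch = batches_per_epoch // grad_accum   # optimizer steps
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

# LoRA adds (alpha / r) * B @ A to each targeted weight matrix.
lora_scaling = lora_alpha / lora_rank

# Rough bf16/fp16 weight memory for a nominal 32e9-parameter model,
# assuming the weights are split evenly across the two GPUs.
weight_gb = 32e9 * 2 / 1e9
weight_gb_per_gpu = weight_gb / num_gpus

print(effective_batch, total_steps, warmup_steps, lora_scaling,
      weight_gb, weight_gb_per_gpu)
```

Under these assumptions the run takes about 624 optimizer steps with an effective batch of 16, so `--eval_steps 100` and `--save_steps 100` fire roughly six times and `--save_total_limit 2` caps how many checkpoints stay on disk. About 32 GB of weights per card is also consistent with the `2*40GB` note in the script header.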

swift/llm/utils/model.py

Lines changed: 29 additions & 0 deletions
```diff
@@ -60,13 +60,15 @@ class ModelType:
     qwen1half_4b = 'qwen1half-4b'
     qwen1half_7b = 'qwen1half-7b'
     qwen1half_14b = 'qwen1half-14b'
+    qwen1half_32b = 'qwen1half-32b'
     qwen1half_72b = 'qwen1half-72b'
     qwen1half_moe_a2_7b = 'qwen1half-moe-a2_7b'
     qwen1half_0_5b_chat = 'qwen1half-0_5b-chat'
     qwen1half_1_8b_chat = 'qwen1half-1_8b-chat'
     qwen1half_4b_chat = 'qwen1half-4b-chat'
     qwen1half_7b_chat = 'qwen1half-7b-chat'
     qwen1half_14b_chat = 'qwen1half-14b-chat'
+    qwen1half_32b_chat = 'qwen1half-32b-chat'
     qwen1half_72b_chat = 'qwen1half-72b-chat'
     qwen1half_moe_a2_7b_chat = 'qwen1half-moe-a2_7b-chat'
 
```
```diff
@@ -76,6 +78,7 @@ class ModelType:
     qwen1half_4b_chat_int4 = 'qwen1half-4b-chat-int4'
     qwen1half_7b_chat_int4 = 'qwen1half-7b-chat-int4'
     qwen1half_14b_chat_int4 = 'qwen1half-14b-chat-int4'
+    qwen1half_32b_chat_int4 = 'qwen1half-32b-chat-int4'
     qwen1half_72b_chat_int4 = 'qwen1half-72b-chat-int4'
     qwen1half_0_5b_chat_int8 = 'qwen1half-0_5b-chat-int8'
     qwen1half_1_8b_chat_int8 = 'qwen1half-1_8b-chat-int8'
```
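The new `ModelType` constants follow the same pattern as their neighbours: the attribute name is the string value with every `-` replaced by `_` (for example `qwen1half_moe_a2_7b = 'qwen1half-moe-a2_7b'`). This is a convention observed in these hunks, not a documented rule. The small check below, using a hypothetical `attr_name` helper, makes it explicit for the three constants this commit adds.

```python
# The three ModelType constants added by this commit, copied from the diff.
ADDED = {
    "qwen1half_32b": "qwen1half-32b",
    "qwen1half_32b_chat": "qwen1half-32b-chat",
    "qwen1half_32b_chat_int4": "qwen1half-32b-chat-int4",
}


def attr_name(model_type: str) -> str:
    """Hypothetical helper: derive a ModelType attribute name from its value."""
    return model_type.replace("-", "_")


for attr, value in ADDED.items():
    assert attr_name(value) == attr, f"{attr} does not match {value!r}"
```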
```diff
@@ -991,6 +994,14 @@ def cross_entropy_forward(self, inputs: Tensor,
     support_flash_attn=True,
     support_vllm=True,
     requires=['transformers>=4.37'])
+@register_model(
+    ModelType.qwen1half_32b,
+    'qwen/Qwen1.5-32B',
+    LoRATM.qwen1half,
+    TemplateType.default_generation,
+    support_flash_attn=True,
+    support_vllm=True,
+    requires=['transformers>=4.37'])
 @register_model(
     ModelType.qwen1half_72b,
     'qwen/Qwen1.5-72B',
```
```diff
@@ -1439,6 +1450,14 @@ def get_model_tokenizer_aqlm(model_dir: str,
     support_flash_attn=True,
     support_vllm=True,
     requires=['transformers>=4.37'])
+@register_model(
+    ModelType.qwen1half_32b_chat,
+    'qwen/Qwen1.5-32B-Chat',
+    LoRATM.qwen1half,
+    TemplateType.qwen,
+    support_flash_attn=True,
+    support_vllm=True,
+    requires=['transformers>=4.37'])
 @register_model(
     ModelType.qwen1half_72b_chat,
     'qwen/Qwen1.5-72B-Chat',
```
```diff
@@ -1572,6 +1591,16 @@ def get_model_tokenizer_qwen1half(model_dir: str,
     torch_dtype=torch.float16,
     function_kwargs={'bits': 8},
     support_flash_attn=True)
+@register_model(
+    ModelType.qwen1half_32b_chat_int4,
+    'qwen/Qwen1.5-32B-Chat-GPTQ-Int4',
+    LoRATM.qwen1half,
+    TemplateType.qwen,
+    requires=['auto_gptq>=0.5', 'transformers>=4.37'],
+    torch_dtype=torch.float16,
+    function_kwargs={'bits': 4},
+    support_flash_attn=True,
+    support_vllm=True)
 @register_model(
     ModelType.qwen1half_72b_chat_int4,
     'qwen/Qwen1.5-72B-Chat-GPTQ-Int4',
```
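The hunks above add three `@register_model(...)` decorators, each binding a model type, a ModelScope model ID, LoRA target modules, a template and capability flags to an existing loader function. The following is a minimal, hypothetical sketch of that decorator-registry pattern. `REGISTRY`, `register`, `load_by_type` and `load_qwen1half` are illustrative stand-ins, not the real names or signatures in `swift/llm/utils/model.py`, and for brevity both example registrations share one placeholder loader.

```python
from typing import Any, Callable, Dict, List, Optional

REGISTRY: Dict[str, Dict[str, Any]] = {}  # hypothetical: model_type -> metadata


def register(model_type: str, model_id: str, lora_target_modules: List[str],
             template: str, *, requires: Optional[List[str]] = None,
             function_kwargs: Optional[Dict[str, Any]] = None,
             support_flash_attn: bool = False, support_vllm: bool = False):
    """Decorator factory mimicking the shape of register_model (a sketch)."""
    def decorator(get_function: Callable) -> Callable:
        if model_type in REGISTRY:
            raise ValueError(f"{model_type!r} is already registered")
        REGISTRY[model_type] = {
            "model_id": model_id,
            "lora_target_modules": lora_target_modules,
            "template": template,
            "requires": requires or [],
            "function_kwargs": function_kwargs or {},
            "support_flash_attn": support_flash_attn,
            "support_vllm": support_vllm,
            "get_function": get_function,
        }
        return get_function  # returned unchanged, so decorators can stack
    return decorator


def load_by_type(model_type: str, model_dir: str):
    """Look up a registration and call its loader with the stored kwargs."""
    entry = REGISTRY[model_type]
    return entry["get_function"](model_dir, **entry["function_kwargs"])


QWEN1HALF_TARGETS = ["q_proj", "k_proj", "v_proj"]  # as listed in the docs table


@register("qwen1half-32b-chat", "qwen/Qwen1.5-32B-Chat", QWEN1HALF_TARGETS,
          "qwen", requires=["transformers>=4.37"],
          support_flash_attn=True, support_vllm=True)
@register("qwen1half-32b-chat-int4", "qwen/Qwen1.5-32B-Chat-GPTQ-Int4",
          QWEN1HALF_TARGETS, "qwen",
          requires=["auto_gptq>=0.5", "transformers>=4.37"],
          function_kwargs={"bits": 4},
          support_flash_attn=True, support_vllm=True)
def load_qwen1half(model_dir: str, **kwargs: Any):
    """Placeholder loader: a real one would build the model and tokenizer."""
    return model_dir, kwargs
```

Because each decorator returns the function unchanged, several registrations can stack on one loader, and in this sketch the per-entry `function_kwargs` (such as `{'bits': 4}` for the GPTQ-Int4 checkpoint) are forwarded only when that model type is loaded.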

0 commit comments
