
Commit 033809f

support Telechat-7b model (#630)
1 parent dd9410e

13 files changed, +90 −13 lines changed

README.md

Lines changed: 2 additions & 2 deletions
@@ -41,7 +41,7 @@ Additionally, we are expanding capabilities for other modalities. Currently, we
 ## 🎉 News
 - 🔥2024.03.29: Support **Qwen1.5-MoE** series: Qwen1.5-MoE-A2.7B, Qwen1.5-MoE-A2.7B-Chat, Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4.
 - 🔥2024.03.29: Support the fine-tuning and inference of **Grok-1** 300B MoE, please view details [here](https://github.com/modelscope/swift/tree/main/docs/source_en/LLM/Grok-1-best-practice.md).
-- 🔥2024.03.25: Supports inference and fine-tuning of TeleChat-12b model, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh) to start training!
+- 🔥2024.03.25: Supports inference and fine-tuning of TeleChat-7b and TeleChat-12b model, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh) to start training!
 - 🔥2024.03.20: Supports inference and fine-tuning for the **llava** series. For best practice, you can refer to [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/llava最佳实践.md).
 - 🔥2024.03.12: Support inference and fine-tuning for **deepseek-vl** series. Best practices can be found [here](docs/source_en/Multi-Modal/deepseek-vl-best-practice.md).
 - 🔥2024.03.11: Support [GaLore](https://arxiv.org/abs/2403.03507) for effectively reducing memory usage to 1/2 of the original in full-parameter training.
@@ -395,7 +395,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 | CodeFuse-CodeLLaMA<br>CodeFuse-Codegeex2<br>CodeFuse-Qwen | [Ant CodeFuse series models](https://github.com/codefuse-ai) | Chinese<br>English | 6B-34B | chat model<br>code model |
 | phi2 | Microsoft's PHI2 model | English | 3B | base model<br>code model |
 | Grok | [X-ai](https://github.com/xai-org/grok-1) | English | 300B | base model |
-| TeleChat | [Tele-AI](https://github.com/Tele-AI/Telechat) | Chinese<br>English | 12B | chat model |
+| TeleChat | [Tele-AI](https://github.com/Tele-AI/Telechat) | Chinese<br>English | 7B-12B | chat model |


 #### MLLMs

README_CN.md

Lines changed: 2 additions & 2 deletions
@@ -42,7 +42,7 @@ SWIFT支持近**200种LLM和MLLM**(多模态大模型)的训练、推理、
 ## 🎉 新闻
 - 🔥2024.03.29: 支持**Qwen1.5-MoE**系列: Qwen1.5-MoE-A2.7B, Qwen1.5-MoE-A2.7B-Chat, Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4.
 - 🔥2024.03.29: 支持**Grok-1**300B MoE模型的推理与微调, 最佳实践可以查看[这里](https://github.com/modelscope/swift/tree/main/docs/source/LLM/Grok训练和推理.md).
-- 🔥2024.03.25: 支持TeleChat-12b模型的训练和推理, 使用[这个脚本](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh)来开始训练!.
+- 🔥2024.03.25: 支持TeleChat-7b和TeleChat-12b模型的训练和推理, 使用[这个脚本](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh)来开始训练!.
 - 🔥2024.03.20: 支持**llava**系列的推理与微调, 最佳实践可以查看[这里](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/llava最佳实践.md).
 - 🔥2024.03.12: 支持**deepseek-vl**系列推理和微调, 最佳实践可以查看[这里](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/deepseek-vl最佳实践.md).
 - 🔥2024.03.11: 支持[GaLore](https://arxiv.org/abs/2403.03507), 用于在全参数训练中有效减小显存占用至原来的1/2.
@@ -394,7 +394,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 | CodeFuse-CodeLLaMA<br>CodeFuse-Codegeex2<br>CodeFuse-Qwen | [蚂蚁CodeFuse系列模型](https://github.com/codefuse-ai) | 中文<br>英文 | 6B-34B | chat模型<br>代码模型 |
 | phi2 | 微软PHI2模型 | 英文 | 3B | base模型<br>代码模型 |
 | Grok | [X-ai](https://github.com/xai-org/grok-1) | 英文 | 300B | base模型 |
-| TeleChat | [Tele-AI](https://github.com/Tele-AI/Telechat) | 中文<br>英文 | 12B | chat模型 |
+| TeleChat | [Tele-AI](https://github.com/Tele-AI/Telechat) | 中文<br>英文 | 7B-12B | chat模型 |

 #### 多模态大模型

docs/source/LLM/支持的模型和数据集.md

Lines changed: 1 addition & 1 deletion
@@ -201,10 +201,10 @@
 |mamba-790m|[AI-ModelScope/mamba-790m-hf](https://modelscope.cn/models/AI-ModelScope/mamba-790m-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|&#x2718;|&#x2718;|transformers>=4.39.0|-|
 |mamba-1.4b|[AI-ModelScope/mamba-1.4b-hf](https://modelscope.cn/models/AI-ModelScope/mamba-1.4b-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|&#x2718;|&#x2718;|transformers>=4.39.0|-|
 |mamba-2.8b|[AI-ModelScope/mamba-2.8b-hf](https://modelscope.cn/models/AI-ModelScope/mamba-2.8b-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|&#x2718;|&#x2718;|transformers>=4.39.0|-|
+|telechat-7b|[TeleAI/TeleChat-7B](https://modelscope.cn/models/TeleAI/TeleChat-7B/summary)|self_attention.key_value, self_attention.query|telechat|&#x2714;|&#x2718;||-|
 |telechat-12b|[TeleAI/TeleChat-12B](https://modelscope.cn/models/TeleAI/TeleChat-12B/summary)|self_attention.key_value, self_attention.query|telechat|&#x2714;|&#x2718;||-|
 |grok-1|[colossalai/grok-1-pytorch](https://modelscope.cn/models/colossalai/grok-1-pytorch/summary)|q_proj, k_proj, v_proj|default-generation|&#x2718;|&#x2718;||-|

-
 ## 数据集
 下表介绍了swift接入的数据集的相关信息:
 - Dataset Name: 数据集在swift中注册的dataset\_name.
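
For orientation (not part of the diff itself): once the telechat-7b entry above is registered, inference can presumably be launched with the swift CLI the same way as for telechat-12b. The command below is a minimal sketch that uses only the model type added in this commit:

# Sketch only: interactive inference with the newly registered model type.
# Weights are pulled from the TeleAI/TeleChat-7B ModelScope repo listed in the table above.
CUDA_VISIBLE_DEVICES=0 swift infer --model_type telechat-7b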

examples/pytorch/llm/scripts/llama2_7b_aqlm_2bit_1x16/lora/sft.sh

Lines changed: 1 addition & 2 deletions
@@ -12,8 +12,7 @@ python llm_sft.py \
     --use_flash_attn true \
     --eval_steps 1000 \
     --save_steps 1000 \
-    --train_dataset_sample 100000 \
-    --val_dataset_sample 3000 \
+    --train_dataset_sample -1 \
     --num_train_epochs 2 \
     --check_dataset_strategy none \
     --gradient_checkpointing true \
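
For context (not stated in the diff itself): in swift's sft arguments, --train_dataset_sample -1 is the sentinel for "use the full training set", so this script and the sibling scripts below switch from a capped 100k-row training sample with an explicit 3k-row validation sample to full-dataset training. A bounded run can still be requested explicitly; a minimal sketch, where the model type is inferred from the script's directory name and the dataset is an illustrative stand-in, neither taken from this commit:

# Sketch only: quick smoke-test run on a small sample instead of the full dataset.
# model_type and dataset are assumptions for illustration, not part of the commit.
CUDA_VISIBLE_DEVICES=0 \
python llm_sft.py \
    --model_type llama2-7b-aqlm-2bit-1x16 \
    --dataset alpaca-zh \
    --train_dataset_sample 2000 \
    --val_dataset_sample 200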

examples/pytorch/llm/scripts/mamba-1.4b/lora/sft.sh

Lines changed: 1 addition & 2 deletions
@@ -12,8 +12,7 @@ python llm_sft.py \
     --use_flash_attn true \
     --eval_steps 1000 \
     --save_steps 1000 \
-    --train_dataset_sample 100000 \
-    --val_dataset_sample 3000 \
+    --train_dataset_sample -1 \
     --num_train_epochs 2 \
     --check_dataset_strategy none \
     --gradient_checkpointing true \

examples/pytorch/llm/scripts/qwen1half_7b_chat_awq/lora/sft.sh

Lines changed: 1 addition & 2 deletions
@@ -14,8 +14,7 @@ python llm_sft.py \
     --use_flash_attn true \
     --eval_steps 2000 \
     --save_steps 2000 \
-    --train_dataset_sample 100000 \
-    --val_dataset_sample 5000 \
+    --train_dataset_sample -1 \
     --num_train_epochs 1 \
     --check_dataset_strategy none \
     --gradient_checkpointing true \

examples/pytorch/llm/scripts/qwen1half_moe_a2_7b/lora/sft.sh

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+# you will have to install transformers from source
+# pip install git+https://github.com/huggingface/transformers
 # Experimental environment: A100
 # 42GB GPU memory
 PYTHONPATH=../../.. \
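
A brief aside on the comment added above (the same note is added to the two MoE scripts below): the Qwen1.5-MoE architecture was presumably not yet available in a tagged transformers release at the time, hence the pointer to a source install. A quick way to confirm that a from-source build is actually active (a sketch, not part of the commit):

# Sketch only: install transformers from source and check the reported version.
pip install git+https://github.com/huggingface/transformers
# Development builds report a ".dev0" suffix, e.g. 4.40.0.dev0.
python -c "import transformers; print(transformers.__version__)"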

examples/pytorch/llm/scripts/qwen1half_moe_a2_7b_chat/lora/sft.sh

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+# you will have to install transformers from source
+# pip install git+https://github.com/huggingface/transformers
 # Experimental environment: A100
 # 42GB GPU memory
 PYTHONPATH=../../.. \

examples/pytorch/llm/scripts/qwen1half_moe_a2_7b_chat_int4/qlora/sft.sh

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+# you will have to install transformers from source
+# pip install git+https://github.com/huggingface/transformers
 # Experimental environment: A100
 # 17GB GPU memory

examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh

Lines changed: 1 addition & 2 deletions
@@ -12,8 +12,7 @@ python llm_sft.py \
     --use_flash_attn true \
     --eval_steps 1000 \
     --save_steps 1000 \
-    --train_dataset_sample 100000 \
-    --val_dataset_sample 3000 \
+    --train_dataset_sample -1 \
     --num_train_epochs 2 \
     --check_dataset_strategy none \
     --gradient_checkpointing true \
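
Note that the README entries added in this commit point TeleChat-7b users at this same 12B LoRA script. A minimal sketch of how it would presumably be adapted: only --model_type is taken from the model table registered in this commit, while the dataset and the remaining values are illustrative stand-ins rather than the script's actual settings:

# Sketch only: reuse the 12B LoRA recipe for the newly registered 7B checkpoint.
# --model_type telechat-7b comes from this commit; dataset and hyperparameters are placeholders.
CUDA_VISIBLE_DEVICES=0 \
python llm_sft.py \
    --model_type telechat-7b \
    --sft_type lora \
    --dataset alpaca-zh \
    --train_dataset_sample -1 \
    --num_train_epochs 2 \
    --output_dir output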
