
Commit 033809f

support Telechat-7b model (#630)
1 parent dd9410e

13 files changed, +90 −13 lines changed

README.md

Lines changed: 2 additions & 2 deletions
@@ -41,7 +41,7 @@ Additionally, we are expanding capabilities for other modalities. Currently, we
 ## 🎉 News
 - 🔥2024.03.29: Support **Qwen1.5-MoE** series: Qwen1.5-MoE-A2.7B, Qwen1.5-MoE-A2.7B-Chat, Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4.
 - 🔥2024.03.29: Support the fine-tuning and inference of **Grok-1** 300B MoE, please view details [here](https://github.com/modelscope/swift/tree/main/docs/source_en/LLM/Grok-1-best-practice.md).
-- 🔥2024.03.25: Supports inference and fine-tuning of TeleChat-12b model, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh) to start training!
+- 🔥2024.03.25: Supports inference and fine-tuning of TeleChat-7b and TeleChat-12b model, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh) to start training!
 - 🔥2024.03.20: Supports inference and fine-tuning for the **llava** series. For best practice, you can refer to [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/llava最佳实践.md).
 - 🔥2024.03.12: Support inference and fine-tuning for **deepseek-vl** series. Best practices can be found [here](docs/source_en/Multi-Modal/deepseek-vl-best-practice.md).
 - 🔥2024.03.11: Support [GaLore](https://arxiv.org/abs/2403.03507) for effectively reducing memory usage to 1/2 of the original in full-parameter training.
@@ -395,7 +395,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 | CodeFuse-CodeLLaMA<br>CodeFuse-Codegeex2<br>CodeFuse-Qwen | [Ant CodeFuse series models](https://github.com/codefuse-ai) | Chinese<br>English | 6B-34B | chat model<br>code model |
 | phi2 | Microsoft's PHI2 model | English | 3B | base model<br>code model |
 | Grok | [X-ai](https://github.com/xai-org/grok-1) | English | 300B | base model |
-| TeleChat | [Tele-AI](https://github.com/Tele-AI/Telechat) | Chinese<br>English | 12B | chat model |
+| TeleChat | [Tele-AI](https://github.com/Tele-AI/Telechat) | Chinese<br>English | 7B-12B | chat model |


 #### MLLMs

README_CN.md

Lines changed: 2 additions & 2 deletions
@@ -42,7 +42,7 @@ SWIFT支持近**200种LLM和MLLM**(多模态大模型)的训练、推理、
 ## 🎉 新闻
 - 🔥2024.03.29: 支持**Qwen1.5-MoE**系列: Qwen1.5-MoE-A2.7B, Qwen1.5-MoE-A2.7B-Chat, Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4.
 - 🔥2024.03.29: 支持**Grok-1**300B MoE模型的推理与微调, 最佳实践可以查看[这里](https://github.com/modelscope/swift/tree/main/docs/source/LLM/Grok训练和推理.md).
-- 🔥2024.03.25: 支持TeleChat-12b模型的训练和推理, 使用[这个脚本](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh)来开始训练!.
+- 🔥2024.03.25: 支持TeleChat-7b和TeleChat-12b模型的训练和推理, 使用[这个脚本](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh)来开始训练!.
 - 🔥2024.03.20: 支持**llava**系列的推理与微调, 最佳实践可以查看[这里](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/llava最佳实践.md).
 - 🔥2024.03.12: 支持**deepseek-vl**系列推理和微调, 最佳实践可以查看[这里](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/deepseek-vl最佳实践.md).
 - 🔥2024.03.11: 支持[GaLore](https://arxiv.org/abs/2403.03507), 用于在全参数训练中有效减小显存占用至原来的1/2.
@@ -394,7 +394,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 | CodeFuse-CodeLLaMA<br>CodeFuse-Codegeex2<br>CodeFuse-Qwen | [蚂蚁CodeFuse系列模型](https://github.com/codefuse-ai) | 中文<br>英文 | 6B-34B | chat模型<br>代码模型 |
 | phi2 | 微软PHI2模型 | 英文 | 3B | base模型<br>代码模型 |
 | Grok | [X-ai](https://github.com/xai-org/grok-1) | 英文 | 300B | base模型 |
-| TeleChat | [Tele-AI](https://github.com/Tele-AI/Telechat) | 中文<br>英文 | 12B | chat模型 |
+| TeleChat | [Tele-AI](https://github.com/Tele-AI/Telechat) | 中文<br>英文 | 7B-12B | chat模型 |

 #### 多模态大模型

docs/source/LLM/支持的模型和数据集.md

Lines changed: 1 addition & 1 deletion
@@ -201,10 +201,10 @@
 |mamba-790m|[AI-ModelScope/mamba-790m-hf](https://modelscope.cn/models/AI-ModelScope/mamba-790m-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|&#x2718;|&#x2718;|transformers>=4.39.0|-|
 |mamba-1.4b|[AI-ModelScope/mamba-1.4b-hf](https://modelscope.cn/models/AI-ModelScope/mamba-1.4b-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|&#x2718;|&#x2718;|transformers>=4.39.0|-|
 |mamba-2.8b|[AI-ModelScope/mamba-2.8b-hf](https://modelscope.cn/models/AI-ModelScope/mamba-2.8b-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|&#x2718;|&#x2718;|transformers>=4.39.0|-|
+|telechat-7b|[TeleAI/TeleChat-7B](https://modelscope.cn/models/TeleAI/TeleChat-7B/summary)|self_attention.key_value, self_attention.query|telechat|&#x2714;|&#x2718;||-|
 |telechat-12b|[TeleAI/TeleChat-12B](https://modelscope.cn/models/TeleAI/TeleChat-12B/summary)|self_attention.key_value, self_attention.query|telechat|&#x2714;|&#x2718;||-|
 |grok-1|[colossalai/grok-1-pytorch](https://modelscope.cn/models/colossalai/grok-1-pytorch/summary)|q_proj, k_proj, v_proj|default-generation|&#x2718;|&#x2718;||-|

-
 ## 数据集
 下表介绍了swift接入的数据集的相关信息:
 - Dataset Name: 数据集在swift中注册的dataset\_name.
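
For orientation (not part of the diff itself): once the telechat-7b entry above is registered, inference can presumably be launched with the swift CLI the same way as for telechat-12b. The command below is a minimal sketch that uses only the model type added in this commit:

# Sketch only: interactive inference with the newly registered model type.
# Weights are pulled from the TeleAI/TeleChat-7B ModelScope repo listed in the table above.
CUDA_VISIBLE_DEVICES=0 swift infer --model_type telechat-7b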

examples/pytorch/llm/scripts/llama2_7b_aqlm_2bit_1x16/lora/sft.sh

Lines changed: 1 addition & 2 deletions
@@ -12,8 +12,7 @@ python llm_sft.py \
     --use_flash_attn true \
     --eval_steps 1000 \
     --save_steps 1000 \
-    --train_dataset_sample 100000 \
-    --val_dataset_sample 3000 \
+    --train_dataset_sample -1 \
     --num_train_epochs 2 \
     --check_dataset_strategy none \
     --gradient_checkpointing true \
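
For context (not stated in the diff itself): in swift's sft arguments, --train_dataset_sample -1 is the sentinel for "use the full training set", so this script and the sibling scripts below switch from a capped 100k-row training sample with an explicit 3k-row validation sample to full-dataset training. A bounded run can still be requested explicitly; a minimal sketch, where the model type is inferred from the script's directory name and the dataset is an illustrative stand-in, neither taken from this commit:

# Sketch only: quick smoke-test run on a small sample instead of the full dataset.
# model_type and dataset are assumptions for illustration, not part of the commit.
CUDA_VISIBLE_DEVICES=0 \
python llm_sft.py \
    --model_type llama2-7b-aqlm-2bit-1x16 \
    --dataset alpaca-zh \
    --train_dataset_sample 2000 \
    --val_dataset_sample 200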

examples/pytorch/llm/scripts/mamba-1.4b/lora/sft.sh

Lines changed: 1 addition & 2 deletions
@@ -12,8 +12,7 @@ python llm_sft.py \
     --use_flash_attn true \
     --eval_steps 1000 \
     --save_steps 1000 \
-    --train_dataset_sample 100000 \
-    --val_dataset_sample 3000 \
+    --train_dataset_sample -1 \
     --num_train_epochs 2 \
     --check_dataset_strategy none \
     --gradient_checkpointing true \

examples/pytorch/llm/scripts/qwen1half_7b_chat_awq/lora/sft.sh

Lines changed: 1 addition & 2 deletions
@@ -14,8 +14,7 @@ python llm_sft.py \
     --use_flash_attn true \
     --eval_steps 2000 \
     --save_steps 2000 \
-    --train_dataset_sample 100000 \
-    --val_dataset_sample 5000 \
+    --train_dataset_sample -1 \
     --num_train_epochs 1 \
     --check_dataset_strategy none \
     --gradient_checkpointing true \

examples/pytorch/llm/scripts/qwen1half_moe_a2_7b/lora/sft.sh

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+# you will have to install transformers from source
+# pip install git+https://github.com/huggingface/transformers
 # Experimental environment: A100
 # 42GB GPU memory
 PYTHONPATH=../../.. \
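
A brief aside on the comment added above (the same note is added to the two MoE scripts below): the Qwen1.5-MoE architecture was presumably not yet available in a tagged transformers release at the time, hence the pointer to a source install. A quick way to confirm that a from-source build is actually active (a sketch, not part of the commit):

# Sketch only: install transformers from source and check the reported version.
pip install git+https://github.com/huggingface/transformers
# Development builds report a ".dev0" suffix, e.g. 4.40.0.dev0.
python -c "import transformers; print(transformers.__version__)"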

examples/pytorch/llm/scripts/qwen1half_moe_a2_7b_chat/lora/sft.sh

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+# you will have to install transformers from source
+# pip install git+https://github.com/huggingface/transformers
 # Experimental environment: A100
 # 42GB GPU memory
 PYTHONPATH=../../.. \

examples/pytorch/llm/scripts/qwen1half_moe_a2_7b_chat_int4/qlora/sft.sh

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+# you will have to install transformers from source
+# pip install git+https://github.com/huggingface/transformers
 # Experimental environment: A100
 # 17GB GPU memory

examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh

Lines changed: 1 addition & 2 deletions
@@ -12,8 +12,7 @@ python llm_sft.py \
     --use_flash_attn true \
     --eval_steps 1000 \
     --save_steps 1000 \
-    --train_dataset_sample 100000 \
-    --val_dataset_sample 3000 \
+    --train_dataset_sample -1 \
     --num_train_epochs 2 \
     --check_dataset_strategy none \
     --gradient_checkpointing true \
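
Note that the README entries added in this commit point TeleChat-7b users at this same 12B LoRA script. A minimal sketch of how it would presumably be adapted: only --model_type is taken from the model table registered in this commit, while the dataset and the remaining values are illustrative stand-ins rather than the script's actual settings:

# Sketch only: reuse the 12B LoRA recipe for the newly registered 7B checkpoint.
# --model_type telechat-7b comes from this commit; dataset and hyperparameters are placeholders.
CUDA_VISIBLE_DEVICES=0 \
python llm_sft.py \
    --model_type telechat-7b \
    --sft_type lora \
    --dataset alpaca-zh \
    --train_dataset_sample -1 \
    --num_train_epochs 2 \
    --output_dir output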
