
Commit e30783a

Merge commit 'f203a5c3cda7ee30b191e4df0619c19a5bc13c03' into feat/grok-1

* commit 'f203a5c3cda7ee30b191e4df0619c19a5bc13c03':
  fix Telechat model (modelscope#623)
  fix save dir (modelscope#622)
  support TeleChat-12b (modelscope#607)
  update ui (modelscope#621)
  fix adalora and device_map (modelscope#619)
  fix deploy safe_response (modelscope#614)
  support Mistral-7b-v0.2 (modelscope#605)

# Conflicts:
#   README.md
#   README_CN.md
#   docs/source/LLM/支持的模型和数据集.md
#   swift/llm/utils/model.py

2 parents: 08016e7 + f203a5c

19 files changed: +346 −28 lines changed


README.md

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ Additionally, we are expanding capabilities for other modalities. Currently, we

 ## 🎉 News
 - 🔥2024.03.29: Support the fine-tuning and inference of **Grok-1** 300B MoE, please view details [here](https://github.com/modelscope/swift/tree/main/docs/source_en/LLM/Grok-1-best-practice.md).
+- 🔥2024.03.25: Supports inference and fine-tuning of TeleChat-12b model, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh) to start training!
 - 🔥2024.03.20: Supports inference and fine-tuning for the **llava** series. For best practice, you can refer to [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/llava最佳实践.md).
 - 🔥2024.03.12: Support inference and fine-tuning for **deepseek-vl** series. Best practices can be found [here](docs/source_en/Multi-Modal/deepseek-vl-best-practice.md).
 - 🔥2024.03.11: Support [GaLore](https://arxiv.org/abs/2403.03507) for effectively reducing memory usage to 1/2 of the original in full-parameter training.

README_CN.md

Lines changed: 1 addition & 0 deletions
@@ -41,6 +41,7 @@ SWIFT supports nearly **200 LLMs and MLLMs** (multimodal large models) for training, inference,

 ## 🎉 News
 - 🔥2024.03.29: Support inference and fine-tuning of the **Grok-1** 300B MoE model; best practices can be found [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM/Grok训练和推理.md).
+- 🔥2024.03.25: Support training and inference of the TeleChat-12b model; use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh) to start training!
 - 🔥2024.03.20: Support inference and fine-tuning of the **llava** series; best practices can be found [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/llava最佳实践.md).
 - 🔥2024.03.12: Support inference and fine-tuning of the **deepseek-vl** series; best practices can be found [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/deepseek-vl最佳实践.md).
 - 🔥2024.03.11: Support [GaLore](https://arxiv.org/abs/2403.03507), which effectively reduces GPU memory usage in full-parameter training to 1/2 of the original.

docs/source/LLM/支持的模型和数据集.md

Lines changed: 2 additions & 0 deletions
@@ -141,6 +141,7 @@
 |openbuddy-deepseek-67b-chat|[OpenBuddy/openbuddy-deepseek-67b-v15.2](https://modelscope.cn/models/OpenBuddy/openbuddy-deepseek-67b-v15.2/summary)|q_proj, k_proj, v_proj|openbuddy|✔|✔||-|
 |openbuddy-mixtral-moe-7b-chat|[OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k](https://modelscope.cn/models/OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k/summary)|q_proj, k_proj, v_proj|openbuddy|✔|✔|transformers>=4.36|-|
 |mistral-7b|[AI-ModelScope/Mistral-7B-v0.1](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.1/summary)|q_proj, k_proj, v_proj|default-generation-bos|✔|✔|transformers>=4.34|-|
+|mistral-7b-v2|[AI-ModelScope/Mistral-7B-v0.2-hf](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.2-hf/summary)|q_proj, k_proj, v_proj|default-generation-bos|✔|✔|transformers>=4.34|-|
 |mistral-7b-instruct|[AI-ModelScope/Mistral-7B-Instruct-v0.1](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.1/summary)|q_proj, k_proj, v_proj|llama|✔|✔|transformers>=4.34|-|
 |mistral-7b-instruct-v2|[AI-ModelScope/Mistral-7B-Instruct-v0.2](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.2/summary)|q_proj, k_proj, v_proj|llama|✔|✔|transformers>=4.34|-|
 |mixtral-moe-7b|[AI-ModelScope/Mixtral-8x7B-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-v0.1/summary)|q_proj, k_proj, v_proj|default-generation-bos|✔|✔|transformers>=4.36|-|
@@ -197,6 +198,7 @@
 |mamba-790m|[AI-ModelScope/mamba-790m-hf](https://modelscope.cn/models/AI-ModelScope/mamba-790m-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|✘|✘|transformers>=4.39.0|-|
 |mamba-1.4b|[AI-ModelScope/mamba-1.4b-hf](https://modelscope.cn/models/AI-ModelScope/mamba-1.4b-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|✘|✘|transformers>=4.39.0|-|
 |mamba-2.8b|[AI-ModelScope/mamba-2.8b-hf](https://modelscope.cn/models/AI-ModelScope/mamba-2.8b-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|✘|✘|transformers>=4.39.0|-|
+|telechat-12b|[TeleAI/telechat-12B](https://modelscope.cn/models/TeleAI/telechat-12B/summary)|self_attention.key_value, self_attention.query|telechat|✔|✘||-|
 |grok-1|[colossalai/grok-1-pytorch](https://modelscope.cn/models/colossalai/grok-1-pytorch/summary)|q_proj, k_proj, v_proj|default-generation|✘|✘||-|


examples/pytorch/llm/scripts/mistral_7b_v2/lora/infer.sh

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+# Experimental environment: A100
+# 16GB GPU memory
+PYTHONPATH=../../.. \
+CUDA_VISIBLE_DEVICES=0 \
+python llm_infer.py \
+    --ckpt_dir "output/mistral-7b-v2/vx-xxx/checkpoint-xxx" \
+    --load_dataset_config true \
+    --use_flash_attn true \
+    --max_new_tokens 2048 \
+    --temperature 0.5 \
+    --top_p 0.7 \
+    --repetition_penalty 1. \
+    --do_sample true \
+    --merge_lora false
examples/pytorch/llm/scripts/mistral_7b_v2/lora/sft.sh

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+# Experimental environment: A100
+# 19GB GPU memory
+PYTHONPATH=../../.. \
+CUDA_VISIBLE_DEVICES=0 \
+python llm_sft.py \
+    --model_id_or_path AI-ModelScope/Mistral-7B-v0.2-hf \
+    --model_revision master \
+    --sft_type lora \
+    --tuner_backend swift \
+    --template_type AUTO \
+    --dtype AUTO \
+    --output_dir output \
+    --dataset dureader-robust-zh \
+    --train_dataset_sample -1 \
+    --num_train_epochs 1 \
+    --max_length 2048 \
+    --check_dataset_strategy warning \
+    --lora_rank 8 \
+    --lora_alpha 32 \
+    --lora_dropout_p 0.05 \
+    --lora_target_modules DEFAULT \
+    --gradient_checkpointing true \
+    --batch_size 1 \
+    --weight_decay 0.1 \
+    --learning_rate 1e-4 \
+    --gradient_accumulation_steps 16 \
+    --max_grad_norm 0.5 \
+    --warmup_ratio 0.03 \
+    --eval_steps 100 \
+    --save_steps 100 \
+    --save_total_limit 2 \
+    --logging_steps 10 \
+    --use_flash_attn true \
+    --save_only_model true
examples/pytorch/llm/scripts/telechat_12b/lora/infer.sh

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# Experimental environment: A100
+# 1 * 26GB GPU memory
+PYTHONPATH=../../.. \
+CUDA_VISIBLE_DEVICES=0 \
+python llm_infer.py \
+    --ckpt_dir "output/telechat-12b/vx-xxx/checkpoint-xxx" \
+    --load_dataset_config true \
+    --max_length 2048 \
+    --use_flash_attn true \
+    --max_new_tokens 2048 \
+    --temperature 0.5 \
+    --top_p 0.7 \
+    --repetition_penalty 1. \
+    --do_sample true \
+    --merge_lora false \
+    --dtype fp16 \
+    --stream false
examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+# Experimental environment: A100
+# 1 * 30GB GPU memory
+PYTHONPATH=../../.. \
+CUDA_VISIBLE_DEVICES=0 \
+python llm_sft.py \
+    --model_type telechat-12b \
+    --dataset dureader-robust-zh \
+    --batch_size 1 \
+    --max_length 1024 \
+    --gradient_accumulation_steps 16 \
+    --learning_rate 5e-5 \
+    --use_flash_attn true \
+    --eval_steps 1000 \
+    --save_steps 1000 \
+    --train_dataset_sample 100000 \
+    --val_dataset_sample 3000 \
+    --num_train_epochs 2 \
+    --check_dataset_strategy none \
+    --gradient_checkpointing true \
+    --weight_decay 0.1 \
+    --max_grad_norm 1.0 \
+    --warmup_ratio 0.03 \
+    --save_total_limit 2 \
+    --logging_steps 10 \
+    --sft_type lora \
+    --lora_target_modules DEFAULT \
+    --lora_rank 8 \
+    --lora_alpha 32 \
+    --dtype fp16

setup.cfg

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ ignore-words-list = patten,nd,ty,mot,hist,formating,winn,gool,datas,wan,confids
 max-line-length = 120
 select = B,C,E,F,P,T4,W,B9
 ignore = F401,F403,F405,F821,W503,E251,W504
-exclude = docs/src,*.pyi,.git
+exclude = docs/src,*.pyi,.git,peft.py

 [darglint]
 ignore=DAR101

swift/llm/deploy.py

Lines changed: 11 additions & 4 deletions
@@ -22,6 +22,7 @@
                     CompletionStreamResponse, DeltaMessage, DeployArguments,
                     Model, ModelList, UsageInfo, inference, inference_stream,
                     messages_to_history, random_uuid)
+from .utils.utils import _get_safe_print_idx

 logger = get_logger()

@@ -241,8 +242,11 @@ async def _generate_stream():
         choices = []
         for output in result.outputs:
             text = template.tokenizer.decode(output.token_ids, True)
-            delta_text = text[print_idx_list[output.index]:]
-            print_idx_list[output.index] += len(delta_text)
+            new_print_idx = _get_safe_print_idx(
+                text, print_idx_list[output.index], output.finished())
+            delta_text = text[print_idx_list[output.
+                                             index]:new_print_idx]
+            print_idx_list[output.index] = new_print_idx
             choice = ChatCompletionResponseStreamChoice(
                 index=output.index,
                 delta=DeltaMessage(
@@ -259,8 +263,11 @@ async def _generate_stream():
         choices = []
         for output in result.outputs:
             text = template.tokenizer.decode(output.token_ids, True)
-            delta_text = text[print_idx_list[output.index]:]
-            print_idx_list[output.index] += len(delta_text)
+            new_print_idx = _get_safe_print_idx(
+                text, print_idx_list[output.index], output.finished())
+            delta_text = text[print_idx_list[output.
+                                             index]:new_print_idx]
+            print_idx_list[output.index] = new_print_idx
             choice = CompletionResponseStreamChoice(
                 index=output.index,
                 text=delta_text,
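
Note: `_get_safe_print_idx` itself is not part of this diff; it is imported from `swift/llm/utils/utils.py`. A minimal sketch of the idea, assuming the helper only decides how much of the decoded text is stable enough to flush (the real implementation may differ):

# Hypothetical sketch only -- the actual helper lives in
# swift/llm/utils/utils.py and may be implemented differently.
def _get_safe_print_idx(text: str, print_idx: int, is_finished: bool) -> int:
    """Return the index up to which `text` is safe to stream out.

    While generation is running, the decoded tail can still change:
    an incomplete multi-byte character decodes to U+FFFD, and the
    next token may rewrite how the tail detokenizes. Holding that
    tail back until the next chunk (or until the sequence finishes)
    keeps clients from receiving text that is later revised.
    """
    if is_finished:
        # Nothing can change any more; flush everything.
        return len(text)
    if text.endswith('\ufffd'):
        # Incomplete character at the tail; hold it back for now.
        return max(print_idx, len(text) - 1)
    return len(text)

Either way, `delta_text` remains a clean suffix extension of what the client has already received, which is what the "fix deploy safe_response (modelscope#614)" change in this merge is addressing.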

swift/llm/utils/model.py

Lines changed: 20 additions & 2 deletions
@@ -191,6 +191,7 @@ class ModelType:
     openbuddy_mixtral_moe_7b_chat = 'openbuddy-mixtral-moe-7b-chat'
     # mistral
     mistral_7b = 'mistral-7b'
+    mistral_7b_v2 = 'mistral-7b-v2'
     mistral_7b_instruct = 'mistral-7b-instruct'
     mistral_7b_instruct_v2 = 'mistral-7b-instruct-v2'
     mixtral_moe_7b = 'mixtral-moe-7b'
@@ -263,7 +264,9 @@ class ModelType:
     mamba_790m = 'mamba-790m'
     mamba_1_4b = 'mamba-1.4b'
     mamba_2_8b = 'mamba-2.8b'
-    # grok-1
+    # teleAI
+    telechat_12b = 'telechat-12b'
+    # grok-1
     grok_1 = 'grok-1'

     @classmethod
@@ -297,7 +300,8 @@ class LoRATM(NamedTuple):
     phi = ['Wqkv']
     internlm2 = ['wqkv']
     mamba = ['in_proj', 'x_proj', 'embeddings', 'out_proj']
-    grok_1 = ['q_proj', 'k_proj', 'v_proj']
+    telechat = ['self_attention.key_value', 'self_attention.query']
+    grok_1 = ['q_proj', 'k_proj', 'v_proj']


 GetModelTokenizerFunction = Callable[..., Tuple[Optional[PreTrainedModel],
@@ -1218,6 +1222,14 @@ def cross_entropy_forward(self, inputs: Tensor,
     requires=['transformers>=4.34'],
     support_flash_attn=True,
     support_vllm=True)
+@register_model(
+    ModelType.mistral_7b_v2,
+    'AI-ModelScope/Mistral-7B-v0.2-hf',
+    LoRATM.llama2,
+    TemplateType.default_generation_bos,
+    requires=['transformers>=4.34'],
+    support_flash_attn=True,
+    support_vllm=True)
 @register_model(
     ModelType.mixtral_moe_7b,
     'AI-ModelScope/Mixtral-8x7B-v0.1',
@@ -2380,6 +2392,12 @@ def get_model_tokenizer_codellama(model_dir: str,
     support_vllm=True,
     support_gradient_checkpointing=False,
     tags=['coding'])
+@register_model(
+    ModelType.telechat_12b,
+    'TeleAI/TeleChat-12B',
+    LoRATM.telechat,
+    TemplateType.telechat,
+    support_flash_attn=True)
 def get_model_tokenizer_phi(model_dir: str,
                             torch_dtype: Dtype,
                             model_kwargs: Dict[str, Any],
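
For orientation, a `ModelType` registered this way becomes addressable through swift's regular Python entry points. A minimal usage sketch, assuming the `swift.llm` helpers of this era (`get_model_tokenizer`, `get_template`, `inference`) keep roughly these signatures; exact keyword arguments may vary between versions:

# Sketch under the assumption that swift.llm exports these helpers
# with approximately these signatures at the time of this commit.
import torch
from swift.llm import (ModelType, TemplateType, get_model_tokenizer,
                       get_template, inference)

# Resolve the newly registered model type to weights plus tokenizer.
model, tokenizer = get_model_tokenizer(
    ModelType.mistral_7b_v2, torch.bfloat16,
    model_kwargs={'device_map': 'auto'})

# Pair the model with the template it was registered under.
template = get_template(TemplateType.default_generation_bos, tokenizer)

# Single-turn generation; `history` carries the running dialogue.
response, history = inference(model, template, 'Explain LoRA in one line.')
print(response)

The same pattern applies to `ModelType.telechat_12b`; the `register_model` entries above are all the integration the rest of the pipeline needs.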
