Commit 83b716a

Support glm4 (modelscope#1069)
1 parent eb44511 commit 83b716a

12 files changed: +294, -13 lines

README.md

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ SWIFT has rich documentations for users, please check [here](https://github.com/
SWIFT web-ui is available both on [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary), please feel free to try!

## 🎉 News
+- 🔥2024.06.05: Support for the **glm4** series LLMs and the glm4v-9b-chat MLLM. You can refer to the [glm4v best practice](docs/source/Multi-Modal/glm4v最佳实践.md).
- 🔥2024.06.01: Supports **SimPO** training! See the [document](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/SimPO.md) to start training!
- 🔥2024.06.01: Support for deploying large multimodal models; please refer to the [Multimodal Deployment Documentation](docs/source_en/Multi-Modal/mutlimodal-deployment.md) for more information.
- 2024.05.31: Supports the Mini-InternVL models. Use model_type `mini-internvl-chat-2b-v1_5` and `mini-internvl-chat-4b-v1_5` to train.

README_CN.md

Lines changed: 2 additions & 1 deletion
@@ -48,7 +48,8 @@ SWIFT has a rich documentation system; if you have any questions, please check [here](https:
You can try out the SWIFT web-ui on [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary).

## 🎉 News
--  🔥2024.06.01: Supports **SimPO** training. Use `swift simpo` to start training; see the best practice [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM/SimPO算法最佳实践.md).
+- 🔥2024.06.05: Supports the glm4 series LLMs and the glm4v-9b-chat multimodal LLM. See the [glm4v best practice](docs/source/Multi-Modal/glm4v最佳实践.md).
+- 🔥2024.06.01: Supports **SimPO** training. Use `swift simpo` to start training; see the best practice [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM/SimPO算法最佳实践.md).
- 🔥2024.06.01: Supports deployment of multimodal LLMs; see the [multimodal deployment documentation](docs/source/Multi-Modal/MLLM部署文档.md).
- 2024.05.31: Supports the Mini-InternVL multimodal models. Use model_type `mini-internvl-chat-2b-v1_5` and `mini-internvl-chat-4b-v1_5` to train.
- 2024.05.24: Supports the Phi3 multimodal model. Use model_type `phi3-vision-128k-instruct` to train.

docs/source/LLM/支持的模型和数据集.md

Lines changed: 5 additions & 1 deletion
@@ -85,6 +85,9 @@
|chatglm3-6b-32k|[ZhipuAI/chatglm3-6b-32k](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-32k/summary)|query_key_value|chatglm3|✘|✔||-|[THUDM/chatglm3-6b-32k](https://huggingface.co/THUDM/chatglm3-6b-32k)|
|chatglm3-6b-128k|[ZhipuAI/chatglm3-6b-128k](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-128k/summary)|query_key_value|chatglm3|✘|✔||-|[THUDM/chatglm3-6b-128k](https://huggingface.co/THUDM/chatglm3-6b-128k)|
|codegeex2-6b|[ZhipuAI/codegeex2-6b](https://modelscope.cn/models/ZhipuAI/codegeex2-6b/summary)|query_key_value|chatglm-generation|✘|✔|transformers<4.34|coding|[THUDM/codegeex2-6b](https://huggingface.co/THUDM/codegeex2-6b)|
+|glm4-9b|[ZhipuAI/glm-4-9b](https://modelscope.cn/models/ZhipuAI/glm-4-9b/summary)|query_key_value|chatglm-generation|✘|✔||-|[THUDM/glm-4-9b](https://huggingface.co/THUDM/glm-4-9b)|
+|glm4-9b-chat|[ZhipuAI/glm-4-9b-chat](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat/summary)|query_key_value|chatglm3|✘|✔||-|[THUDM/glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat)|
+|glm4-9b-chat-1m|[ZhipuAI/glm-4-9b-chat-1m](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m/summary)|query_key_value|chatglm3|✘|✔||-|[THUDM/glm-4-9b-chat-1m](https://huggingface.co/THUDM/glm-4-9b-chat-1m)|
|llama2-7b|[modelscope/Llama-2-7b-ms](https://modelscope.cn/models/modelscope/Llama-2-7b-ms/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔||-|[meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)|
|llama2-7b-chat|[modelscope/Llama-2-7b-chat-ms](https://modelscope.cn/models/modelscope/Llama-2-7b-chat-ms/summary)|q_proj, k_proj, v_proj|llama|✔|✔||-|[meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)|
|llama2-13b|[modelscope/Llama-2-13b-ms](https://modelscope.cn/models/modelscope/Llama-2-13b-ms/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔||-|[meta-llama/Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf)|
@@ -282,6 +285,7 @@
|qwen-vl-chat-int4|[qwen/Qwen-VL-Chat-Int4](https://modelscope.cn/models/qwen/Qwen-VL-Chat-Int4/summary)|c_attn|qwen|✔|✘|auto_gptq>=0.5|vision|[Qwen/Qwen-VL-Chat-Int4](https://huggingface.co/Qwen/Qwen-VL-Chat-Int4)|
|qwen-audio|[qwen/Qwen-Audio](https://modelscope.cn/models/qwen/Qwen-Audio/summary)|c_attn|qwen-audio-generation|✔|✘||audio|[Qwen/Qwen-Audio](https://huggingface.co/Qwen/Qwen-Audio)|
|qwen-audio-chat|[qwen/Qwen-Audio-Chat](https://modelscope.cn/models/qwen/Qwen-Audio-Chat/summary)|c_attn|qwen-audio|✔|✘||audio|[Qwen/Qwen-Audio-Chat](https://huggingface.co/Qwen/Qwen-Audio-Chat)|
+|glm4v-9b-chat|[ZhipuAI/glm-4v-9b](https://modelscope.cn/models/ZhipuAI/glm-4v-9b/summary)|query_key_value|glm4v|✘|✘||vision|[THUDM/glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b)|
|llava1_6-mistral-7b-instruct|[AI-ModelScope/llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary)|q_proj, k_proj, v_proj|llava-mistral-instruct|✔|✘|transformers>=4.34|vision|[liuhaotian/llava-v1.6-mistral-7b](https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b)|
|llava1_6-yi-34b-instruct|[AI-ModelScope/llava-v1.6-34b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary)|q_proj, k_proj, v_proj|llava-yi-instruct|✔|✘||vision|[liuhaotian/llava-v1.6-34b](https://huggingface.co/liuhaotian/llava-v1.6-34b)|
|llama3-llava-next-8b|[AI-Modelscope/llama3-llava-next-8b](https://modelscope.cn/models/AI-Modelscope/llama3-llava-next-8b/summary)|q_proj, k_proj, v_proj|llama-llava-next|✔|✘||vision|[lmms-lab/llama3-llava-next-8b](https://huggingface.co/lmms-lab/llama3-llava-next-8b)|
@@ -294,7 +298,7 @@
|internvl-chat-v1_5|[AI-ModelScope/InternVL-Chat-V1-5](https://modelscope.cn/models/AI-ModelScope/InternVL-Chat-V1-5/summary)|wqkv|internvl|✔|✘|transformers>=4.35, timm|vision|[OpenGVLab/InternVL-Chat-V1-5](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)|
|internvl-chat-v1_5-int8|[AI-ModelScope/InternVL-Chat-V1-5-int8](https://modelscope.cn/models/AI-ModelScope/InternVL-Chat-V1-5-int8/summary)|wqkv|internvl|✔|✘|transformers>=4.35, timm|vision|[OpenGVLab/InternVL-Chat-V1-5-int8](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5-int8)|
|mini-internvl-chat-2b-v1_5|[OpenGVLab/Mini-InternVL-Chat-2B-V1-5](https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-2B-V1-5/summary)|wqkv|internvl|✔|✘|transformers>=4.35, timm|vision|[OpenGVLab/Mini-InternVL-Chat-2B-V1-5](https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-2B-V1-5)|
-|mini-internvl-chat-4b-v1_5|[OpenGVLab/Mini-InternVL-Chat-4B-V1-5](https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-4B-V1-5/summary)|qkv_proj|internvl|✔|✘|transformers>=4.35, timm|vision|[OpenGVLab/Mini-InternVL-Chat-4B-V1-5](https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-4B-V1-5)|
+|mini-internvl-chat-4b-v1_5|[OpenGVLab/Mini-InternVL-Chat-4B-V1-5](https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-4B-V1-5/summary)|qkv_proj|internvl-phi3|✔|✘|transformers>=4.35, timm|vision|[OpenGVLab/Mini-InternVL-Chat-4B-V1-5](https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-4B-V1-5)|
|deepseek-vl-1_3b-chat|[deepseek-ai/deepseek-vl-1.3b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-vl-1.3b-chat/summary)|q_proj, k_proj, v_proj|deepseek-vl|✔|✘|attrdict|vision|[deepseek-ai/deepseek-vl-1.3b-chat](https://huggingface.co/deepseek-ai/deepseek-vl-1.3b-chat)|
|deepseek-vl-7b-chat|[deepseek-ai/deepseek-vl-7b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-vl-7b-chat/summary)|q_proj, k_proj, v_proj|deepseek-vl|✔|✘|attrdict|vision|[deepseek-ai/deepseek-vl-7b-chat](https://huggingface.co/deepseek-ai/deepseek-vl-7b-chat)|
|paligemma-3b-pt-224|[AI-ModelScope/paligemma-3b-pt-224](https://modelscope.cn/models/AI-ModelScope/paligemma-3b-pt-224/summary)|q_proj, k_proj, v_proj|paligemma|✔|✘|transformers>=4.41|vision|[google/paligemma-3b-pt-224](https://huggingface.co/google/paligemma-3b-pt-224)|
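
The `model_type` values added above plug straight into the swift CLI, following the `swift infer --model_type ...` pattern used in the best-practice docs in this commit. As a minimal sketch for the new glm4 entries (the default templates in the comments come from the table rows above):

```shell
# Chat model (default template: chatglm3)
CUDA_VISIBLE_DEVICES=0 swift infer --model_type glm4-9b-chat

# Base model (default template: chatglm-generation)
CUDA_VISIBLE_DEVICES=0 swift infer --model_type glm4-9b

# Multimodal chat model (default template: glm4v)
CUDA_VISIBLE_DEVICES=0 swift infer --model_type glm4v-9b-chat
```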

docs/source/Multi-Modal/cogvlm2最佳实践.md

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ CUDA_VISIBLE_DEVICES=0 swift infer --model_type cogvlm2-19b-chat
Output (local paths and URLs are both supported):
```python
"""
-<<< 描述这种图片
+<<< 描述这张图片
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
这是一张特写照片,展示了一只灰色和白色相间的猫。这只猫的眼睛是灰色的,鼻子是粉色的,嘴巴微微张开。它的毛发看起来柔软而蓬松,背景模糊,突出了猫的面部特征。
--------------------------------------------------
docs/source/Multi-Modal/glm4v最佳实践.md

Lines changed: 181 additions & 0 deletions
@@ -0,0 +1,181 @@
# GLM4V Best Practice

## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)

## Environment Setup
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
```
Model link:
- glm4v-9b-chat: [https://modelscope.cn/models/ZhipuAI/glm-4v-9b/summary](https://modelscope.cn/models/ZhipuAI/glm-4v-9b/summary)
## Inference

Inference with glm4v-9b-chat:
```shell
# Experimental environment: A100
# 30GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type glm4v-9b-chat
```

Output (local paths and URLs are both supported):
```python
"""
<<< 描述这张图片
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
这是一张特写照片,展示了一只毛茸茸的小猫。小猫的眼睛大而圆,呈深蓝色,眼珠呈金黄色,非常明亮。它的鼻子短而小巧,是粉色的。小猫的嘴巴紧闭,胡须细长。它的耳朵竖立着,耳朵内侧是白色的,外侧是棕色的。小猫的毛发看起来柔软而浓密,主要是白色和棕色相间的条纹图案。背景模糊不清,但似乎是一个室内环境。
--------------------------------------------------
<<< clear
<<< 图中有几只羊
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
图中共有四只羊。其中最左边的羊身体较小,后边三只羊体型逐渐变大,且最右边的两只羊体型大小一致。
--------------------------------------------------
<<< clear
<<< 计算结果是多少?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
1452+45304=46756
--------------------------------------------------
<<< clear
<<< 根据图片中的内容写首诗
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png
湖光山色映小船,

星辉点点伴旅程。

人在画中寻诗意,

心随景迁忘忧愁。
"""
```
The sample images are shown below:

cat:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">

animal:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">

math:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">

poem:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single-sample inference**

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.glm4v_9b_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

# Load the model in fp16 and let device_map place the weights automatically.
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = '距离各城市多远?'
response, history = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')

# Streaming inference: print each newly generated fragment as it arrives.
query = '距离最远的城市是哪?'
gen = inference_stream(model, template, query, history, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, _ in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()

"""
query: 距离各城市多远?
response: 距离马踏还有14Km,距离阳江还有62Km,距离广州还有293Km。
query: 距离最远的城市是哪?
response: 距离最远的城市是广州,有293Km。
"""
```

The sample image is shown below:

road:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">

## Fine-tuning

Multimodal LLMs are usually fine-tuned with a **custom dataset**. Here is a demo that can be run directly:

(By default, LoRA fine-tuning is applied to the qkv projections of both the language and vision models. To fine-tune all linear layers instead, specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: A100
# 40GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type glm4v-9b-chat \
    --dataset coco-en-2-mini

# DDP
NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=0,1 swift sft \
    --model_type glm4v-9b-chat \
    --dataset coco-en-2-mini#10000 \
    --ddp_find_unused_parameters true
```

[Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support json and jsonl formats. Here is an example of a custom dataset:

(Multi-turn conversations are supported, but each full conversation may contain only one image; local paths and URLs are both supported.)

```jsonl
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "images": ["image_path"]}
```
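
As a minimal sketch of wiring such a file into training (the file name `train.jsonl` is a hypothetical placeholder, and `--custom_train_dataset_path` is assumed here based on the command-line form described in 自定义与拓展.md linked above):

```shell
# Hypothetical sketch: train on a local jsonl file in the format above
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type glm4v-9b-chat \
    --custom_train_dataset_path train.jsonl
```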

## Inference After Fine-tuning

Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/glm4v-9b-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```

**merge-lora** and infer:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
    --ckpt_dir output/glm4v-9b-chat/vx-xxx/checkpoint-xxx \
    --merge_lora true

CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/glm4v-9b-chat/vx-xxx/checkpoint-xxx-merged \
    --load_dataset_config true
```

docs/source/Multi-Modal/minicpm-v最佳实践.md

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ CUDA_VISIBLE_DEVICES=0 swift infer --model_type minicpm-v-3b-chat
Output (local paths and URLs are both supported):
```python
"""
-<<< 描述这种图片
+<<< 描述这张图片
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
该图像的特点是一只黑白相间的猫,它的眼睛睁得大大的,似乎在凝视着相机。这只猫看起来很小,可能是一只幼猫。
--------------------------------------------------

docs/source/Multi-Modal/yi-vl最佳实践.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ CUDA_VISIBLE_DEVICES=0 swift infer --model_type yi-vl-6b-chat
Output (local paths and URLs are both supported):
```python
"""
-<<< 描述这种图片
+<<< 描述这张图片
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
图片显示一只小猫坐在地板上,眼睛睁开,凝视着摄像机。小猫看起来很可爱,有灰色和白色的毛皮,以及蓝色的眼睛。它似乎正在看摄像机,可能对周围环境很好奇。
--------------------------------------------------
