
Commit 4e262e5

update multi-modal docs (#538)
1 parent f60e2ce commit 4e262e5

10 files changed (+337 −4 lines)

README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -222,8 +222,8 @@ You can refer to the following scripts to customize your own training script.
   - Multi-Modal:
     - [qwen-vl](https://github.com/QwenLM/Qwen-VL) series: qwen-vl, qwen-vl-chat, qwen-vl-chat-int4.
     - [qwen-audio](https://github.com/QwenLM/Qwen-Audio) series: qwen-audio, qwen-audio-chat.
-    - [internlm-xcomposer2](https://github.com/InternLM/InternLM-XComposer) series: internlm-xcomposer2-7b-chat.
     - [deepseek-vl](https://github.com/deepseek-ai/DeepSeek-VL) series: deepseek-vl-1_3b-chat, deepseek-vl-7b-chat.
+    - [internlm-xcomposer2](https://github.com/InternLM/InternLM-XComposer) series: internlm-xcomposer2-7b-chat.
     - [yi-vl](https://github.com/01-ai/Yi) series: yi-vl-6b-chat, yi-vl-34b-chat.
     - [cogvlm](https://github.com/THUDM/CogVLM) series: cogvlm-17b-instruct, cogagent-18b-chat, cogagent-18b-instruct.
   - General:
```

README_CN.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -222,8 +222,8 @@ app_ui_main(infer_args)
   - Multi-Modal:
     - [qwen-vl](https://github.com/QwenLM/Qwen-VL) series: qwen-vl, qwen-vl-chat, qwen-vl-chat-int4.
     - [qwen-audio](https://github.com/QwenLM/Qwen-Audio) series: qwen-audio, qwen-audio-chat.
-    - [internlm-xcomposer2](https://github.com/InternLM/InternLM-XComposer) series: internlm-xcomposer2-7b-chat.
     - [deepseek-vl](https://github.com/deepseek-ai/DeepSeek-VL) series: deepseek-vl-1_3b-chat, deepseek-vl-7b-chat.
+    - [internlm-xcomposer2](https://github.com/InternLM/InternLM-XComposer) series: internlm-xcomposer2-7b-chat.
     - [yi-vl](https://github.com/01-ai/Yi) series: yi-vl-6b-chat, yi-vl-34b-chat.
     - [cogvlm](https://github.com/THUDM/CogVLM) series: cogvlm-17b-instruct, cogagent-18b-chat, cogagent-18b-instruct.
   - General:
```

docs/source/LLM/index.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -11,8 +11,8 @@
 1. [Qwen-VL Best Practice](../Multi-Modal/qwen-vl最佳实践.md)
 2. [Qwen-Audio Best Practice](../Multi-Modal/qwen-audio最佳实践.md)
-3. [InternLM-XComposer2 Best Practice](../Multi-Modal/internlm-xcomposer2最佳实践.md)
-4. [Deepseek-VL Best Practice](../Multi-Modal/deepseek-vl最佳实践.md)
+3. [Deepseek-VL Best Practice](../Multi-Modal/deepseek-vl最佳实践.md)
+4. [InternLM-XComposer2 Best Practice](../Multi-Modal/internlm-xcomposer2最佳实践.md)
 5. [Yi-VL Best Practice](../Multi-Modal/yi-vl最佳实践.md)
 6. [CogVLM Best Practice](../Multi-Modal/cogvlm最佳实践.md)
```

docs/source/Multi-Modal/cogvlm最佳实践.md

Lines changed: 54 additions & 0 deletions
@@ -68,6 +68,60 @@ poem:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">

**Single-sample inference**

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.cogvlm_17b_instruct
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

# Images are passed separately from the text query via the `images` argument.
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = 'How far is it from each city?'
response, _ = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')

# Streaming
query = 'Which city is the farthest?'
gen = inference_stream(model, template, query, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, _ in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
"""
query: How far is it from each city?
response: From Mata, it is 14 km; from Yangjiang, it is 62 km; and from Guangzhou, it is 293 km.
query: Which city is the farthest?
response: The city 'Mata' is the farthest with a distance of 14 km.
"""
```

The sample image is shown below:

road:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">

## Fine-tuning

Fine-tuning of multi-modal large models usually uses a **custom dataset**. Here is a directly runnable demo:
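
The demo itself sits below this hunk and is not part of the diff. A minimal sketch of such a command, following the CLI pattern used elsewhere in these docs (the `coco-mini-en` dataset and the default LoRA settings are assumptions, not taken from this commit):

```shell
# Hypothetical sketch: fine-tune cogvlm-17b-instruct on a small image-caption dataset
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type cogvlm-17b-instruct \
    --dataset coco-mini-en
```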

docs/source/Multi-Modal/deepseek-vl最佳实践.md

Lines changed: 57 additions & 0 deletions
@@ -76,6 +76,63 @@ poem:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">

**Single-sample inference**

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.deepseek_vl_7b_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = '距离各城市多远?'
response, history = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')

# Both rounds use the same image; the image list is doubled so it covers each round in the history.
query = '距离最远的城市是哪?'
images = images * 2
response, history = inference(model, template, query, history, images=images)
print(f'query: {query}')
print(f'response: {response}')
print(f'history: {history}')
"""
query: 距离各城市多远?
response: 这个标志显示了从当前位置到以下城市的距离:

- 马塔(Mata):14公里
- 阳江(Yangjiang):62公里
- 广州(Guangzhou):293公里

这些信息是根据图片中的标志提供的。
query: 距离最远的城市是哪?
response: 距离最远的那个城市是广州,根据标志所示,从当前位置到广州的距离是293公里。
history: [('距离各城市多远?', '这个标志显示了从当前位置到以下城市的距离:\n\n- 马塔(Mata):14公里\n- 阳江(Yangjiang):62公里\n- 广州(Guangzhou):293公里\n\n这些信息是根据图片中的标志提供的。'), ('距离最远的城市是哪?', '距离最远的那个城市是广州,根据标志所示,从当前位置到广州的距离是293公里。')]
"""
```

The sample image is shown below:

road:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">

## Fine-tuning

Fine-tuning of multi-modal large models usually uses a **custom dataset**. Here is a directly runnable demo:
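
Again, the demo lives below this hunk; a minimal sketch under the same assumptions (`coco-mini-en` is illustrative, not taken from this commit):

```shell
# Hypothetical sketch: fine-tune deepseek-vl-7b-chat on a small image-caption dataset
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type deepseek-vl-7b-chat \
    --dataset coco-mini-en
```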

docs/source/Multi-Modal/index.md

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
## Multi-Modal Documentation

### Multi-Modal Best Practice Series

1. [Qwen-VL Best Practice](../Multi-Modal/qwen-vl最佳实践.md)
2. [Qwen-Audio Best Practice](../Multi-Modal/qwen-audio最佳实践.md)
3. [Deepseek-VL Best Practice](../Multi-Modal/deepseek-vl最佳实践.md)
4. [InternLM-XComposer2 Best Practice](../Multi-Modal/internlm-xcomposer2最佳实践.md)
5. [Yi-VL Best Practice](../Multi-Modal/yi-vl最佳实践.md)
6. [CogVLM Best Practice](../Multi-Modal/cogvlm最佳实践.md)

docs/source/Multi-Modal/internlm-xcomposer2最佳实践.md

Lines changed: 50 additions & 0 deletions
@@ -70,6 +70,56 @@ poem:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">

**Single-sample inference**

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.internlm_xcomposer2_7b_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

# For this template, the image is embedded in the query itself via <img> tags.
query = """<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?"""
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')

query = '距离最远的城市是哪?'
response, history = inference(model, template, query, history)
print(f'query: {query}')
print(f'response: {response}')
print(f'history: {history}')
"""
query: <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?
response: 马鞍山距离阳江62公里,广州距离广州293公里。
query: 距离最远的城市是哪?
response: 最远的距离是地球的两极,南极和北极。
history: [('<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?', ' 马鞍山距离阳江62公里,广州距离广州293公里。'), ('距离最远的城市是哪?', ' 最远的距离是地球的两极,南极和北极。')]
"""
```

The sample image is shown below:

road:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">

## Fine-tuning

Fine-tuning of multi-modal large models usually uses a **custom dataset**. Here is a directly runnable demo:
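
The demo follows below this hunk; a minimal sketch under the same assumptions (`coco-mini-en` is illustrative, not taken from this commit):

```shell
# Hypothetical sketch: fine-tune internlm-xcomposer2-7b-chat on a small image-caption dataset
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type internlm-xcomposer2-7b-chat \
    --dataset coco-mini-en
```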

docs/source/Multi-Modal/qwen-audio最佳实践.md

Lines changed: 50 additions & 0 deletions
@@ -43,6 +43,56 @@ CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen-audio-chat
"""
```

**Single-sample inference**

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.qwen_audio_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

# The audio clip is embedded in the query itself via <audio> tags.
query = """Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav</audio>
这段语音说了什么"""
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')

# Streaming
query = '这段语音是男生还是女生'
gen = inference_stream(model, template, query, history)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')
"""
query: Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav</audio>
这段语音说了什么
response: 这段语音说了中文:"今天天气真好呀"。
query: 这段语音是男生还是女生
response: 根据音色判断,这段语音是男性。
history: [('Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav</audio>\n这段语音说了什么', '这段语音说了中文:"今天天气真好呀"。'), ('这段语音是男生还是女生', '根据音色判断,这段语音是男性。')]
"""
```

## Fine-tuning

Fine-tuning of multi-modal large models usually uses a **custom dataset**. Here is a directly runnable demo:
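
The demo is below this hunk. A rough sketch under stated assumptions (`aishell1-mini-zh` as a small speech dataset is a guess, not taken from this commit):

```shell
# Hypothetical sketch: fine-tune qwen-audio-chat on a small Chinese speech dataset
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen-audio-chat \
    --dataset aishell1-mini-zh
```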

docs/source/Multi-Modal/qwen-vl最佳实践.md

Lines changed: 56 additions & 0 deletions
@@ -69,6 +69,62 @@ poem:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">

**Single-sample inference**

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.qwen_vl_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

# The image is referenced inside the query itself via <img> tags.
query = """Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>
距离各城市多远?"""
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')

# Streaming
query = '距离最远的城市是哪?'
gen = inference_stream(model, template, query, history)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')
"""
query: Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>
距离各城市多远?
response: 马路边距离马路边14公里;阳江边距离马路边62公里;广州边距离马路边293公里。
query: 距离最远的城市是哪?
response: 距离最远的城市是广州,距离马路边293公里。
history: [('Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>\n距离各城市多远?', '马路边距离马路边14公里;阳江边距离马路边62公里;广州边距离马路边293公里。'), ('距离最远的城市是哪?', '距离最远的城市是广州,距离马路边293公里。')]
"""
```

The sample image is shown below:

road:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">

## Fine-tuning

Fine-tuning of multi-modal large models usually uses a **custom dataset**. Here is a directly runnable demo:
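
The demo follows below this hunk; a minimal sketch under the same assumptions (`coco-mini-en` is illustrative, not taken from this commit):

```shell
# Hypothetical sketch: fine-tune qwen-vl-chat on a small image-caption dataset
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen-vl-chat \
    --dataset coco-mini-en
```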
