2 changes: 1 addition & 1 deletion README.md
@@ -222,8 +222,8 @@ You can refer to the following scripts to customize your own training script.
- Multi-Modal:
- [qwen-vl](https://github.com/QwenLM/Qwen-VL) series: qwen-vl, qwen-vl-chat, qwen-vl-chat-int4.
- [qwen-audio](https://github.com/QwenLM/Qwen-Audio) series: qwen-audio, qwen-audio-chat.
- [internlm-xcomposer2](https://github.com/InternLM/InternLM-XComposer) series: internlm-xcomposer2-7b-chat.
- [deepseek-vl](https://github.com/deepseek-ai/DeepSeek-VL) series: deepseek-vl-1_3b-chat, deepseek-vl-7b-chat.
- [internlm-xcomposer2](https://github.com/InternLM/InternLM-XComposer) series: internlm-xcomposer2-7b-chat.
- [yi-vl](https://github.com/01-ai/Yi) series: yi-vl-6b-chat, yi-vl-34b-chat.
- [cogvlm](https://github.com/THUDM/CogVLM) series: cogvlm-17b-instruct, cogagent-18b-chat, cogagent-18b-instruct.
- General:
2 changes: 1 addition & 1 deletion README_CN.md
@@ -222,8 +222,8 @@ app_ui_main(infer_args)
- Multi-Modal:
- [qwen-vl](https://github.com/QwenLM/Qwen-VL) series: qwen-vl, qwen-vl-chat, qwen-vl-chat-int4.
- [qwen-audio](https://github.com/QwenLM/Qwen-Audio) series: qwen-audio, qwen-audio-chat.
- [internlm-xcomposer2](https://github.com/InternLM/InternLM-XComposer) series: internlm-xcomposer2-7b-chat.
- [deepseek-vl](https://github.com/deepseek-ai/DeepSeek-VL) series: deepseek-vl-1_3b-chat, deepseek-vl-7b-chat.
- [internlm-xcomposer2](https://github.com/InternLM/InternLM-XComposer) series: internlm-xcomposer2-7b-chat.
- [yi-vl](https://github.com/01-ai/Yi) series: yi-vl-6b-chat, yi-vl-34b-chat.
- [cogvlm](https://github.com/THUDM/CogVLM) series: cogvlm-17b-instruct, cogagent-18b-chat, cogagent-18b-instruct.
- General:
4 changes: 2 additions & 2 deletions docs/source/LLM/index.md
@@ -11,8 +11,8 @@

1. [Qwen-VL Best Practice](../Multi-Modal/qwen-vl最佳实践.md)
2. [Qwen-Audio Best Practice](../Multi-Modal/qwen-audio最佳实践.md)
3. [Internlm-Xcomposer2 Best Practice](../Multi-Modal/internlm-xcomposer2最佳实践.md)
4. [Deepseek-VL Best Practice](../Multi-Modal/deepseek-vl最佳实践.md)
3. [Deepseek-VL Best Practice](../Multi-Modal/deepseek-vl最佳实践.md)
4. [Internlm-Xcomposer2 Best Practice](../Multi-Modal/internlm-xcomposer2最佳实践.md)
5. [Yi-VL Best Practice](../Multi-Modal/yi-vl最佳实践.md)
6. [CogVLM Best Practice](../Multi-Modal/cogvlm最佳实践.md)

54 changes: 54 additions & 0 deletions docs/source/Multi-Modal/cogvlm最佳实践.md
@@ -68,6 +68,60 @@ poem:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">

**Single-sample inference**

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.cogvlm_17b_instruct
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = 'How far is it from each city?'
response, _ = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')

# Streaming inference: a fresh single-turn query over the same image
query = 'Which city is the farthest?'
gen = inference_stream(model, template, query, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, _ in gen:
delta = response[print_idx:]
print(delta, end='', flush=True)
print_idx = len(response)
print()
"""
query: How far is it from each city?
response: From Mata, it is 14 km; from Yangjiang, it is 62 km; and from Guangzhou, it is 293 km.
query: Which city is the farthest?
response: The city 'Mata' is the farthest with a distance of 14 km.
"""
```

The sample image is shown below:

road:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">


## Fine-tuning
Fine-tuning multi-modal LLMs usually uses a **custom dataset**. Here is a demo that can be run directly:
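As a minimal sketch only: it assumes the `swift sft` subcommand follows the same CLI style as the `swift infer` command used in these docs, and `coco-mini-en` is a placeholder dataset name rather than the demo's actual data.

```shell
# Hedged sketch: LoRA fine-tuning of cogvlm-17b-instruct via the swift CLI.
# --dataset coco-mini-en is a placeholder; substitute your own custom dataset.
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type cogvlm-17b-instruct \
    --sft_type lora \
    --dataset coco-mini-en
```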
57 changes: 57 additions & 0 deletions docs/source/Multi-Modal/deepseek-vl最佳实践.md
@@ -76,6 +76,63 @@ poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">


**Single-sample inference**

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.deepseek_vl_7b_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = '距离各城市多远?'
response, history = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')

query = '距离最远的城市是哪?'
# pass one image per dialogue turn: the history turn and the current turn reuse the same image
images = images * 2
response, history = inference(model, template, query, history, images=images)
print(f'query: {query}')
print(f'response: {response}')
print(f'history: {history}')
"""
query: 距离各城市多远?
response: 这个标志显示了从当前位置到以下城市的距离:

- 马塔(Mata):14公里
- 阳江(Yangjiang):62公里
- 广州(Guangzhou):293公里

这些信息是根据图片中的标志提供的。
query: 距离最远的城市是哪?
response: 距离最远的那个城市是广州,根据标志所示,从当前位置到广州的距离是293公里。
history: [('距离各城市多远?', '这个标志显示了从当前位置到以下城市的距离:\n\n- 马塔(Mata):14公里\n- 阳江(Yangjiang):62公里\n- 广州(Guangzhou):293公里\n\n这些信息是根据图片中的标志提供的。'), ('距离最远的城市是哪?', '距离最远的那个城市是广州,根据标志所示,从当前位置到广州的距离是293公里。')]
"""
```

The sample image is shown below:

road:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">


## Fine-tuning
Fine-tuning multi-modal LLMs usually uses a **custom dataset**. Here is a demo that can be run directly:
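A minimal sketch under the same assumptions as above: the `swift sft` subcommand and its flags are assumed to mirror the `swift infer` CLI, and the dataset name is a placeholder.

```shell
# Hedged sketch: LoRA fine-tuning of deepseek-vl-7b-chat with the swift CLI.
# Replace the placeholder dataset with your own custom data.
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type deepseek-vl-7b-chat \
    --sft_type lora \
    --dataset coco-mini-en
```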

10 changes: 10 additions & 0 deletions docs/source/Multi-Modal/index.md
@@ -0,0 +1,10 @@
## Multi-Modal Documentation

### Multi-Modal Best Practice Series

1. [Qwen-VL Best Practice](../Multi-Modal/qwen-vl最佳实践.md)
2. [Qwen-Audio Best Practice](../Multi-Modal/qwen-audio最佳实践.md)
3. [Deepseek-VL Best Practice](../Multi-Modal/deepseek-vl最佳实践.md)
4. [Internlm-Xcomposer2 Best Practice](../Multi-Modal/internlm-xcomposer2最佳实践.md)
5. [Yi-VL Best Practice](../Multi-Modal/yi-vl最佳实践.md)
6. [CogVLM Best Practice](../Multi-Modal/cogvlm最佳实践.md)
50 changes: 50 additions & 0 deletions docs/source/Multi-Modal/internlm-xcomposer2最佳实践.md
@@ -70,6 +70,56 @@ poem:
<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">


**Single-sample inference**

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.internlm_xcomposer2_7b_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

query = """<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?"""
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')

query = '距离最远的城市是哪?'
response, history = inference(model, template, query, history)
print(f'query: {query}')
print(f'response: {response}')
print(f'history: {history}')
"""
query: <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?
response: 马鞍山距离阳江62公里,广州距离广州293公里。
query: 距离最远的城市是哪?
response: 最远的距离是地球的两极,南极和北极。
history: [('<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?', ' 马鞍山距离阳江62公里,广州距离广州293公里。'), ('距离最远的城市是哪?', ' 最远的距离是地球的两极,南极和北极。')]
"""
"""
```

The sample image is shown below:

road:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">


## Fine-tuning
Fine-tuning multi-modal LLMs usually uses a **custom dataset**. Here is a demo that can be run directly:
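A minimal sketch, assuming `swift sft` mirrors the `swift infer` CLI used in these docs; the dataset name `coco-mini-en` is a placeholder for your own data.

```shell
# Hedged sketch: LoRA fine-tuning of internlm-xcomposer2-7b-chat with the swift CLI.
# The dataset name is a placeholder; substitute your own custom dataset.
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type internlm-xcomposer2-7b-chat \
    --sft_type lora \
    --dataset coco-mini-en
```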

50 changes: 50 additions & 0 deletions docs/source/Multi-Modal/qwen-audio最佳实践.md
@@ -43,6 +43,56 @@ CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen-audio-chat
"""
```

**Single-sample inference**

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.qwen_audio_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

query = """Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav</audio>
这段语音说了什么"""
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')

# Streaming inference, continuing the dialogue from `history`
query = '这段语音是男生还是女生'
gen = inference_stream(model, template, query, history)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
delta = response[print_idx:]
print(delta, end='', flush=True)
print_idx = len(response)
print()
print(f'history: {history}')
"""
query: Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav</audio>
这段语音说了什么
response: 这段语音说了中文:"今天天气真好呀"。
query: 这段语音是男生还是女生
response: 根据音色判断,这段语音是男性。
history: [('Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav</audio>\n这段语音说了什么', '这段语音说了中文:"今天天气真好呀"。'), ('这段语音是男生还是女生', '根据音色判断,这段语音是男性。')]
"""
```


## Fine-tuning
Fine-tuning multi-modal LLMs usually uses a **custom dataset**. Here is a demo that can be run directly:
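A minimal sketch (assumptions: `swift sft` mirrors the `swift infer` command shown at the top of this doc, and `aishell1-mini-zh` is a placeholder audio dataset name):

```shell
# Hedged sketch: LoRA fine-tuning of qwen-audio-chat with the swift CLI.
# The dataset name is a placeholder; point it at your own audio data.
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen-audio-chat \
    --sft_type lora \
    --dataset aishell1-mini-zh
```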
56 changes: 56 additions & 0 deletions docs/source/Multi-Modal/qwen-vl最佳实践.md
@@ -69,6 +69,62 @@ poem:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">

**Single-sample inference**

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.qwen_vl_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

query = """Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>
距离各城市多远?"""
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')

# Streaming inference, continuing the dialogue from `history`
query = '距离最远的城市是哪?'
gen = inference_stream(model, template, query, history)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
delta = response[print_idx:]
print(delta, end='', flush=True)
print_idx = len(response)
print()
print(f'history: {history}')
"""
query: Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>
距离各城市多远?
response: 马路边距离马路边14公里;阳江边距离马路边62公里;广州边距离马路边293公里。
query: 距离最远的城市是哪?
response: 距离最远的城市是广州,距离马路边293公里。
history: [('Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>\n距离各城市多远?', '马路边距离马路边14公里;阳江边距离马路边62公里;广州边距离马路边293公里。'), ('距离最远的城市是哪?', '距离最远的城市是广州,距离马路边293公里。')]
"""
```

The sample image is shown below:

road:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">


## Fine-tuning
Fine-tuning multi-modal LLMs usually uses a **custom dataset**. Here is a demo that can be run directly:
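A minimal sketch under the same assumptions as the other docs in this series: the `swift sft` subcommand is assumed to follow the `swift infer` CLI style, and the dataset name is a placeholder.

```shell
# Hedged sketch: LoRA fine-tuning of qwen-vl-chat with the swift CLI.
# --dataset coco-mini-en is a placeholder; substitute your own custom dataset.
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen-vl-chat \
    --sft_type lora \
    --dataset coco-mini-en
```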