
Commit 3f4b157

Support llava (#577)
1 parent 5da1d74 commit 3f4b157

12 files changed: +351 −30 lines

README.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -64,6 +64,7 @@ Users can check the [documentation of SWIFT](docs/source/GetStarted/快速使用
 
 
 ## 🎉 News
+- 🔥2024.03.20: Supports inference and fine-tuning for the **llava** series. For best practice, you can refer to [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/llava最佳实践.md).
 - 🔥2024.03.12: Supports inference and fine-tuning for the **deepseek-vl** series. For best practice, you can refer to [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/deepseek-vl最佳实践.md).
 - 🔥2024.03.11: Support [GaLore](https://arxiv.org/abs/2403.03507), which can efficiently reduce the memory usage (to almost half of the original) when training the full model.
 - 🔥2024.03.10: For the end-to-end best practice of fine-tuning to deployment of Qwen1.5-7B-Chat and Qwen1.5-72B-Chat, you can refer to the [Qwen1.5 Full Workflow Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Qwen1.5%E5%85%A8%E6%B5%81%E7%A8%8B%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md).
@@ -222,6 +223,7 @@ You can refer to the following scripts to customize your own training script.
 - Multi-Modal:
   - [qwen-vl](https://github.com/QwenLM/Qwen-VL) series: qwen-vl, qwen-vl-chat, qwen-vl-chat-int4.
   - [qwen-audio](https://github.com/QwenLM/Qwen-Audio) series: qwen-audio, qwen-audio-chat.
+  - [llava](https://github.com/haotian-liu/LLaVA) series: llava1d6-mistral-7b-chat.
   - [deepseek-vl](https://github.com/deepseek-ai/DeepSeek-VL) series: deepseek-vl-1_3b-chat, deepseek-vl-7b-chat.
   - [yi-vl](https://github.com/01-ai/Yi) series: yi-vl-6b-chat, yi-vl-34b-chat.
   - [internlm-xcomposer2](https://github.com/InternLM/InternLM-XComposer) series: internlm-xcomposer2-7b-chat.
```

README_CN.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -62,6 +62,7 @@ SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is an extensible
 Users can check the [official SWIFT documentation](docs/source/GetStarted/快速使用.md) for details.
 
 ## 🎉 News
+- 🔥2024.03.20: Supports inference and fine-tuning for the **llava** series. For best practice, see [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/llava最佳实践.md).
 - 🔥2024.03.12: Supports inference and fine-tuning for the **deepseek-vl** series. For best practice, see [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/deepseek-vl最佳实践.md).
 - 🔥2024.03.11: Supports [GaLore](https://arxiv.org/abs/2403.03507), which effectively reduces the GPU memory used in full-parameter training to about half of the original.
 - 🔥2024.03.10: End-to-end [full workflow best practice](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Qwen1.5%E5%85%A8%E6%B5%81%E7%A8%8B%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) from fine-tuning to deployment for Qwen1.5-7B-Chat and Qwen1.5-72B-Chat.
@@ -222,6 +223,7 @@ app_ui_main(infer_args)
 - Multi-Modal:
   - [qwen-vl](https://github.com/QwenLM/Qwen-VL) series: qwen-vl, qwen-vl-chat, qwen-vl-chat-int4.
   - [qwen-audio](https://github.com/QwenLM/Qwen-Audio) series: qwen-audio, qwen-audio-chat.
+  - [llava](https://github.com/haotian-liu/LLaVA) series: llava1d6-mistral-7b-chat.
   - [deepseek-vl](https://github.com/deepseek-ai/DeepSeek-VL) series: deepseek-vl-1_3b-chat, deepseek-vl-7b-chat.
   - [yi-vl](https://github.com/01-ai/Yi) series: yi-vl-6b-chat, yi-vl-34b-chat.
   - [internlm-xcomposer2](https://github.com/InternLM/InternLM-XComposer) series: internlm-xcomposer2-7b-chat.
```

docs/source/LLM/index.md

Lines changed: 1 addition & 8 deletions
```diff
@@ -8,14 +8,7 @@
 
 
 ### Multi-Modal Best Practice Series
-
-1. [Qwen-VL Best Practice](../Multi-Modal/qwen-vl最佳实践.md)
-2. [Qwen-Audio Best Practice](../Multi-Modal/qwen-auidio最佳实践.md)
-3. [Deepseek-VL Best Practice](../Multi-Modal/deepseek-vl最佳实践.md)
-4. [Yi-VL Best Practice](../Multi-Modal/yi-vl最佳实践.md)
-5. [Internlm-Xcomposer2 Best Practice](../Multi-Modal/internlm-xcomposer2最佳实践.md)
-6. [MiniCPM-V Best Practice](../Multi-Modal/minicpm-v最佳实践.md)
-7. [CogVLM Best Practice](../Multi-Modal/cogvlm最佳实践.md)
+See here: [Multi-Modal Best Practice Series](../Multi-Modal/index.md)
 
 
 ### Tutorials
```

docs/source/LLM/支持的模型和数据集.md

Lines changed: 3 additions & 2 deletions
```diff
@@ -78,15 +78,16 @@
 |llama2-70b|[modelscope/Llama-2-70b-ms](https://modelscope.cn/models/modelscope/Llama-2-70b-ms/summary)|q_proj, k_proj, v_proj|default-generation-bos|✔|✔||-|
 |llama2-70b-chat|[modelscope/Llama-2-70b-chat-ms](https://modelscope.cn/models/modelscope/Llama-2-70b-chat-ms/summary)|q_proj, k_proj, v_proj|llama|✔|✔||-|
 |llama2-7b-aqlm-2bit-1x16|[AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf](https://modelscope.cn/models/AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf/summary)|q_proj, k_proj, v_proj|default-generation-bos|✔|✘|transformers>=4.38, aqlm, torch>=2.2.0|-|
+|llava1d6-mistral-7b-chat|[AI-ModelScope/llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary)|q_proj, k_proj, v_proj|llava-mistral|✔|✘|transformers>=4.34|multi-modal, vision|
 |yi-6b|[01ai/Yi-6B](https://modelscope.cn/models/01ai/Yi-6B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔||-|
 |yi-6b-200k|[01ai/Yi-6B-200K](https://modelscope.cn/models/01ai/Yi-6B-200K/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔||-|
 |yi-6b-chat|[01ai/Yi-6B-Chat](https://modelscope.cn/models/01ai/Yi-6B-Chat/summary)|q_proj, k_proj, v_proj|yi|✔|✔||-|
 |yi-9b|[01ai/Yi-9B](https://modelscope.cn/models/01ai/Yi-9B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔||-|
 |yi-34b|[01ai/Yi-34B](https://modelscope.cn/models/01ai/Yi-34B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔||-|
 |yi-34b-200k|[01ai/Yi-34B-200K](https://modelscope.cn/models/01ai/Yi-34B-200K/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔||-|
 |yi-34b-chat|[01ai/Yi-34B-Chat](https://modelscope.cn/models/01ai/Yi-34B-Chat/summary)|q_proj, k_proj, v_proj|yi|✔|✔||-|
-|yi-vl-6b-chat|[01ai/Yi-VL-6B](https://modelscope.cn/models/01ai/Yi-VL-6B/summary)|q_proj, k_proj, v_proj|yi-vl|✘|✘|transformers>=4.34|multi-modal, vision|
-|yi-vl-34b-chat|[01ai/Yi-VL-34B](https://modelscope.cn/models/01ai/Yi-VL-34B/summary)|q_proj, k_proj, v_proj|yi-vl|✘|✘|transformers>=4.34|multi-modal, vision|
+|yi-vl-6b-chat|[01ai/Yi-VL-6B](https://modelscope.cn/models/01ai/Yi-VL-6B/summary)|q_proj, k_proj, v_proj|yi-vl|✔|✘|transformers>=4.34|multi-modal, vision|
+|yi-vl-34b-chat|[01ai/Yi-VL-34B](https://modelscope.cn/models/01ai/Yi-VL-34B/summary)|q_proj, k_proj, v_proj|yi-vl|✔|✘|transformers>=4.34|multi-modal, vision|
 |internlm-7b|[Shanghai_AI_Laboratory/internlm-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-7b/summary)|q_proj, k_proj, v_proj|default-generation-bos|✘|✔||-|
 |internlm-7b-chat|[Shanghai_AI_Laboratory/internlm-chat-7b-v1_1](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-v1_1/summary)|q_proj, k_proj, v_proj|internlm|✘|✔||-|
 |internlm-7b-chat-8k|[Shanghai_AI_Laboratory/internlm-chat-7b-8k](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-8k/summary)|q_proj, k_proj, v_proj|internlm|✘|✔||-|
```

docs/source/Multi-Modal/index.md

Lines changed: 8 additions & 7 deletions
```diff
@@ -2,10 +2,11 @@
 
 ### Multi-Modal Best Practice Series
 
-1. [Qwen-VL Best Practice](../Multi-Modal/qwen-vl最佳实践.md)
-2. [Qwen-Audio Best Practice](../Multi-Modal/qwen-auidio最佳实践.md)
-3. [Deepseek-VL Best Practice](../Multi-Modal/deepseek-vl最佳实践.md)
-4. [Yi-VL Best Practice](../Multi-Modal/yi-vl最佳实践.md)
-5. [Internlm-Xcomposer2 Best Practice](../Multi-Modal/internlm-xcomposer2最佳实践.md)
-6. [MiniCPM-V Best Practice](../Multi-Modal/minicpm-v最佳实践.md)
-7. [CogVLM Best Practice](../Multi-Modal/cogvlm最佳实践.md)
+1. [Qwen-VL Best Practice](qwen-vl最佳实践.md)
+2. [Qwen-Audio Best Practice](qwen-auidio最佳实践.md)
+3. [Llava Best Practice](llava最佳实践.md)
+4. [Deepseek-VL Best Practice](deepseek-vl最佳实践.md)
+5. [Yi-VL Best Practice](yi-vl最佳实践.md)
+6. [Internlm-Xcomposer2 Best Practice](internlm-xcomposer2最佳实践.md)
+7. [MiniCPM-V Best Practice](minicpm-v最佳实践.md)
+8. [CogVLM Best Practice](cogvlm最佳实践.md)
```
docs/source/Multi-Modal/llava最佳实践.md

Lines changed: 209 additions & 0 deletions

@@ -0,0 +1,209 @@ (new file; full content below)
# Llava Best Practice

## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)


## Environment Setup
```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e .[llm]
```

## Inference

Inference with [llava1d6-mistral-7b-chat](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary):
```shell
# Experimental environment: A10, 3090, V100...
# 20GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type llava1d6-mistral-7b-chat
```

Output: (both local paths and URLs can be passed in)
```python
"""
<<< Describe this image.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
The image shows a close-up of a kitten with a soft, blurred background that suggests a natural, outdoor setting. The kitten has a mix of white and gray fur with darker stripes, typical of a tabby pattern. Its eyes are wide open, with a striking blue color that contrasts with the kitten's fur. The kitten's nose is small and pink, and its whiskers are long and white, adding to the kitten's cute and innocent appearance. The lighting in the image is soft and diffused, creating a gentle and warm atmosphere. The focus is sharp on the kitten's face, while the rest of the image is slightly out of focus, which draws attention to the kitten's features.
--------------------------------------------------
<<< How many sheep are in the picture?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
There are four sheep in the picture.
--------------------------------------------------
<<< What is the calculation result?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
The calculation result is 14352 + 45304 = 145304.
--------------------------------------------------
<<< Write a poem based on the content of the picture.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png
In the quiet of the night,
A solitary boat takes flight,
Across the water's gentle swell,
Underneath the stars that softly fell.

The boat, a vessel of the night,
Carries but one, a lone delight,
A solitary figure, lost in thought,
In the tranquil calm, they find a wraith.

The stars above, like diamonds bright,
Reflect upon the water's surface light,
Creating a path for the boat's journey,
Guiding through the night with a gentle purity.

The boat, a silent sentinel,
In the stillness, it gently swells,
A vessel of peace and calm,
In the quiet of the night, it carries on.

The figure on board, a soul at ease,
In the serene embrace of nature's peace,
They sail through the night,
Under the watchful eyes of the stars' light.

The boat, a symbol of solitude,
In the vast expanse of the universe's beauty,
A lone journey, a solitary quest,
In the quiet of the night, it finds its rest.
"""
```

The sample images are shown below:

cat:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png" width="250" style="display: inline-block;">

animal:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png" width="250" style="display: inline-block;">

math:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png" width="250" style="display: inline-block;">

poem:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png" width="250" style="display: inline-block;">
**Single-sample inference**

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.llava1d6_mistral_7b_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = 'How far is it from each city?'
response, _ = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')

# streaming
query = 'Which city is the farthest?'
gen = inference_stream(model, template, query, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, _ in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
"""
query: How far is it from each city?
response: The image shows a road sign indicating the distances to three cities: Mata, Yangjiang, and Guangzhou. The distances are given in kilometers.

- Mata is 14 kilometers away.
- Yangjiang is 62 kilometers away.
- Guangzhou is 293 kilometers away.

Please note that these distances are as the crow flies and do not take into account the actual driving distance due to road conditions, traffic, or other factors.
query: Which city is the farthest?
response: The farthest city listed on the sign is Mata, which is 14 kilometers away.
"""
```

The sample image is shown below:

road:

<img src="http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png" width="250" style="display: inline-block;">

## Fine-tuning
Multi-modal LLMs are usually fine-tuned on **custom datasets**. Here is a demo that can be run directly:

LoRA fine-tuning:

(By default, LoRA is applied only to the qkv projections of the LLM part. If you want to fine-tune all linear layers, including those of the vision model, specify `--lora_target_modules ALL`. A programmatic variant is sketched after this command.)
```shell
# Experimental environment: A10, 3090, V100...
# 21GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type llava1d6-mistral-7b-chat \
    --dataset coco-mini-en-2
```
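
The same run can also be launched from Python. A minimal sketch follows, assuming the `SftArguments` fields mirror the `swift sft` CLI flags shown above:

```python
# Minimal sketch: programmatic LoRA fine-tuning via swift's Python entry point.
# Assumption: SftArguments fields mirror the CLI flags (`model_type`, `dataset`,
# `lora_target_modules`); verify against your installed swift version.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import SftArguments, sft_main

sft_args = SftArguments(
    model_type='llava1d6-mistral-7b-chat',
    dataset='coco-mini-en-2',
    lora_target_modules=['ALL'],  # fine-tune all linear layers, incl. the vision part
)
output = sft_main(sft_args)
print(output['best_model_checkpoint'])
```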

Full-parameter fine-tuning:
```shell
# Experimental environment: 4 * A100
# 4 * 70GB GPU memory
NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type llava1d6-mistral-7b-chat \
    --dataset coco-mini-en-2 \
    --train_dataset_sample -1 \
    --sft_type full \
    --deepspeed default-zero2
```


[Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support the json and jsonl formats. Here is an example:

(Only single-turn dialogue is supported; each turn must contain exactly one image, given as a local path or URL. A small helper that produces this format is sketched after the example.)

```jsonl
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "images": ["image_path"]}
```
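
To assemble such a file from your own annotations, a helper like the following may be used (a sketch; `pairs` and the file paths are hypothetical placeholders):

```python
# Sketch: write a custom dataset in the jsonl format shown above.
# `pairs` is hypothetical example data; replace with your own annotations.
import json

pairs = [
    ('What is in this image?', 'A cat.', '/path/to/cat.jpg'),
    ('Describe the scene.', 'A road sign at dusk.', '/path/to/road.jpg'),
]

with open('custom_vqa.jsonl', 'w', encoding='utf-8') as f:
    for query, response, image in pairs:
        # One single-turn sample per line; exactly one image per turn.
        row = {'query': query, 'response': response, 'images': [image]}
        f.write(json.dumps(row, ensure_ascii=False) + '\n')
```

The resulting file can then be passed to `swift sft` as a custom dataset (see the custom dataset docs linked above for the exact flag).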


## Inference After Fine-tuning
Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/llava1d6-mistral-7b-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```

**merge-lora** and inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
    --ckpt_dir output/llava1d6-mistral-7b-chat/vx-xxx/checkpoint-xxx \
    --merge_lora true

CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/llava1d6-mistral-7b-chat/vx-xxx/checkpoint-xxx-merged \
    --load_dataset_config true
```

docs/source/Multi-Modal/qwen-audio最佳实践.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -121,7 +121,7 @@ CUDA_VISIBLE_DEVICES=0,1 swift sft \
 
 # ZeRO2
 # Experimental environment: 4 * A100
-# 2 * 80 GPU memory
+# 4 * 80 GPU memory
 NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
     --model_type qwen-audio-chat \
     --dataset aishell1-mini-zh \
```

docs/source/Multi-Modal/qwen-vl最佳实践.md

Lines changed: 5 additions & 4 deletions
````diff
@@ -45,9 +45,10 @@ Picture 2:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.pn
 计算结果是多少#
 1452 + 45304 = 46756
 --------------------------------------------------
+<<< clear
 <<<[M] Picture 1:<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png</img>
 根据图片中的内容写首诗#
-湖面星光点点闪,孤舟独影静如眠。男子举灯照山谷,小猫陪伴在身边
+月光如水船如星,独坐船头吹夜风。深林倒影照水面,萤火点点照船行
 """
 ```
@@ -142,9 +143,9 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
 Full-parameter fine-tuning:
 ```shell
-# Experimental environment: 2 * A100
-# 2 * 55 GPU memory
-CUDA_VISIBLE_DEVICES=0,1 swift sft \
+# Experimental environment: 4 * A100
+# 4 * 70 GPU memory
+NPROC_PER_NODE=2 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
     --model_type qwen-vl-chat \
     --dataset coco-mini-en \
     --train_dataset_sample -1 \
````

swift/llm/infer.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -478,6 +478,10 @@ def llm_infer(args: InferArguments) -> None:
         print('-' * 50)
     if args.save_result and args.ckpt_dir is not None:
         logger.info(f'save_result_path: {jsonl_path}')
+        if args.val_dataset_sample == 10:  # is default
+            logger.info(
+                'You can set `--val_dataset_sample -1` to perform inference on the entire dataset.'
+            )
     return {'result': result}
```
swift/llm/utils/argument.py

Lines changed: 4 additions & 1 deletion
```diff
@@ -407,7 +407,10 @@ def __post_init__(self) -> None:
             self.max_length = None
 
         if self.deepspeed is not None:
-            assert not is_mp(), 'DeepSpeed is not compatible with MP.'
+            if is_mp():
+                raise ValueError('DeepSpeed is not compatible with MP. '
+                                 f'n_gpu: {torch.cuda.device_count()}, '
+                                 f'local_world_size: {get_dist_setting()[3]}.')
             require_version('deepspeed')
             if self.deepspeed.endswith('.json') or os.path.isfile(
                     self.deepspeed):
```
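
For context on why the `assert` is replaced with an explicit `raise`, here is a general-Python illustration (not code from this repository): assertions are stripped when Python runs with `-O`, while a raised `ValueError` always fires and can carry diagnostic detail, as the new message does with the GPU count and local world size.

```python
# Illustration only: `raise` vs. `assert` for validating user configuration.

def check_compat_assert(is_mp: bool) -> None:
    # Removed entirely under `python -O`, so an invalid configuration
    # would slip through silently in optimized mode.
    assert not is_mp, 'DeepSpeed is not compatible with MP.'

def check_compat_raise(is_mp: bool, n_gpu: int) -> None:
    # Always enforced, and the message can include runtime context.
    if is_mp:
        raise ValueError('DeepSpeed is not compatible with MP. '
                         f'n_gpu: {n_gpu}.')

check_compat_raise(is_mp=False, n_gpu=4)  # passes silently
```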
