
Commit 78b9a4a

Merge commit '6cead7123ef0a35d34fe755bea2e5a61cd275092' into feat/webui_0228
* commit '6cead7123ef0a35d34fe755bea2e5a61cd275092':
  Fix llama2 generation config (modelscope#468)
  swift export load model in cpu (modelscope#462)
  fix get_vllm_engine bug (modelscope#463)
  Fix llm quantization docs (modelscope#458)
  Support swift export (modelscope#455)
2 parents 8b80869 + 6cead71

File tree: 232 files changed, +1641 −1239 lines (only a subset of the changed files is shown below).


README.md

Lines changed: 5 additions & 2 deletions
@@ -62,6 +62,7 @@ Users can check the [documentation of SWIFT](docs/source/GetStarted/快速使用
 
 
 ## 🎉 News
+- 2024.02.25: Support `swift export` to export models for AWQ quantization and push to ModelScope Hub. For more details, please refer to the document: [LLM Quantization Document](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E9%87%8F%E5%8C%96%E6%96%87%E6%A1%A3.md).
 - 2024.02.22: Support gemma series: gemma-2b, [gemma-2b-instruct](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/gemma_2b_instruct), gemma-7b, gemma-7b-instruct.
 - 2024.02.16: Support deepseek-math series: deepseek-math-7b, deepseek-math-7b-instruct, deepseek-math-7b-chat.
 - 🔥2024.02.05: Support **Qwen1.5** series, To view all supported Qwen1.5 models please check [Model List](https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E6%94%AF%E6%8C%81%E7%9A%84%E6%A8%A1%E5%9E%8B%E5%92%8C%E6%95%B0%E6%8D%AE%E9%9B%86.md#%E6%A8%A1%E5%9E%8B). The [qwen1half-7b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen1half_7b_chat), [qwen1half-7b-chat-int8](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen1half_7b_chat_int8) fine-tuned scripts are provided.
@@ -71,9 +72,9 @@ Users can check the [documentation of SWIFT](docs/source/GetStarted/快速使用
 - 🔥2024.02.01: Support Agent training! Agent training algorithm comes from this [paper](https://arxiv.org/pdf/2309.00986.pdf). We also introduce the [ms-agent](https://www.modelscope.cn/datasets/iic/ms_agent/summary) dataset. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/qwen_7b_chat/lora/sft.sh) to begin an agent training!
 - 🔥2024.02.01: Support SFT loss to DPO training to reduce the repeat generation problem caused by the KL-divergence loss.
 - 2024.02.01: Support AdaLoRA and IA3 adapter in SFT.
-- 2024.02.01: Support `--merge_lora_and_save` in AnimateDiff training.
+- 2024.02.01: Support `--merge_lora` in AnimateDiff training.
 - 2024.01.30: Support [internlm-xcomposer2-7b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/internlm_xcomposer2_7b_chat).
-- 🔥2024.01.30: Support [ZeRO-3](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen_14b_chat/full_ddp_zero3/), just need to specify `--deepspeed_config_path default-zero3`.
+- 🔥2024.01.30: Support [ZeRO-3](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen_14b_chat/full_ddp_zero3/), just need to specify `--deepspeed default-zero3`.
 - 2024.01.29: Support internlm2-math series: internlm2-math-7b, internlm2-math-7b-chat, internlm2-math-20b, internlm2-math-20b-chat.
 - 🔥2024.01.26: Support [yi-vl-6b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/yi_vl_6b_chat), yi-vl-34b-chat.
 - 2024.01.24: Support codefuse-codegeex2-6b-chat, codefuse-qwen-14b-chat.
@@ -154,6 +155,7 @@ Here is a simple introduction of web-ui:
 - Rapidly **fine-tune** and perform inference on LLM, and build a Web-UI, see the [LLM Fine-tuning Documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM微调文档.md).
 - Using **interface** to fine-tuning and perform inference, see the [WEB-UI Documentation](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md).
 - **DPO training** supported, see the [DPO Documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E4%BA%BA%E7%B1%BB%E5%AF%B9%E9%BD%90%E8%AE%AD%E7%BB%83%E6%96%87%E6%A1%A3.md).
+- Export fine-tuned models, including: merge-lora, AWQ quantization, and push to ModelScope Hub. For more details, please refer to the [LLM Quantization Documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E9%87%8F%E5%8C%96%E6%A8%A1%E5%9E%8B.md).
 - Utilize VLLM for **inference acceleration** and **deployment(OpenAI API)**. Please refer to [VLLM Inference Acceleration and Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM推理加速与部署.md) for more information.
 - View the models and datasets supported by Swift. You can check [supported models and datasets](https://github.com/modelscope/swift/blob/main/docs/source/LLM/支持的模型和数据集.md).
 - Expand and customize models, datasets, and dialogue templates in Swift, see [Customization and Expansion](https://github.com/modelscope/swift/blob/main/docs/source/LLM/自定义与拓展.md).
@@ -266,6 +268,7 @@ app_ui_main(infer_args)
 - SQL: text2sql-en, 🔥sql-create-context-en.
 - Text Generation: 🔥advertise-gen-zh, 🔥dureader-robust-zh.
 - Classification: cmnli-zh, 🔥cmnli-mini-zh, 🔥jd-sentiment-zh, 🔥hc3-zh, 🔥hc3-en.
+- AWQ: pileval.
 - Other: finance-en, poetry-zh, webnovel-zh, generated-chat-zh, cls-fudan-news-zh, ner-jave-zh.
 - Multi-Modal:
   - Vision: coco-en, 🔥coco-mini-en, coco-mini-en-2, capcha-images.
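
Editor's note: the `swift export` workflow announced in the news item above can be exercised roughly as follows. This is a minimal sketch based on the commands that appear later in this commit's LLM fine-tuning doc diff; the checkpoint path is a placeholder, combining `--merge_lora` with `--quant_bits` in one invocation follows the doc comment in this diff, and pileval is presumably the AWQ calibration set (per the dataset list above). Pushing to the ModelScope Hub is configured as described in the linked LLM Quantization Document and is not shown here.

```bash
# Sketch only: merge the LoRA weights of a fine-tuned checkpoint into the base model.
CUDA_VISIBLE_DEVICES=0 swift export \
    --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx' --merge_lora true

# Optionally AWQ-quantize during export by adding `--quant_bits 4`.
CUDA_VISIBLE_DEVICES=0 swift export \
    --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx' --merge_lora true --quant_bits 4
```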

README_CN.md

Lines changed: 5 additions & 2 deletions
@@ -60,6 +60,7 @@ SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is a scalable
 Users can check the [official SWIFT documentation](docs/source/GetStarted/快速使用.md) for details.
 
 ## 🎉 News
+- 2024.02.25: Support `swift export` for AWQ-quantized model export and pushing to the ModelScope Hub. See the document: [LLM Quantization Documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E9%87%8F%E5%8C%96%E6%96%87%E6%A1%A3.md).
 - 2024.02.22: Support the gemma series: gemma-2b, [gemma-2b-instruct](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/gemma_2b_instruct), gemma-7b, gemma-7b-instruct.
 - 2024.02.16: Support the deepseek-math series: deepseek-math-7b, deepseek-math-7b-instruct, deepseek-math-7b-chat.
 - 🔥2024.02.05: Support the **Qwen1.5** series; see the [Model List](https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E6%94%AF%E6%8C%81%E7%9A%84%E6%A8%A1%E5%9E%8B%E5%92%8C%E6%95%B0%E6%8D%AE%E9%9B%86.md#%E6%A8%A1%E5%9E%8B) for all supported Qwen1.5 models. Fine-tuning scripts are provided for [qwen1half-7b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen1half_7b_chat) and [qwen1half-7b-chat-int8](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen1half_7b_chat_int8).
@@ -69,9 +70,9 @@ SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is a scalable
 - 🔥2024.02.01: Support Agent training! The agent training algorithm comes from this [paper](https://arxiv.org/pdf/2309.00986.pdf). We also added the high-quality [ms-agent](https://www.modelscope.cn/datasets/iic/ms_agent/summary) agent dataset. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/qwen_7b_chat/lora/sft.sh) to start agent training!
 - 🔥2024.02.01: Support adding an SFT loss in DPO training to reduce the repeated-generation problem caused by the KL-divergence loss.
 - 2024.02.01: Support the AdaLoRA and IA3 adapters in training.
-- 2024.02.01: Support the `--merge_lora_and_save` argument in AnimateDiff training.
+- 2024.02.01: Support the `--merge_lora` argument in AnimateDiff training.
 - 2024.01.30: Support [internlm-xcomposer2-7b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/internlm_xcomposer2_7b_chat).
-- 🔥2024.01.30: Support [ZeRO-3](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen_14b_chat/full_ddp_zero3/); just specify `--deepspeed_config_path default-zero3`.
+- 🔥2024.01.30: Support [ZeRO-3](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen_14b_chat/full_ddp_zero3/); just specify `--deepspeed default-zero3`.
 - 2024.01.29: Support the internlm2-math series: internlm2-math-7b, internlm2-math-7b-chat, internlm2-math-20b, internlm2-math-20b-chat.
 - 🔥2024.01.26: Support [yi-vl-6b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/yi_vl_6b_chat), yi-vl-34b-chat.
 - 2024.01.24: Support codefuse-codegeex2-6b-chat, codefuse-qwen-14b-chat.
@@ -154,6 +155,7 @@ swift web-ui
 - Rapidly **fine-tune** an LLM, run inference, and build a Web-UI; see the [LLM Fine-tuning Documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM微调文档.md).
 - Fine-tune and run inference through the **interface**; see the [WEB-UI Documentation](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md).
 - Support **DPO training**; see the [DPO Documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E4%BA%BA%E7%B1%BB%E5%AF%B9%E9%BD%90%E8%AE%AD%E7%BB%83%E6%96%87%E6%A1%A3.md).
+- Export fine-tuned models, including merge-lora, AWQ quantization, and pushing to the ModelScope Hub; see the [LLM Quantization Documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E9%87%8F%E5%8C%96%E6%A8%A1%E5%9E%8B.md).
 - Use VLLM for **inference acceleration** and **deployment (OpenAI API)**; see [VLLM Inference Acceleration and Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM推理加速与部署.md).
 - View the models and datasets supported by swift; see [supported models and datasets](https://github.com/modelscope/swift/blob/main/docs/source/LLM/支持的模型和数据集.md).
 - **Extend** the models, datasets, and chat templates in swift; see [Customization and Expansion](https://github.com/modelscope/swift/blob/main/docs/source/LLM/自定义与拓展.md).
@@ -265,6 +267,7 @@ app_ui_main(infer_args)
 - SQL: text2sql-en, 🔥sql-create-context-en.
 - Text Generation: 🔥advertise-gen-zh, 🔥dureader-robust-zh.
 - Classification: cmnli-zh, 🔥cmnli-mini-zh, 🔥jd-sentiment-zh, 🔥hc3-zh, 🔥hc3-en.
+- AWQ: pileval.
 - Other: finance-en, poetry-zh, webnovel-zh, generated-chat-zh, cls-fudan-news-zh, ner-jave-zh.
 - Multi-Modal:
   - Vision: coco-en, 🔥coco-mini-en, coco-mini-en-2, capcha-images.
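
Editor's note: both READMEs above replace `--deepspeed_config_path` with `--deepspeed` for ZeRO-3. A minimal sketch of how the renamed flag combines with the `swift sft` invocation shown later in this commit; the model, dataset, GPU ids, and the `NPROC_PER_NODE` launcher variable are illustrative assumptions, not part of this diff (the linked full_ddp_zero3 script is the authoritative example).

```bash
# Sketch only: ZeRO-3 fine-tuning with the renamed flag, two GPUs assumed.
NPROC_PER_NODE=2 CUDA_VISIBLE_DEVICES=0,1 swift sft \
    --model_id_or_path qwen/Qwen-7B-Chat \
    --dataset blossom-math-zh \
    --output_dir output \
    --deepspeed default-zero3
```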

docs/source/AIGC/AnimateDiff微调推理文档.md

Lines changed: 4 additions & 4 deletions
@@ -176,7 +176,7 @@ dataloader_num_workers: int = 1  # number of dataloader workers
 push_to_hub: bool = False  # whether to push to the modelhub
 # 'user_name/repo_name' or 'repo_name'
 hub_model_id: Optional[str] = None  # modelhub id
-hub_private_repo: bool = True
+hub_private_repo: bool = False
 push_hub_strategy: str = field(  # push strategy: push only the last checkpoint, or every checkpoint
     default='push_best',
     metadata={'choices': ['push_last', 'all_checkpoints']})
@@ -244,13 +244,13 @@ sft_type: str = field(
     default='lora', metadata={'choices': ['lora', 'full']})  # training method: lora or full-parameter
 
 ckpt_dir: Optional[str] = field(
-    default=None, metadata={'help': '/path/to/your/vx_xxx/checkpoint-xxx'})  # output folder of the training run
+    default=None, metadata={'help': '/path/to/your/vx-xxx/checkpoint-xxx'})  # output folder of the training run
 eval_human: bool = False  # False: eval val_dataset  # whether to evaluate with manual input
 
 seed: int = 42  # random seed
 
-merge_lora_and_save: bool = False  # Merge lora into the MotionAdapter and save the model.
-replace_if_exists: bool = False  # Replace the files if the output merged dir exists when `merge_lora_and_save` is True.
+merge_lora: bool = False  # Merge lora into the MotionAdapter and save the model.
+replace_if_exists: bool = False  # Replace the files if the output merged dir exists when `merge_lora` is True.
 
 # other
 ignore_args_error: bool = False  # True: notebook compatibility

docs/source/LLM/Agent微调最佳实践.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@
 ## Environment Setup
 
 ```bash
-# Set the global pip mirror
+# Set the global pip mirror (to speed up downloads)
 pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
 # Install ms-swift
 git clone https://github.com/modelscope/swift.git

docs/source/LLM/LLM人类对齐训练文档.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@
 ## Environment Setup
 GPU devices: A10, 3090, V100, and A100 all work. If the GPU has <= 24 GB of memory, at least two GPUs are required. Because human-alignment training loads two models on one card, it uses roughly one extra inference model's worth of GPU memory compared with fine-tuning.
 ```bash
-# Set the global pip mirror
+# Set the global pip mirror (to speed up downloads)
 pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
 # Install ms-swift
 git clone https://github.com/modelscope/swift.git
@@ -84,7 +84,7 @@ cd examples/pytorch/llm
 - By default we set `--gradient_checkpointing true` during training to **save GPU memory**; this slightly slows down training.
 - If you are using an older GPU such as the **V100**, you need to set `--dtype AUTO` or `--dtype fp16`, because it does not support bf16.
 - If your machine has high-end GPUs such as the A100 and you are using a qwen-series model, we recommend installing [**flash-attn**](https://github.com/Dao-AILab/flash-attention), which speeds up training and inference and reduces GPU memory usage (A10, 3090, V100, etc. do not support training with flash-attn). The models that support flash-attn are listed in [Supported Models](./支持的模型和数据集.md#模型).
-- If you need to train without network access, use `--model_cache_dir` and set `--check_model_is_latest false`. See [Command-line Arguments](./命令行参数.md) for the meaning of these parameters.
+- If you need to train without network access, use `--model_id_or_path <model_dir>` and set `--check_model_is_latest false`. See [Command-line Arguments](./命令行参数.md) for the meaning of these parameters.
 - If you want to push the weights to the ModelScope Hub during training, set `--push_to_hub true`.
 
 ```bash
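
Editor's note: a minimal sketch of the renamed offline-training option above, shown with the generic `swift sft` entry point for illustration; the local model directory and dataset are placeholders assumed here, not values taken from this commit.

```bash
# Assumption: the base model has already been downloaded to a local directory.
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_id_or_path /path/to/local/qwen-7b-chat \
    --dataset blossom-math-zh \
    --check_model_is_latest false \
    --output_dir output
```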

docs/source/LLM/LLM微调文档.md

Lines changed: 37 additions & 26 deletions
@@ -4,13 +4,14 @@
 - [Fine-tuning](#微调)
 - [DPO](#dpo)
 - [Merge LoRA](#merge-lora)
+- [Quantization](#量化)
 - [Inference](#推理)
 - [Web-UI](#web-ui)
 
 ## Environment Setup
 GPU devices: A10, 3090, V100, and A100 all work.
 ```bash
-# Set the global pip mirror
+# Set the global pip mirror (to speed up downloads)
 pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
 # Install ms-swift
 git clone https://github.com/modelscope/swift.git
@@ -76,15 +77,13 @@ app_ui_main(infer_args)
 ```bash
 # Experimental environment: A10, 3090, V100, ...
 # 20GB GPU memory
-CUDA_VISIBLE_DEVICES=0 \
-swift sft \
+CUDA_VISIBLE_DEVICES=0 swift sft \
     --model_id_or_path qwen/Qwen-7B-Chat \
     --dataset blossom-math-zh \
     --output_dir output \
 
 # Use your own dataset
-CUDA_VISIBLE_DEVICES=0 \
-swift sft \
+CUDA_VISIBLE_DEVICES=0 swift sft \
     --model_id_or_path qwen/Qwen-7B-Chat \
     --custom_train_dataset_path chatml.jsonl \
     --output_dir output \
@@ -146,9 +145,9 @@ cd examples/pytorch/llm
 - If you are using an older GPU such as the **V100**, you need to set `--dtype AUTO` or `--dtype fp16`, because it does not support bf16.
 - If your machine has high-end GPUs such as the A100 and you are using a qwen-series model, we recommend installing [**flash-attn**](https://github.com/Dao-AILab/flash-attention), which speeds up training and inference and reduces GPU memory usage (A10, 3090, V100, etc. do not support training with flash-attn). The models that support flash-attn are listed in [Supported Models](./支持的模型和数据集.md#模型).
 - For **further pre-training** or **multi-turn dialogue**, see [Customization and Expansion](./自定义与拓展.md#注册数据集的方式).
-- If you need to train **offline**, use `--model_cache_dir` and set `--check_model_is_latest false`. See [Command-line Arguments](./命令行参数.md) for the meaning of these parameters.
+- If you need to train **offline**, use `--model_id_or_path <model_dir>` and set `--check_model_is_latest false`. See [Command-line Arguments](./命令行参数.md) for the meaning of these parameters.
 - If you want to push the weights to the ModelScope Hub during training, set `--push_to_hub true`.
-- If you want to merge the LoRA weights and save them at inference time, set `--merge_lora_and_save true`. **Merging models trained with qlora is not recommended**, as it causes precision loss.
+- If you want to merge the LoRA weights and save them at inference time, set `--merge_lora true`. **Merging models trained with qlora is not recommended**, as it causes precision loss.
 - Ready-to-run sh scripts for `qwen_7b_chat` are provided below (you only need to specify `--ckpt_dir` at inference time to run them). Scripts for more models are in the [scripts folder](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts). If you want to write a **custom sh script**, we recommend starting from the scripts in `scripts/qwen_7b_chat`.
 
 ```bash
@@ -225,9 +224,15 @@ bash scripts/qwen_7b_chat/qlora_ddp_ds/infer.sh
 ## Merge LoRA
 Note: merge-lora is **currently** not supported for bnb- and auto_gptq-quantized models, as it would cause a significant precision loss.
 ```bash
-swift merge-lora --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
+# If you need quantization, you can specify `--quant_bits 4`.
+CUDA_VISIBLE_DEVICES=0 swift export \
+    --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx' --merge_lora true
 ```
 
+## Quantization
+
+To quantize the fine-tuned model, see the [LLM Quantization Documentation](LLM量化文档.md#微调后模型).
+
 ## Inference
 To use VLLM for inference acceleration, see [VLLM Inference Acceleration and Deployment](./VLLM推理加速与部署.md#微调后的模型).
 
@@ -251,13 +256,13 @@ from swift.llm import (
 )
 from swift.tuners import Swift
 
-model_dir = 'vx_xxx/checkpoint-100'
+ckpt_dir = 'vx-xxx/checkpoint-100'
 model_type = ModelType.qwen_7b_chat
 template_type = get_default_template_type(model_type)
 
 model, tokenizer = get_model_tokenizer(model_type, model_kwargs={'device_map': 'auto'})
 
-model = Swift.from_pretrained(model, model_dir, inference_mode=True)
+model = Swift.from_pretrained(model, ckpt_dir, inference_mode=True)
 template = get_template(template_type, tokenizer)
 query = 'xxxxxx'
 response, history = inference(model, template, query)
@@ -274,12 +279,12 @@ from swift.llm import (
     get_model_tokenizer, get_template, inference, ModelType, get_default_template_type
 )
 
-model_dir = 'vx_xxx/checkpoint-100-merged'
+ckpt_dir = 'vx-xxx/checkpoint-100-merged'
 model_type = ModelType.qwen_7b_chat
 template_type = get_default_template_type(model_type)
 
 model, tokenizer = get_model_tokenizer(model_type, model_kwargs={'device_map': 'auto'},
-                                       model_dir=model_dir)
+                                       model_id_or_path=ckpt_dir)
 
 template = get_template(template_type, tokenizer)
 query = 'xxxxxx'
@@ -291,27 +296,30 @@ print(f'history: {history}')
 Evaluate with a **dataset**:
 ```bash
 # Direct inference
-CUDA_VISIBLE_DEVICES=0 \
-swift infer \
-    --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx' \
+CUDA_VISIBLE_DEVICES=0 swift infer \
+    --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx' \
     --load_dataset_config true \
 
 # Merge the LoRA incremental weights and run inference
-swift merge-lora --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
-CUDA_VISIBLE_DEVICES=0 \
-swift infer \
-    --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx-merged' \
-    --load_dataset_config true \
+# If you need quantization, you can specify `--quant_bits 4`.
+CUDA_VISIBLE_DEVICES=0 swift export \
+    --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx' --merge_lora true
+
+CUDA_VISIBLE_DEVICES=0 swift infer \
+    --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx-merged' --load_dataset_config true
 ```
 
 **Manual** evaluation:
 ```bash
 # Direct inference
-CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
+CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx'
 
 # Merge the LoRA incremental weights and run inference
-swift merge-lora --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
-CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx-merged'
+# If you need quantization, you can specify `--quant_bits 4`.
+CUDA_VISIBLE_DEVICES=0 swift export \
+    --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx' --merge_lora true
+
+CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx-merged'
 ```
 
 ## Web-UI
@@ -323,9 +331,12 @@ CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx-merged'
 ### Fine-tuned Model
 ```bash
 # Use app-ui directly
-CUDA_VISIBLE_DEVICES=0 swift app-ui --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
+CUDA_VISIBLE_DEVICES=0 swift app-ui --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx'
 
 # Merge the LoRA incremental weights and use app-ui
-swift merge-lora --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx'
-CUDA_VISIBLE_DEVICES=0 swift app-ui --ckpt_dir 'xxx/vx_xxx/checkpoint-xxx-merged'
+# If you need quantization, you can specify `--quant_bits 4`.
+CUDA_VISIBLE_DEVICES=0 swift export \
+    --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx' --merge_lora true
+
+CUDA_VISIBLE_DEVICES=0 swift app-ui --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx-merged'
 ```
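
Editor's note: a condensed sketch of the full pipeline after this change, tying the renamed flags together; the checkpoint paths are the same placeholders used in the diffs above and every flag is taken from this commit.

```bash
# 1. Fine-tune
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_id_or_path qwen/Qwen-7B-Chat \
    --dataset blossom-math-zh \
    --output_dir output

# 2. Merge LoRA (optionally add `--quant_bits 4` for AWQ quantization)
CUDA_VISIBLE_DEVICES=0 swift export \
    --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx' --merge_lora true

# 3. Evaluate or chat with the merged checkpoint
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx-merged' --load_dataset_config true
```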

docs/source/LLM/LLM推理文档.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 ## Environment Setup
 GPU devices: A10, 3090, V100, and A100 all work.
 ```bash
-# Set the global pip mirror
+# Set the global pip mirror (to speed up downloads)
 pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
 # Install ms-swift
 git clone https://github.com/modelscope/swift.git
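
Editor's note: the setup snippet above is truncated by the diff view. For context, the environment-setup blocks in these docs typically continue roughly as follows; the editable install with the `llm` extra is an assumption here, not part of this commit's diff.

```bash
# Assumed continuation of the setup shown above.
cd swift
pip install -e '.[llm]'
```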
