Merged
Changes from 10 commits
8 changes: 5 additions & 3 deletions README.md
@@ -47,6 +47,7 @@ SWIFT has rich documentations for users, please check [here](https://github.com/
SWIFT web-ui is available on both [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary); please feel free to try it!

## 🎉 News
- 🔥2024.05.31: Supports **SimPO** training! See the [document](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/SimPO.md) to start training!
- 2024.05.24: Supports the Phi3-vision model. Use model_type `phi3-vision-128k-instruct` to train.
- 2024.05.22: Supports the DeepSeek-V2-Lite series models; model_type is `deepseek-v2-lite` or `deepseek-v2-lite-chat`
- 2024.05.22: Supports the TeleChat-12B-v2 model and its quantized version; model_type is `telechat-12b-v2` or `telechat-12b-v2-gptq-int4`
@@ -233,7 +234,7 @@ You can refer to the following scripts to customize your own training script.
|------------------|-------------------------------------------------------------------------------|
| Pretraining | Text Generation |
| Fine-tuning | Single-turn/Multi-turn<br>Agent Training/Self-cognition<br>Multi-modal Vision/Multi-modal Speech|
| Human Alignment | DPO<br>ORPO |
| Human Alignment | DPO<br>ORPO<br>SimPO |
| Text-to-Image | DreamBooth, etc. |
| Text-to-Video | - |

@@ -613,8 +614,6 @@ make docs
| [LLM Evaluation](docs/source_en/LLM/LLM-eval.md) |
| [LLM Quantization](docs/source_en/LLM/LLM-quantization.md) |
| [LLM Deployment](docs/source_en/LLM/VLLM-inference-acceleration-and-deployment.md) |
| [DPO Human Alignment Training](docs/source_en/LLM/DPO.md) |
| [ORPO Human Alignment Training](docs/source_en/LLM/ORPO.md) |
| [AnimateDiff Training](docs/source_en/AIGC/AnimateDiff-train-infer.md) |

### Reference Documentation
@@ -635,6 +634,9 @@ make docs
| [Qwen1.5 Best Practice](docs/source_en/LLM/Qwen1.5-best-practice.md) |
| [Multi-Modal Model Training Best Practice](docs/source_en/Multi-Modal/index.md) |
| [NPU Best Practice](docs/source_en/LLM/NPU-best-practice.md) |
| [DPO Human Alignment Training](docs/source_en/LLM/DPO.md) |
| [ORPO Human Alignment Training](docs/source_en/LLM/ORPO.md) |
| [SimPO Human Alignment Training](docs/source_en/LLM/SimPO.md) |


### Deep Learning Tutorials
9 changes: 6 additions & 3 deletions README_CN.md
@@ -48,6 +48,7 @@ SWIFT has a rich documentation system; if you have any usage questions, please check [here](https:
You can try out the SWIFT web-ui on [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary).

## 🎉 News
- 🔥2024.05.31: Supports **SimPO** training. Use `swift simpo` to start training; see the best practice [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM/SimPO算法最佳实践.md)
- 2024.05.24: Supports the Phi3 multi-modal model. Use model_type `phi3-vision-128k-instruct` to train.
- 2024.05.22: Supports the DeepSeek-V2-Lite series models; model_type is `deepseek-v2-lite` or `deepseek-v2-lite-chat`
- 2024.05.22: Supports the TeleChat-12B-v2 model and its quantized version; model_type is `telechat-12b-v2` or `telechat-12b-v2-gptq-int4`
@@ -59,7 +60,7 @@ SWIFT has a rich documentation system; if you have any usage questions, please check [here](https:
- 2024.05.11: Supports qlora training and quantized inference with [hqq](https://github.com/mobiusml/hqq) and [eetq](https://github.com/NetEase-FuXi/EETQ); see the [LLM quantization documentation](https://github.com/modelscope/swift/tree/main/docs/source/LLM/LLM量化文档.md)
- 2024.05.10: Supports sequence parallelism. First install it with `pip install .[seq_parallel]`, then add `--sequence_parallel_size n` in a DDP environment to use it!
- 2024.05.08: Supports the DeepSeek-V2-Chat model; refer to [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/deepseek-v2-chat/lora_ddp_ds3/sft.sh) for training. Supports the InternVL-Chat-V1.5-Int8 model; see the best practice [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/internvl最佳实践.md).
- 🔥2024.05.07: Supports **ORPO** training. Use `swift orpo` to get started; see the best practice [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM/ORPO算法最佳实践.md)
- 🔥2024.05.07: Supports **ORPO** training. Use `swift orpo` to start training; see the best practice [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM/ORPO算法最佳实践.md)
- 2024.05.07: Supports the Llava-Llama3 model from xtuner; model_type is `llava-llama-3-8b-v1_1`.
- 2024.04.29: Supports inference and fine-tuning of InternVL-Chat-V1.5; see the best practice [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/internvl最佳实践.md).
- 🔥2024.04.26: Supports **LISA** and **unsloth** training! Specify `--lisa_activated_layers=2` to enable LISA (reducing memory usage to 30% of full-parameter training), or specify `--tuner_backend unsloth` to use unsloth and train very large models faster (5x) with less memory (30% or less)!
@@ -233,7 +234,7 @@ swift web-ui
| -------- |------------------------------------|
| Pretraining | Text Generation |
| Fine-tuning | Single-turn/Multi-turn<br>Agent Training/Self-cognition<br>Multi-modal Vision/Multi-modal Speech |
| Human Alignment | DPO<br>ORPO |
| Human Alignment | DPO<br>ORPO<br>SimPO |
| Text-to-Image | DreamBooth, etc. |
| Text-to-Video | - |

@@ -611,7 +612,6 @@ make docs
| [LLM Evaluation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E8%AF%84%E6%B5%8B%E6%96%87%E6%A1%A3.md) |
| [LLM Quantization](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E9%87%8F%E5%8C%96%E6%96%87%E6%A1%A3.md) |
| [LLM Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM%E6%8E%A8%E7%90%86%E5%8A%A0%E9%80%9F%E4%B8%8E%E9%83%A8%E7%BD%B2.md) |
| [DPO Human Alignment Training](https://github.com/modelscope/swift/blob/main/docs/source/LLM/DPO%E8%AE%AD%E7%BB%83%E6%96%87%E6%A1%A3.md) |
| [AnimateDiff Training](https://github.com/modelscope/swift/blob/main/docs/source/AIGC/AnimateDiff%E5%BE%AE%E8%B0%83%E6%8E%A8%E7%90%86%E6%96%87%E6%A1%A3.md) |


@@ -633,6 +633,9 @@ make docs
| [Qwen1.5 Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Qwen1.5%E5%85%A8%E6%B5%81%E7%A8%8B%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |
| [Multi-Modal Model Training Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/index.md) |
| [NPU Inference and Fine-tuning Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/LLM/NPU%E6%8E%A8%E7%90%86%E4%B8%8E%E5%BE%AE%E8%B0%83%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |
| [DPO Human Alignment Training](https://github.com/modelscope/swift/blob/main/docs/source/LLM/DPO%E8%AE%AD%E7%BB%83%E6%96%87%E6%A1%A3.md) |
| [ORPO Human Alignment Training](https://github.com/modelscope/swift/blob/main/docs/source/LLM/ORPO%E7%AE%97%E6%B3%95%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |
| [SimPO Human Alignment Training](https://github.com/modelscope/swift/blob/main/docs/source/LLM/SimPO%E7%AE%97%E6%B3%95%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |


### Deep Learning Tutorials
Binary file added docs/resources/simpo1.png
Binary file added docs/resources/simpo2.png
Binary file added docs/resources/simpo3.png
Binary file added docs/resources/simpo4.png
3 changes: 2 additions & 1 deletion docs/source/LLM/ORPO算法最佳实践.md
@@ -1,7 +1,8 @@
# Best Practices for the ORPO Algorithm
[ORPO](https://arxiv.org/abs/2403.07691) training requires the same data format as DPO: in addition to the SFT data [query, response], it additionally requires a `rejected_response` field denoting the answer the model should not generate.

The ORPO algorithm adds an odds ratio (OR) negative log-likelihood loss term to the SFT training loss to lower the probability of generating the rejected response.
The ORPO algorithm adds an odds ratio (OR) negative log-likelihood loss term to the SFT training loss to lower the probability of generating the rejected response. Compared to DPO, it does not require a reference model and needs less training memory.

The hyperparameter `beta` is the coefficient of the OR loss term; the larger it is, the heavier the penalty on `rejected_response`. The default is 0.1.

This best practice trains the [llama3-8b-instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/summary) model with the ORPO algorithm so that it answers in Chinese.
94 changes: 94 additions & 0 deletions docs/source/LLM/SimPO算法最佳实践.md
@@ -0,0 +1,94 @@
# Best Practices for the SimPO Algorithm
[SimPO](https://arxiv.org/abs/2405.14734) training requires the same data format as DPO: in addition to the SFT data [query, response], it additionally requires a `rejected_response` field denoting the answer the model should not generate.

The SimPO algorithm applies length normalization to the reward, replacing the reference-model log-probability term used in DPO, and adds a reward margin term to the preference modeling to widen the reward gap between the two responses.

Compared to DPO, the SimPO algorithm does not require a reference model and therefore needs less training memory.

As in DPO, the hyperparameter `beta` acts as the reward coefficient, typically 2.0-2.5 (default 2.0); `gamma` is the reward margin, typically 0.5-1.5 (default 1.0).
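
Putting these pieces together, the SimPO objective from the [paper](https://arxiv.org/abs/2405.14734) can be written as below, where $y_w$ is the chosen response, $y_l$ the rejected one, $|y|$ the response length, and $\beta$, $\gamma$ correspond to the `beta` and `gamma` hyperparameters above (shown only as a reference sketch):

$$
\mathcal{L}_{\mathrm{SimPO}} = -\log\sigma\!\left(\frac{\beta}{|y_w|}\log\pi_\theta(y_w\mid x) \;-\; \frac{\beta}{|y_l|}\log\pi_\theta(y_l\mid x) \;-\; \gamma\right)
$$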

This best practice trains the [llama3-8b-instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/summary) model with the SimPO algorithm so that it answers in Chinese.

## Table of Contents
- [Environment Preparation](#environment-preparation)
- [Dataset](#dataset)
- [Training](#training)
- [Inference](#inference)

## Environment Preparation
```bash
# Set the global pip mirror (to speed up downloads)
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
# Install ms-swift
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'

# Align the environment (usually not needed; if you hit errors, run the commands below; the repo is tested against the latest environment)
pip install -r requirements/framework.txt -U
pip install -r requirements/llm.txt -U
```


## Dataset
We use the [Llama3-DPO-zh-en-emoji](https://modelscope.cn/datasets/shareAI/shareAI-Llama3-DPO-zh-en-emoji/summary) dataset, which contains Chinese queries with both Chinese and English answers.

Data sample:
```json
{"question": "一个完整的工程项目管理流程包括什么?",
"answer_zh": "🤔 Ahahah, 项目管理流程, 这个问题太常见了! 😂\n\n一个完整的工程项目管理流程包括以下几个阶段:\n\n1️⃣ **项目启动** 🚀:项目的开始,定义项目的目标、范围、时间表和预算。\n\n2️⃣ **项目计划** 📅:制定项目计划,包括任务分解、资源分配、进度安排和风险管理。\n\n3️⃣ **项目执行** 💪:按照计划执行项目,包括任务执行、进度控制和质量管理。\n\n4️⃣ **项目监控** 👀:对项目的进度、成本和质量进行监控和控制,确保项目不偏离轨道。\n\n5️⃣ **项目收尾** 🎉:项目完成后,进行项目总结、经验分享和文档编制。\n\n6️⃣ **项目评价** 🤔:对项目的成果和过程进行评价,总结经验和教训。\n\n这些阶段是一个完整的工程项目管理流程,但是,具体的项目可能会根据需要添加或删除某些阶段。 🤝\n\n还有一点,项目管理流程中需要注意的几个关键点是:\n\n* **沟通** 💬:项目团队之间、项目团队和客户之间的沟通是非常重要的。\n* **风险管理** ⚠️:项目中总会出现一些风险和问题,需要提前预测和准备。\n* **变更管理** 🔄:项目中可能会出现一些变更,需要及时处理和更新。\n\n总之,项目管理流程是一个复杂的过程,需要项目经理和团队成员的共同努力和合作。 💪",
"answer_en": "My fellow wizarding friend! 🧙‍♂️\n\nA complete project management process, you ask? Well, I shall conjure up the essential steps for thee! 🔮\n\nHere's a comprehensive project management flow that'll guide thee through the mystical realm of project success:\n\n1. **Initiation**: Define project scope, goals, and deliverables. Identify stakeholders, and determine the project's feasibility. ⚡️\n2. **Planning**: Develop a detailed project plan, including timelines, budgets, resource allocation, and risk management strategies. 📅\n3. **Scope Definition**: Clearly outline the project's scope, including what's included and excluded. This will help prevent scope creep, my friend! 🔒\n4. **Work Breakdown Structure (WBS)**: Decompose the project into smaller, manageable tasks. This will aid in task assignment, scheduling, and tracking progress. 🗂️\n5. **Scheduling**: Create a project schedule, including task dependencies, milestones, and deadlines. This will help thee stay on track and avoid procrastination! ⏰\n6. **Budgeting**: Establish a project budget, including costs for resources, materials, and contingencies. This will ensure thou dost not overspend, my wise friend! 💸\n7. **Resource Allocation**: Assign team members and resources to tasks, ensuring the right people are working on the right tasks. This will optimize productivity and efficiency! 👥\n8. **Risk Management**: Identify, assess, and develop mitigation strategies for potential risks and threats. This will safeguard thy project from unforeseen dangers! ⚠️\n9. **Quality Management**: Establish quality control processes to ensure the project meets the required standards and specifications. This will guarantee a top-notch outcome, my friend! ✨\n10. **Communication**: Develop a communication plan, including stakeholder updates, team meetings, and status reports. This will keep everyone informed and aligned! 📢\n11. **Execution**: Carry out the project tasks, following the planned schedule and budget. This is where the magic happens, my friend! ✨\n12. **Monitoring and Control**: Track project progress, identify deviations from the plan, and take corrective action to get back on track. This will ensure thy project stays on course! 📊\n13. **Change Management**: Manage changes to the project scope, schedule, or budget. This will minimize disruptions and ensure the project remains viable! 🔀\n14. **Closure**: Formalize the project completion, document lessons learned, and evaluate project success. This will bring a sense of accomplishment and closure, my friend! 🎉\n\nThere thou hast it, my fellow wizard! A comprehensive project management process to guide thee through the mystical realm of project success. May thy projects be prosperous and thy stakeholders be delighted! 😊"}

```

swift ships a built-in preprocessing step that uses `answer_zh` as the `response` and `answer_en` as the `rejected_response`; simply pass `--dataset shareai-llama3-dpo-zh-en-emoji` as a training argument.
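
Conceptually, after this built-in mapping each sample becomes a DPO-style triple along the lines of the sketch below (the field names follow the format described at the top of this document and the values are truncated from the sample above; the exact internal column names are an assumption):

```json
{
  "query": "一个完整的工程项目管理流程包括什么?",
  "response": "🤔 Ahahah, 项目管理流程, 这个问题太常见了! ...",
  "rejected_response": "My fellow wizarding friend! 🧙‍♂️ ..."
}
```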

## Training
```shell
# Experimental environment: A100
# DDP + MP
# Memory usage: 4*56G
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=2 \
swift simpo \
--model_type llama3-8b-instruct \
--sft_type full \
--dataset shareai-llama3-dpo-zh-en-emoji \
--gradient_checkpointing true \
--learning_rate 2e-6
```
**Tips**:
- Our tests found that SimPO+LoRA performs poorly, so full-parameter fine-tuning is recommended.
- If you train a base model on data that contains history, you need to specify a template that supports multi-turn dialogue (base models often do not). For this case we default to the `chatml` template; you can also set `--model_type` to select the template of the model being trained.
- By default we set `--gradient_checkpointing true` during training to **save memory**; this slightly slows down training.
- If you are using an older GPU such as a **V100**, you need to set `--dtype AUTO` or `--dtype fp16`, since it does not support bf16.
- If your machine has high-performance GPUs such as A100 and you are using a qwen-series model, we recommend installing [**flash-attn**](https://github.com/Dao-AILab/flash-attention), which speeds up training and inference and reduces memory usage (A10, 3090, V100, etc. do not support training with flash-attn). Models that support flash-attn are listed in [Supported Models](支持的模型和数据集.md#模型).
- If you need to train offline, use `--model_id_or_path <model_dir>` and set `--check_model_is_latest false`. See [Command-line Arguments](命令行参数.md) for the exact meaning of these arguments.
- If you want to push weights to the ModelScope Hub during training, set `--push_to_hub true`.

## Inference
The inference below uses the `swift web-ui` command.

### Inference before training
> 你是谁 (Who are you?)

![orpo1](../../resources/orpo1.png)

> 西湖醋鱼怎么做 (How do you make West Lake vinegar fish?)

![orpo2](../../resources/orpo2.png)
![orpo3](../../resources/orpo3.png)
![orpo4](../../resources/orpo4.png)
![orpo5](../../resources/orpo5.png)


### Inference after training
> 你是谁 (Who are you?)

![simpo1](../../resources/simpo1.png)

> 西湖醋鱼怎么做 (How do you make West Lake vinegar fish?)

![simpo2](../../resources/simpo2.png)
![simpo3](../../resources/simpo3.png)
![simpo4](../../resources/simpo4.png)
1 change: 1 addition & 0 deletions docs/source/LLM/index.md
@@ -24,6 +24,7 @@
7. [VLLM Inference Acceleration and Deployment](VLLM推理加速与部署.md)
8. [LLM Experiment Documentation](LLM实验文档.md)
9. [ORPO Best Practice](ORPO算法最佳实践.md)
10. [SimPO Best Practice](SimPO算法最佳实践.md)

### 🐔 Reference Documentation
1. [Custom Models and Datasets](自定义与拓展.md)
7 changes: 6 additions & 1 deletion docs/source_en/LLM/ORPO.md
@@ -1,8 +1,13 @@
# Best Practices for ORPO Algorithm
The ORPO algorithm requires the same data format as DPO. Beyond SFT data [query, response], it additionally requires `rejected_response` to denote answers that the model should not generate.

The ORPO algorithm incorporates an odds ratio (OR) negative log-likelihood loss term into the loss function used during SFT training, to reduce the probability of generating rejected responses.

Compared to DPO, the ORPO algorithm does not require a reference model and therefore uses less training memory.

Here, the hyperparameter beta represents the coefficient for the OR loss term. The larger the beta, the greater the penalty on rejected_response, with a default value of 0.1.
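
For reference, the combined ORPO objective from the paper can be sketched as follows, with the OR term weighted by the beta coefficient described above (the paper calls this weight λ), $y_w$ the chosen response, and $y_l$ the rejected one:

$$
\mathcal{L}_{\mathrm{ORPO}} = \mathcal{L}_{\mathrm{SFT}} + \beta\cdot\mathcal{L}_{\mathrm{OR}},\qquad
\mathcal{L}_{\mathrm{OR}} = -\log\sigma\!\left(\log\frac{\mathrm{odds}_\theta(y_w\mid x)}{\mathrm{odds}_\theta(y_l\mid x)}\right),\qquad
\mathrm{odds}_\theta(y\mid x)=\frac{P_\theta(y\mid x)}{1-P_\theta(y\mid x)}
$$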
This best practice session will train the llama3-8b-instruct model using the ORPO algorithm, enabling it to respond in Chinese.

This best practice session will train the [llama3-8b-instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/summary) model using the ORPO algorithm, enabling it to respond in Chinese.

Table of Contents
- [Environment Preparation](#environment-preparation)