To facilitate use by users unfamiliar with deep learning, we provide a Gradio web-ui for controlling training and inference.
Additionally, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.
## 🎉 News
- 2024.04.08: Support the fine-tuning and inference of the XVERSE-MoE-A4.2B model, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/xverse_moe_a4_2b/lora/sft.sh) to start training (see the CLI sketch after this list)!
- 2024.04.04: Support **QLoRA+FSDP** to train a 70B model on two 24GB GPUs, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/llama2_70b_chat/qlora_fsdp/sft.sh) to train.
- 🔥2024.04.03: Support the **Qwen1.5-32B** series: Qwen1.5-32B, Qwen1.5-32B-Chat, Qwen1.5-32B-Chat-GPTQ-Int4. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/qwen1half_32b_chat/lora_mp/sft.sh) to start training!
- 🔥2024.04.02: Support the fine-tuning and inference of Mengzi3-13B-Base model, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/mengzi3_13b_base/lora_ddp_ds/sft.sh) to start training!
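
For readers who want the general shape of these launches rather than the linked scripts, the sketch below shows a minimal LoRA fine-tuning run with the `swift sft` CLI. This is a sketch under assumptions, not the tested recipe: the `xverse-moe-a4_2b` model identifier and the `blossom-math-zh` dataset are illustrative choices, so consult the linked `sft.sh` scripts for exact, verified invocations.

```bash
# Minimal LoRA fine-tuning sketch. The flags mirror the swift CLI used
# elsewhere in this README; the model_type value is an assumed identifier
# for XVERSE-MoE-A4.2B, so check the linked sft.sh for the verified run.
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model_type xverse-moe-a4_2b \
    --sft_type lora \
    --dataset blossom-math-zh \
    --output_dir output
```

The linked scripts additionally pin hyperparameters (learning rate, LoRA rank, gradient accumulation) appropriate to each model, so prefer them as the starting point for real runs.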
| Model Series | Model Introduction | Language | Model Size | Model Type |
| ------------ | ------------------ | -------- | ---------- | ---------- |
| Qwen<br>Qwen1.5 |[Tongyi Qwen 1.0 and 1.5 series models](https://github.com/QwenLM)| Chinese<br>English | 0.5B-72B<br>including quantized versions | base model<br>chat model<br>MoE model|
| ChatGLM2<br>ChatGLM3<br>Codegeex2 |[Zhipu ChatGLM series models](https://github.com/THUDM)| Chinese<br>English | 6B | base model<br>chat model<br>code model |
| Baichuan/Baichuan2 |[Baichuan 1 and Baichuan 2](https://github.com/baichuan-inc)| Chinese<br>English | 7B-13B<br>including quantized versions | base model<br>chat model |
| Yuan2 |[Langchao Yuan series models](https://github.com/IEIT-Yuan)| Chinese<br>English | 2B-102B | instruct model |
| XVerse |[XVerse series models](https://github.com/xverse-ai)| Chinese<br>English | 7B-65B | base model<br>chat model<br>long text model<br>MoE model |
| LLaMA2 |[LLaMA2 series models](https://github.com/facebookresearch/llama)| English | 7B-70B<br>including quantized versions | base model<br>chat model |
| Mistral<br>Mixtral |[Mistral series models](https://github.com/mistralai/mistral-src)| English | 7B | base model<br>instruct model<br>MoE model |
| YI |[01AI's YI series models](https://github.com/01-ai)| Chinese<br>English | 6B-34B | base model<br>chat model<br>long text model |
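
Any model in the table above can be exercised through the same CLI. A minimal inference sketch follows, assuming `qwen1half-32b-chat` is the `model_type` identifier for Qwen1.5-32B-Chat; verify the exact name against the supported-models documentation.

```bash
# Interactive chat with a model from the table; the model_type identifier
# is an assumption, check it against the supported-models list.
CUDA_VISIBLE_DEVICES=0 \
swift infer --model_type qwen1half-32b-chat
```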