2 changes: 1 addition & 1 deletion docs/source/LLM/命令行参数.md
@@ -20,7 +20,7 @@
 - DeepSeek-VL model: `https://github.com/deepseek-ai/DeepSeek-VL`
 - YI-VL model: `https://github.com/01-ai/Yi`
 - LLAVA model: `https://github.com/haotian-liu/LLaVA.git`
-- `--sft_type`: The fine-tuning method, default is `'lora'`. Available values: 'lora', 'full', 'longlora', 'qalora'. To use qlora, you need to set `--sft_type lora --quantization_bit 4`.
+- `--sft_type`: The fine-tuning method, default is `'lora'`. Available values: 'lora', 'full', 'longlora', 'adalora', 'ia3', 'llamapro', 'adapter', 'vera', 'boft'. To use qlora, you need to set `--sft_type lora --quantization_bit 4`.
 - `--packing`: pack the dataset to `max-length`, default `False`.
 - `--freeze_parameters`: When sft_type is set to 'full', freezes the bottommost parameters of the model. The range is 0. ~ 1., default `0.`. This parameter provides a compromise between lora and full-parameter fine-tuning.
 - `--additional_trainable_parameters`: A complement to freeze_parameters; only allowed when sft_type is 'full', default `[]`. For example, if you are training 50% of the parameters and additionally want to train the embedding layer, you can set `--freeze_parameters 0.5 --additional_trainable_parameters transformer.wte`; all parameters whose names start with `transformer.wte` will be activated.
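The `--freeze_parameters` / `--additional_trainable_parameters` combination documented in the hunk above can be exercised from the command line. A minimal sketch, assuming the `swift sft` entrypoint and illustrative `--model_type`/`--dataset` values that are not taken from this PR:

```bash
# Sketch: partial full-parameter tuning. The bottom 50% of the model is frozen,
# while the embedding layer (parameters prefixed `transformer.wte`) stays trainable.
# `--model_type qwen-7b-chat` and `--dataset alpaca-zh` are placeholder choices.
swift sft \
    --model_type qwen-7b-chat \
    --dataset alpaca-zh \
    --sft_type full \
    --freeze_parameters 0.5 \
    --additional_trainable_parameters transformer.wte
```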
2 changes: 1 addition & 1 deletion docs/source_en/LLM/Command-line-parameters.md
@@ -18,7 +18,7 @@
 - DeepSeek-VL model: `https://github.com/deepseek-ai/DeepSeek-VL`
 - YI-VL model: `https://github.com/01-ai/Yi`
 - LLAVA model: `https://github.com/haotian-liu/LLaVA.git`
-- `--sft_type`: Fine-tuning method, default is `'lora'`. Options include: 'lora', 'full', 'longlora', 'qalora'. If using qlora, you need to set `--sft_type lora --quantization_bit 4`.
+- `--sft_type`: Fine-tuning method, default is `'lora'`. Options include: 'lora', 'full', 'longlora', 'adalora', 'ia3', 'llamapro', 'adapter', 'vera', 'boft'. If using qlora, you need to set `--sft_type lora --quantization_bit 4`.
 - `--packing`: pack the dataset to `max-length`, default `False`.
 - `--freeze_parameters`: When sft_type is set to 'full', freeze the bottommost parameters of the model. Range is 0. ~ 1., default is `0.`. This provides a compromise between lora and full fine-tuning.
 - `--additional_trainable_parameters`: In addition to freeze_parameters, only allowed when sft_type is 'full', default is `[]`. For example, if you want to train the embedding layer in addition to 50% of the parameters, you can set `--freeze_parameters 0.5 --additional_trainable_parameters transformer.wte`; all parameters starting with `transformer.wte` will be activated.
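For the qlora recipe that both docs reference (`--sft_type lora --quantization_bit 4`), a minimal sketch, again assuming the `swift sft` entrypoint with placeholder model/dataset values:

```bash
# Sketch: qlora = LoRA adapters trained on top of a 4-bit quantized base model.
# `--model_type` and `--dataset` values are illustrative, not from this PR.
swift sft \
    --model_type qwen-7b-chat \
    --dataset alpaca-en \
    --sft_type lora \
    --quantization_bit 4
```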
3 changes: 2 additions & 1 deletion swift/llm/utils/argument.py
@@ -467,7 +467,8 @@ class SftArguments(ArgumentsBase):
     lora_lr_ratio: float = None
     use_rslora: bool = False
     use_dora: bool = False
-    init_lora_weights: Literal['gaussian', 'pissa', 'pissa_niter_[number of iters]', 'loftq', 'true', 'false'] = 'true'
+    # Literal['gaussian', 'pissa', 'pissa_niter_[number of iters]', 'loftq', 'true', 'false']
+    init_lora_weights: str = 'true'
 
     # BOFT
     boft_block_size: int = 4
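The point of relaxing `init_lora_weights` from a `Literal` to `str` is that values of the form `pissa_niter_[number of iters]` are parameterized, so they cannot be enumerated in a closed `Literal`. A minimal sketch of passing such a value, assuming the dataclass field is exposed as an `--init_lora_weights` flag and using the same placeholder model/dataset values as above:

```bash
# Sketch: a parameterized PiSSA initialization (16 iterations) is now accepted,
# since the field is a plain `str` rather than a closed Literal.
swift sft \
    --model_type qwen-7b-chat \
    --dataset alpaca-en \
    --sft_type lora \
    --init_lora_weights pissa_niter_16
```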