🚀 The feature, motivation, and pitch
Let's migrate to `peft`.
Tasks
Doing so will require the following updates:
- Replace the `opendelta` setup in the `AccelerateBaseTrainer` with a `peft`-backed setup (see the first sketch after this list): `trlx/trlx/trainer/accelerate_base_trainer.py`, lines 145 to 155 in 92b68e4
```python
if self.config.model.delta_kwargs is not None:
    delta_type, delta_kwargs = parse_delta_kwargs(
        model.base_model.config,
        self.config.model.delta_kwargs,
        self.config.model.num_layers_unfrozen,
    )
    delta_model_class = get_delta_model_class(delta_type)
    delta_model = delta_model_class(model.base_model, **delta_kwargs)
    delta_model.freeze_module(exclude=["deltas"], set_state_dict=True)
    if self.accelerator.is_main_process:
        delta_model.log()
```
- Handle fine-grained layer capturing so that only the upper trunk layers of hydra architectures are modified, as currently handled below (see the second sketch after this list for a possible `peft` equivalent): lines 414 to 428 in 92b68e4
```python
def get_delta_modified_modules(
    config: transformers.PretrainedConfig,
    modified_modules: List[str],
    num_layers_unfrozen: int = -1,
) -> List[str]:
    """Returns a list of module names to be modified for a given delta method with
    the specified number of learnable layers."""
    unfrozen_layers_pattern = generate_layer_regex(config, num_layers_unfrozen)
    # [r] for regex as per https://github.com/thunlp/OpenDelta/blob/main/opendelta/utils/name_based_addressing.py#L20
    regex_prefix = "[r]"
    # TODO (jon-tow): `decoder.block.` is hardcoded to support T5 layer naming.
    decoder_prefix = "decoder.block." if config.is_encoder_decoder else ""
    module_list = [regex_prefix + decoder_prefix + unfrozen_layers_pattern + module for module in modified_modules]
    return module_list
```
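For the first task, here is a minimal sketch of what a `peft`-backed setup could look like in place of the `opendelta` block above. The `peft_kwargs` config field is a hypothetical name mirroring the existing `delta_kwargs`, not an existing trlx option:

```python
from peft import LoraConfig, TaskType, get_peft_model

# Hypothetical replacement for the `opendelta` branch in `AccelerateBaseTrainer`:
# `self.config.model.peft_kwargs` is an assumed config field, analogous to `delta_kwargs`.
if self.config.model.peft_kwargs is not None:
    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        **self.config.model.peft_kwargs,
    )
    # Wrap the trainable base model; peft freezes the original weights and
    # injects trainable adapter parameters.
    model.base_model = get_peft_model(model.base_model, peft_config)
    if self.accelerator.is_main_process:
        model.base_model.print_trainable_parameters()
```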
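For the second task, recent `peft` versions expose `layers_to_transform` (and `layers_pattern`) on `LoraConfig`, which could stand in for the regex-based module addressing above. A rough sketch, assuming a decoder-only model whose config reports `num_hidden_layers`; the `target_modules` names shown are examples and would be model-specific:

```python
from peft import LoraConfig


def build_lora_config(config, num_layers_unfrozen: int = -1) -> LoraConfig:
    """Hypothetical helper: apply LoRA only to the top `num_layers_unfrozen` layers."""
    num_layers = config.num_hidden_layers
    layers_to_transform = (
        list(range(num_layers - num_layers_unfrozen, num_layers))
        if num_layers_unfrozen > 0
        else None  # None -> transform all layers
    )
    return LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],  # example names; actual modules depend on the architecture
        layers_to_transform=layers_to_transform,
    )
```

As with the `opendelta` path, encoder-decoder models (e.g. T5-style `decoder.block.` naming) would still need special handling.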
Motivation
Citing @ethankim00's concerns with `opendelta`:
- The `opendelta` import fails due to an unnecessary `turtle` package import. Even when pip-installed, users may need `sudo` privileges to install the corresponding base graphics package (see thunlp/OpenDelta#47: `ModuleNotFoundError` caused by the `turtle` package)
- Doesn't seem to work with DeepSpeed ZeRO 3
- Additional inference overhead from not merging in the LoRA adapter layers
- Incompatibility with int8 training
- Less actively maintained than the `peft` library, which has been growing rapidly
- Sharing adapter weights on the HuggingFace Hub is less convenient with `opendelta`
Alternatives
No response
Additional context
No response