
Conversation

sanchit-gandhi (Contributor)

The Flax .from_pretrained method respects the dtype of the checkpoint weights it loads: for model weights stored in bfloat16/float16, Flax models are instantiated with parameters in bfloat16/float16 respectively (see #16736). The general assumption, however, is that all Flax model weights are in float32; loading and storing model weights in a lower precision (bfloat16/float16) is likely to lead to undesirable behaviour and model instabilities. This PR adds a warning to the .from_pretrained method should any of the model weights not be in float32, and advises the user to upcast the weights to float32 prior to use.
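For illustration, a minimal sketch of the recommended upcast, using the to_fp32 helper referenced by the new warning (the workflow below is an assumption on my part, not part of this PR):

from transformers import FlaxBartForCausalLM

# loading a half-precision checkpoint triggers the new warning
model = FlaxBartForCausalLM.from_pretrained('sanchit-gandhi/tiny-random-bart-fp16', from_pt=True)

# upcast all parameters back to float32 before use, as the warning advises
model.params = model.to_fp32(model.params)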

sanchit-gandhi changed the title to [Flax .from_pretrained] Raise a warning if model weights are not in float32 on Apr 13, 2022
HuggingFaceDocBuilderDev commented Apr 13, 2022

The documentation is not available anymore as the PR was closed or merged.

patil-suraj (Contributor) left a comment:

LGTM!

# dictionary of key: bool flagging whether each parameter is *not* in jnp.float32
param_dtypes = jax.tree_map(lambda x: x.dtype != jnp.float32, state)
# extract keys of parameters not in jnp.float32
downcast_params = [k for k in param_dtypes if param_dtypes[k]]
Contributor:

(nit) maybe name it as half_precision_params

sanchit-gandhi (Contributor, Author):

I've modified the code to generate two separate lists: one for fp16 params and another for bf16 params. The warning message then specifies the erroneous dtype of the loaded model weights, which I think is more informative than simply saying the weights are in a dtype other than fp32.
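A minimal sketch of how that split could look (assuming state is the flattened parameter dict used inside from_pretrained; the names here are illustrative, not the exact PR code):

import jax
import jax.numpy as jnp

# map each flattened parameter key to its dtype
param_dtypes = jax.tree_map(lambda x: x.dtype, state)

# collect the keys of parameters loaded in half precision
fp16_params = [k for k in param_dtypes if param_dtypes[k] == jnp.float16]
bf16_params = [k for k in param_dtypes if param_dtypes[k] == jnp.bfloat16]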

sanchit-gandhi (Contributor, Author) commented Apr 13, 2022

As an example, loading a set of PyTorch float16 Bart model weights into a FlaxBartForCausalLM model produces the following warning:

from transformers import FlaxBartForCausalLM
model = FlaxBartForCausalLM.from_pretrained('sanchit-gandhi/tiny-random-bart-fp16', from_pt=True)
Some of the weights of FlaxBartForCausalLM were initialized in float16 precision from the model checkpoint at sanchit-gandhi/tiny-random-bart-fp16:
[('model', 'decoder', 'embed_positions', 'embedding'), ('model', 'decoder', 'embed_tokens', 'embedding'), ('model', 'decoder', 'layernorm_embedding', 'bias'), ('model', 'decoder', 'layernorm_embedding', 'scale'), ('model', 'decoder', 'layers', '0', 'encoder_attn', 'k_proj', 'bias'), ('model', 'decoder', 'layers', '0', 'encoder_attn', 'k_proj', 'kernel'), ('model', 'decoder', 'layers', '0', 'encoder_attn', 'out_proj', 'bias'), ('model', 'decoder', 'layers', '0', 'encoder_attn', 'out_proj', 'kernel'), ('model', 'decoder', 'layers', '0', 'encoder_attn', 'q_proj', 'bias'), ('model', 'decoder', 'layers', '0', 'encoder_attn', 'q_proj', 'kernel'), ('model', 'decoder', 'layers', '0', 'encoder_attn', 'v_proj', 'bias'), ('model', 'decoder', 'layers', '0', 'encoder_attn', 'v_proj', 'kernel'), ('model', 'decoder', 'layers', '0', 'encoder_attn_layer_norm', 'bias'), ('model', 'decoder', 'layers', '0', 'encoder_attn_layer_norm', 'scale'), ('model', 'decoder', 'layers', '0', 'fc1', 'bias'), ('model', 'decoder', 'layers', '0', 'fc1', 'kernel'), ('model', 'decoder', 'layers', '0', 'fc2', 'bias'), ('model', 'decoder', 'layers', '0', 'fc2', 'kernel'), ('model', 'decoder', 'layers', '0', 'final_layer_norm', 'bias'), ('model', 'decoder', 'layers', '0', 'final_layer_norm', 'scale'), ('model', 'decoder', 'layers', '0', 'self_attn', 'k_proj', 'bias'), ('model', 'decoder', 'layers', '0', 'self_attn', 'k_proj', 'kernel'), ('model', 'decoder', 'layers', '0', 'self_attn', 'out_proj', 'bias'), ('model', 'decoder', 'layers', '0', 'self_attn', 'out_proj', 'kernel'), ('model', 'decoder', 'layers', '0', 'self_attn', 'q_proj', 'bias'), ('model', 'decoder', 'layers', '0', 'self_attn', 'q_proj', 'kernel'), ('model', 'decoder', 'layers', '0', 'self_attn', 'v_proj', 'bias'), ('model', 'decoder', 'layers', '0', 'self_attn', 'v_proj', 'kernel'), ('model', 'decoder', 'layers', '0', 'self_attn_layer_norm', 'bias'), ('model', 'decoder', 'layers', '0', 'self_attn_layer_norm', 'scale'), ('model', 'decoder', 'layers', '1', 'encoder_attn', 'k_proj', 'bias'), ('model', 'decoder', 'layers', '1', 'encoder_attn', 'k_proj', 'kernel'), ('model', 'decoder', 'layers', '1', 'encoder_attn', 'out_proj', 'bias'), ('model', 'decoder', 'layers', '1', 'encoder_attn', 'out_proj', 'kernel'), ('model', 'decoder', 'layers', '1', 'encoder_attn', 'q_proj', 'bias'), ('model', 'decoder', 'layers', '1', 'encoder_attn', 'q_proj', 'kernel'), ('model', 'decoder', 'layers', '1', 'encoder_attn', 'v_proj', 'bias'), ('model', 'decoder', 'layers', '1', 'encoder_attn', 'v_proj', 'kernel'), ('model', 'decoder', 'layers', '1', 'encoder_attn_layer_norm', 'bias'), ('model', 'decoder', 'layers', '1', 'encoder_attn_layer_norm', 'scale'), ('model', 'decoder', 'layers', '1', 'fc1', 'bias'), ('model', 'decoder', 'layers', '1', 'fc1', 'kernel'), ('model', 'decoder', 'layers', '1', 'fc2', 'bias'), ('model', 'decoder', 'layers', '1', 'fc2', 'kernel'), ('model', 'decoder', 'layers', '1', 'final_layer_norm', 'bias'), ('model', 'decoder', 'layers', '1', 'final_layer_norm', 'scale'), ('model', 'decoder', 'layers', '1', 'self_attn', 'k_proj', 'bias'), ('model', 'decoder', 'layers', '1', 'self_attn', 'k_proj', 'kernel'), ('model', 'decoder', 'layers', '1', 'self_attn', 'out_proj', 'bias'), ('model', 'decoder', 'layers', '1', 'self_attn', 'out_proj', 'kernel'), ('model', 'decoder', 'layers', '1', 'self_attn', 'q_proj', 'bias'), ('model', 'decoder', 'layers', '1', 'self_attn', 'q_proj', 'kernel'), ('model', 'decoder', 'layers', '1', 'self_attn', 'v_proj', 'bias'), ('model', 'decoder', 
'layers', '1', 'self_attn', 'v_proj', 'kernel'), ('model', 'decoder', 'layers', '1', 'self_attn_layer_norm', 'bias'), ('model', 'decoder', 'layers', '1', 'self_attn_layer_norm', 'scale')]
You should probably UPCAST the model weights to float32 if this was not intended. See [`~FlaxPreTrainedModel.to_fp32`] for further information on how to do this.
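After upcasting as the warning suggests, a quick sanity check can confirm that no half-precision leaves remain (a sketch, not part of the PR):

import jax
import jax.numpy as jnp

# True only once every parameter leaf has been upcast to float32
assert all(
    leaf.dtype == jnp.float32 for leaf in jax.tree_util.tree_leaves(model.params)
)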

Comment on lines +667 to +682
if len(fp16_params) > 0:
    logger.warning(
        f"Some of the weights of {model.__class__.__name__} were initialized in float16 precision from "
        f"the model checkpoint at {pretrained_model_name_or_path}:\n{fp16_params}\n"
        "You should probably UPCAST the model weights to float32 if this was not intended. "
        "See [`~FlaxPreTrainedModel.to_fp32`] for further information on how to do this."
    )

if len(bf16_params) > 0:
    logger.warning(
        f"Some of the weights of {model.__class__.__name__} were initialized in bfloat16 precision from "
        f"the model checkpoint at {pretrained_model_name_or_path}:\n{bf16_params}\n"
        "You should probably UPCAST the model weights to float32 if this was not intended. "
        "See [`~FlaxPreTrainedModel.to_fp32`] for further information on how to do this."
    )

Contributor:

Nice!

patil-suraj merged commit d8269eb into huggingface:main on Apr 14, 2022
sanchit-gandhi (Contributor, Author) commented Apr 14, 2022

Sorry, this is a super nitty question, but I just wanted to ask to make sure we're all on the same page for best practice! Shouldn't one ideally merge their own PRs, rather than the reviewer doing so?

patil-suraj (Contributor):

Aah, yes! One should merge their own PRs; I rushed this one a bit.

elusenji pushed a commit to elusenji/transformers that referenced this pull request on Jun 12, 2022: … float32 (huggingface#16762)

* [Flax] Raise a warning if model weights are not in float32

* apply suggestions and few small changes

* reorder wording for better readability
sanchit-gandhi deleted the flax-from-pretrained branch on June 25, 2023 at 09:52