
Conversation

Collaborator

@ydshieh ydshieh commented Apr 1, 2022

What does this PR do?

Improve PT/TF equivalence test.

To make the review a bit easier for you, I made some comments. Here is a summary of the changes:

  • test_pt_tf_model_equivalence in TensorFlow LED and CLIP are removed: the common one can handle it.
  • test_pt_tf_model_equivalence in TensorFlow LXMERT and ViTMAE are removed: we only need to overwrite
    • prepare_pt_inputs_from_tf_inputs for LXMERT
    • check_pt_tf_models for ViTMAE
  • Main changes in TFModelTesterMixin.test_pt_tf_model_equivalence
    • restructure the code into components so they can be overwritten separately, instead of overwriting the whole big block
    • move some ugly (temporary) logic blocks outside:
      • _make_attention_mask_non_null
      • _postprocessing_to_ignore_test_cases
    • About check_pt_tf_outputs:
      • it can now handle instances of ModelOutput (needed for the CLIP models)
      • better failure message: it prints the name of the tensor where the large PT/TF difference occurs, like output.hidden_states or output.text_model_output.attentions_1
    • A better way to handle the cases where PT/TF outputs have different keys: we now compare the output values only for the keys present in both outputs (a rough sketch of this structure follows below).
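For readers who want a mental model of the restructuring, here is a simplified sketch of the recursive output check described above. It is not the exact code added to test_modeling_tf_common.py; the method name and the dotted-name scheme follow this PR's description, but the body is illustrative only.

```python
import numpy as np


# Sketch of a recursive PT/TF output comparison, meant to live in TFModelTesterMixin.
# It builds a dotted name such as "output.text_model_output.attentions_1" while recursing,
# so a failing assertion points at the exact tensor.
def check_pt_tf_outputs(self, tf_outputs, pt_outputs, tol=1e-5, name="output"):
    if isinstance(tf_outputs, (tuple, list)):
        # Sequences (e.g. hidden_states, attentions): recurse with an index suffix.
        for idx, (tf_item, pt_item) in enumerate(zip(tf_outputs, pt_outputs)):
            self.check_pt_tf_outputs(tf_item, pt_item, tol=tol, name=f"{name}_{idx}")
    elif hasattr(tf_outputs, "to_tuple"):
        # ModelOutput subclasses (e.g. TFCLIPOutput): compare only the keys present in both outputs.
        common_keys = [k for k in tf_outputs.keys() if k in pt_outputs.keys()]
        for key in common_keys:
            self.check_pt_tf_outputs(tf_outputs[key], pt_outputs[key], tol=tol, name=f"{name}.{key}")
    else:
        # Leaf case: compare the tensors as numpy arrays and name the offending tensor on failure.
        tf_array = tf_outputs.numpy()
        pt_array = pt_outputs.detach().numpy()
        max_diff = np.amax(np.abs(tf_array - pt_array))
        self.assertLessEqual(max_diff, tol, f"{name}: Difference between torch and tf is {max_diff} (>= {tol}).")
```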

Once this PR is approved/merged:

  • To work on the same PT/TF equivalence test on PT side (should be very quick)
  • To apply the same logic to PT/Flax equivalence test, both on Flax and PT sides.


HuggingFaceDocBuilderDev commented Apr 1, 2022

The documentation is not available anymore as the PR was closed or merged.

@@ -497,130 +496,6 @@ def test_keras_save_load(self):
after_outputs = model(inputs_dict)
self.assert_outputs_same(after_outputs, outputs)

# overwrite from common since CLIPModel/TFCLIPModel return CLIPOutput/TFCLIPOutput
@is_pt_tf_cross_test
def test_pt_tf_model_equivalence(self):
Collaborator Author

This is no longer needed - the test in TF common can handle nested outputs, including instances of ModelOutput.

# TODO: Remove this once a more thorough pt/tf equivalence could be implemented in `test_modeling_tf_common.py`.
# (Currently, such a test will fail some other model tests: it requires some time to fix them.)
@is_pt_tf_cross_test
def test_pt_tf_model_equivalence_extra(self):
Collaborator Author

This was added before to give TF-LED a strong test while the common version was still a loose one.

Now that the common test is (very) strong, we no longer need this extra test for TF-LED.

if not is_torch_available():
return

def prepare_pt_inputs_from_tf_inputs(self, tf_inputs_dict):
import torch
Collaborator Author

Can I add import torch here without is_torch_available or require_torch? This method will be called only inside test_pt_tf_model_equivalence, which is already decorated with is_pt_tf_cross_test.

Collaborator

That's just a marker that reads an env variable, so I think it should have the require_torch just in case, but I'm not sure if we are very consistent with that. @LysandreJik might know better.

Member

I don't think it really matters as it is indeed already decorated with is_pt_tf_cross_test. We don't have a convention set, so feel free to choose the simplest approach.
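For context, a small illustration of the pattern under discussion (the class name and helper body here are hypothetical, not taken from the PR): the test method carries the is_pt_tf_cross_test marker, so a helper called only from it can import torch locally without its own require_torch guard.

```python
import unittest

from transformers.testing_utils import is_pt_tf_cross_test


class SomeTFModelTest(unittest.TestCase):
    def prepare_pt_inputs_from_tf_inputs(self, tf_inputs_dict):
        import torch  # only reached from the decorated cross test below, so torch is installed

        return {k: torch.from_numpy(v.numpy()) for k, v in tf_inputs_dict.items()}

    @is_pt_tf_cross_test
    def test_pt_tf_model_equivalence(self):
        ...  # build tf_inputs_dict, create the PT/TF models, then compare their outputs
```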

Comment on lines +497 to +500
if isinstance(value, dict):
pt_inputs_dict[key] = self.prepare_pt_inputs_from_tf_inputs(value)
elif isinstance(value, (list, tuple)):
pt_inputs_dict[key] = (self.prepare_pt_inputs_from_tf_inputs(iter_value) for iter_value in value)
Collaborator Author

This is the LXMERT-specific part of the test.

(It would be possible to move this part into the common PT/TF test method, but I think it's fine/better to overwrite it here.)

Comment on lines -528 to -532
def torch_type(key):
if key in ("visual_feats", "visual_pos"):
return torch.float32
else:
return torch.long
Collaborator Author

Removed. The new version uses

elif tf_inputs_dict[key].dtype.is_floating:

I find it cleaner and more general.
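To make the comparison concrete, here is a simplified sketch of the dtype-based conversion idea; the actual common helper is more complete (it also handles nested inputs), so treat this as illustration only.

```python
import torch


# Sketch: instead of hard-coding which keys are float (visual_feats, visual_pos),
# look at the dtype of each TF tensor when building the PT inputs.
def prepare_pt_inputs_from_tf_inputs(self, tf_inputs_dict):
    pt_inputs_dict = {}
    for key, value in tf_inputs_dict.items():
        if tf_inputs_dict[key].dtype.is_floating:
            # Floating-point inputs (e.g. visual features, pixel values) stay float32.
            pt_inputs_dict[key] = torch.from_numpy(value.numpy()).to(torch.float32)
        else:
            # Integer-like inputs (input ids, attention masks, ...) become long tensors.
            pt_inputs_dict[key] = torch.from_numpy(value.numpy()).to(torch.long)
    return pt_inputs_dict
```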

Comment on lines -534 to -546
def recursive_numpy_convert(iterable):
return_dict = {}
for key, value in iterable.items():
if isinstance(value, dict):
return_dict[key] = recursive_numpy_convert(value)
else:
if isinstance(value, (list, tuple)):
return_dict[key] = (
torch.from_numpy(iter_value.numpy()).to(torch_type(key)) for iter_value in value
)
else:
return_dict[key] = torch.from_numpy(value.numpy()).to(torch_type(key))
return return_dict
Collaborator Author

@ydshieh ydshieh Apr 7, 2022

In the new version, this is handled in prepare_pt_inputs_from_tf_inputs.

if isinstance(value, dict):
    pt_inputs_dict[key] = self.prepare_pt_inputs_from_tf_inputs(value)
elif isinstance(value, (list, tuple)):
    pt_inputs_dict[key] = (self.prepare_pt_inputs_from_tf_inputs(iter_value) for iter_value in value)

@@ -486,135 +488,31 @@ def check_hidden_states_output(config, inputs_dict, model_class):
config.output_hidden_states = True
check_hidden_states_output(config, inputs_dict, model_class)

def test_pt_tf_model_equivalence(self):
Collaborator Author

In the new version, we only need to overwrite prepare_pt_inputs_from_tf_inputs, because that is the only place that actually differs from the common version.
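A hedged sketch of what such an override can look like inside the model's TF test class; the dict/tuple branches mirror the snippet shown above, while the fallback to super() for flat tensors is my illustration rather than necessarily how the LXMERT test is written.

```python
def prepare_pt_inputs_from_tf_inputs(self, tf_inputs_dict):
    pt_inputs_dict = {}
    for key, value in tf_inputs_dict.items():
        if isinstance(value, dict):
            # Nested input dicts: convert them recursively.
            pt_inputs_dict[key] = self.prepare_pt_inputs_from_tf_inputs(value)
        elif isinstance(value, (list, tuple)):
            # Sequences of nested inputs: convert each element.
            pt_inputs_dict[key] = (self.prepare_pt_inputs_from_tf_inputs(iter_value) for iter_value in value)
        else:
            # Flat tensors: defer to the common, dtype-based conversion.
            pt_inputs_dict.update(super().prepare_pt_inputs_from_tf_inputs({key: value}))
    return pt_inputs_dict
```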


check_pt_tf_models(tf_model, pt_model)
super().check_pt_tf_models(tf_model, pt_model, tf_inputs_dict)
Collaborator Author

I prefer to call super() here, because the difference is only about adding a noise argument in the block above.

Collaborator

SGTM!

@@ -363,140 +363,20 @@ def check_hidden_states_output(inputs_dict, config, model_class):

# overwrite from common since TFViTMAEForPretraining has random masking, we need to fix the noise
# to generate masks during test
@is_pt_tf_cross_test
def test_pt_tf_model_equivalence(self):
Collaborator Author

We just need to overwrite check_pt_tf_models.
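As a rough illustration only (the exact way the fixed noise is wired into the models may differ), an override inside the ViTMAE TF test class could look roughly like the sketch below; passing the noise as a `noise` input and the `model_tester.batch_size` attribute are assumptions following the usual transformers test setup, not verified details of this PR.

```python
import numpy as np
import tensorflow as tf


def check_pt_tf_models(self, tf_model, pt_model, tf_inputs_dict):
    # TFViTMAEForPretraining applies random masking, so fix the noise that drives the
    # masking and feed the same values to both frameworks before running the common check.
    num_patches = int((pt_model.config.image_size // pt_model.config.patch_size) ** 2)
    noise = np.random.uniform(size=(self.model_tester.batch_size, num_patches))
    # Assumption for illustration: the models accept the noise as a `noise` input, which the
    # common helper converts to a PT tensor when it builds the PT inputs.
    tf_inputs_dict["noise"] = tf.constant(noise, dtype=tf.float32)
    super().check_pt_tf_models(tf_model, pt_model, tf_inputs_dict)
```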

@ydshieh ydshieh changed the title [WIP] Improve pt tf equiv test Improve PT/TF equivalence test Apr 7, 2022
@ydshieh ydshieh marked this pull request as ready for review April 7, 2022 13:27
being a named field in the output.
"""

self.assertEqual(type(name), str)
Collaborator Author

Not sure if we should test this argument. I think it is not worth it.

Collaborator

Not sure why it was added, but it doesn't look useful, I agree.

Collaborator Author

It was added by me during the process: sometimes I passed the wrong arguments and got errors.

However, those arguments are unlikely to be used by anyone else (unless someone wants to change check_pt_tf_outputs).

tf_outputs[pt_nans] = 0

max_diff = np.amax(np.abs(tf_outputs - pt_outputs))
self.assertLessEqual(max_diff, tol, f"{name}: Difference between torch and tf is {max_diff} (>= {tol}).")
Collaborator Author

Make the failure message more informative by adding the corresponding tensor name, like

output.hidden_states

Collaborator

@sgugger sgugger left a comment

Thanks for cleaning those. It's great we can remove some model-specific code to rely on the generic common tests!

Member

@gante gante left a comment

This is great, it makes writing tests for edge cases much easier 🚀

@ydshieh ydshieh force-pushed the improve_pt_tf_equiv_test branch from cdae60f to b703e6c on April 11, 2022 19:41
Collaborator Author

ydshieh commented Apr 11, 2022

(just rebased on main - no real change since your last review)

Collaborator Author

ydshieh commented Apr 11, 2022

Merging now. Don't hesitate to leave comments in any case :-)

@ydshieh ydshieh merged commit dce33f2 into huggingface:main Apr 11, 2022
@ydshieh ydshieh deleted the improve_pt_tf_equiv_test branch April 11, 2022 20:19
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022
* add error message

* Use names in the error message

* allow ModelOutput

* rename to check_pt_tf_outputs and move outside

* fix style

* skip past_key_values in a better way

* Add comments

* improve code for label/loss

* make the logic clear by moving the ignore keys out

* fix _postprocessing_to_ignore

* fix _postprocessing_to_ignore: create new outputs from the remaining fields

* ignore past_key_values in TFGPT2 models for now

* make check_pt_tf_outputs better regarding names

* move check_pt_tf_models outside

* rename methods

* remove test_pt_tf_model_equivalence in TFCLIPModelTest

* Reduce TFViTMAEModelTest.test_pt_tf_model_equivalence

* move prepare_pt_inputs_from_tf_inputs outside check_pt_tf_models

* Fix quality

* Clean-up TFLxmertModelTester.test_pt_tf_model_equivalence

* Fix quality

* fix

* fix style

* Clean-up TFLEDModelTest.test_pt_tf_model_equivalence

* Fix quality

* add docstring

* improve comment

Co-authored-by: ydshieh <[email protected]>