
Conversation

@Rocketknight1
Member

CLIP models were not being tested correctly with fit() because the test skipped any model without an hf_compute_loss method. That skip was intended to exclude base models like TFBertModel that have no specific output heads or losses, but it also excluded models like CLIP that compute a loss without using compute_loss / hf_compute_loss.

The new test instead checks whether the model's return-type dataclass has a loss key, which is a more reliable signal. Enabling it reveals the bug in fit() for TFCLIP, so this PR also includes fixes to train_step and test_step for CLIP and models like it that require return_loss=True to be passed but do not set it by default.
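For illustration, here is a minimal sketch of the kind of check described above (not the exact test code; the helper name is made up). It assumes the model's TF output class is a dataclass, as the library's output classes are, and treats a declared loss field as the signal that the model can be trained with fit():

import dataclasses

def declares_loss(output_class) -> bool:
    # True when the output dataclass exposes a `loss` field (as CLIP's output class does).
    if not dataclasses.is_dataclass(output_class):
        return False
    return any(field.name == "loss" for field in dataclasses.fields(output_class))

On the modelling side, the train_step / test_step fix amounts to making sure return_loss=True reaches the model call, so that a loss is actually produced during fit().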

Draft for now because this will likely flush out other bugs or cause other problems!

Fixes #18670.

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Aug 18, 2022

The documentation is not available anymore as the PR was closed or merged.

@Rocketknight1
Member Author

Rocketknight1 commented Aug 18, 2022

The tests passed. I'm as surprised as everyone else. It's ready for review!

@Rocketknight1 Rocketknight1 requested a review from gante August 18, 2022 15:45
@Rocketknight1 Rocketknight1 marked this pull request as ready for review August 18, 2022 15:47
@ydshieh
Collaborator

ydshieh commented Aug 18, 2022

Thanks @Rocketknight1 for the quick fix. I am wondering if it makes sense to wrap the loss computation for TFCLIP into an hf_compute_loss method?

I also see that TFSegformerForSemanticSegmentation defines hf_compute_loss within its own model class.

If doable, maybe it would be good to add classes like TFSemanticSegmentationLoss, TFContrastiveLoss, etc.

Just want to hear some opinions. cc @gante @amyeroberts
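For the sake of discussion, here is a rough sketch of what a shared contrastive-loss class could look like (purely hypothetical: TFContrastiveLoss does not exist in the library, and the body below is just the standard symmetric cross-entropy over the text/image similarity matrix, not code from this PR):

import tensorflow as tf

class TFContrastiveLoss:
    # Hypothetical mixin: symmetric cross-entropy over a text/image similarity matrix,
    # where the matching pairs sit on the diagonal.
    def hf_compute_loss(self, logits_per_text):
        labels = tf.range(tf.shape(logits_per_text)[0])
        caption_loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(labels, logits_per_text, from_logits=True)
        )
        image_loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(labels, tf.transpose(logits_per_text), from_logits=True)
        )
        return (caption_loss + image_loss) / 2.0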

@Rocketknight1
Member Author

@ydshieh I don't think that's necessary - the new check we use means we don't need to rely on hf_compute_loss being present anymore!

@Rocketknight1
Member Author

Just realized that using return instead of continue in the tests was skipping a lot of tests, which might have been why it was green so easily. Rerunning everything!
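A minimal, self-contained illustration of the difference (not the actual test loop): return exits the whole loop at the first skipped item, so later model classes are never exercised, while continue only skips the current one.

def run_loss_tests(model_classes, has_loss, early_return=False):
    tested = []
    for model_class in model_classes:
        if model_class not in has_loss:
            if early_return:
                return tested  # old behaviour: silently stops testing here
            continue           # fixed behaviour: skip this class, keep going
        tested.append(model_class)
    return tested

classes = ["model_a", "model_b", "model_c"]
print(run_loss_tests(classes, has_loss={"model_b", "model_c"}, early_return=True))   # []
print(run_loss_tests(classes, has_loss={"model_b", "model_c"}, early_return=False))  # ['model_b', 'model_c']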

Member

@gante left a comment


Looks good 👍 And I agree, relying on a loss output is safer!

Will re-review after the errors are fixed (I see the latest commit, which replaces the return, broke some tests :D)

@Rocketknight1 Rocketknight1 requested a review from sgugger August 18, 2022 18:12
@Rocketknight1
Member Author

Further updates: Now that we're no longer incorrectly skipping tests, this turned up quite a few bugs! The main source of issues is that some more recent models return scalar losses, but both our test_loss_computation test and Keras fit() expect the loss to have a shape, even if that shape is (1,).
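To make the mismatch concrete, a small sketch with assumed values (not library code): a scalar loss has shape (), while the common test and Keras fit() expect at least shape (1,), so one fix is to give the scalar an explicit length-one dimension.

import tensorflow as tf

scalar_loss = tf.constant(0.3)                   # shape () -- what some newer models return
loss_with_shape = tf.reshape(scalar_loss, (1,))  # shape (1,) -- what the tests and fit() expect
print(scalar_loss.shape, loss_with_shape.shape)  # () (1,)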

Contributor

@amyeroberts left a comment


Overall structure looks good to me. Leaving some comments from my 👀 pass, but they're just nits.

added_label_names = sorted(list(prepared_for_class.keys() - inputs_dict.keys()), reverse=True)
if not added_label_names:
    continue  # This test is only for models with easily-separable labels
added_label = prepared_for_class[added_label_names[0]]

Can we assume there's only one added label? Not sure if it matters, but if we can and it does, can we add an assert above?
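For illustration, one way the suggested assertion could look inside the same loop as the snippet above (hypothetical, not code from the PR):

added_label_names = sorted(prepared_for_class.keys() - inputs_dict.keys(), reverse=True)
if not added_label_names:
    continue  # This test is only for models with easily-separable labels
assert len(added_label_names) == 1, f"Expected exactly one added label, got {added_label_names}"
added_label = prepared_for_class[added_label_names[0]]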

Collaborator

@sgugger left a comment


Left a nit, but LGTM! Thanks for working on this!

@Rocketknight1 Rocketknight1 merged commit 660e0b9 into main Sep 9, 2022
@Rocketknight1 Rocketknight1 deleted the return_loss_fix branch September 9, 2022 19:01


Successfully merging this pull request may close these issues.

TFClipModel fails to train because of None loss
