🚨🚨🚨 Fix sdpa in sam and refactor relative position embeddings #36422
Conversation
Hi @geetu040, thanks for opening the PR!
run-slow: sam
This comment contains run-slow, running the specified jobs: models: ['models/sam']
run-slow: sam
Hi @qubvel, I have fixed the failing tests at runs/13544413658. This PR is ready for review now; could you please run the slow tests again?
Thanks for fixing!
run-slow: sam
A question:
counts += (
    [cur_idxs[0].numpy().item()] + btw_idxs.numpy().tolist() + [height * width - cur_idxs[-1].numpy().item()]
)
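For context, the snippet above builds the counts list for a run-length encoding of a flattened mask. A hypothetical standalone sketch of how such counts are computed (helper name and shapes are illustrative, not the transformers implementation):

```python
import numpy as np

def rle_counts(mask_flat):
    # Hypothetical helper: run-length encode a flat binary mask.
    # By RLE convention the first count is the length of the leading run
    # of zeros, so if the mask starts with 1 we prepend a zero count.
    total = len(mask_flat)
    # Indices where the value changes between consecutive elements
    change_idx = np.where(mask_flat[1:] != mask_flat[:-1])[0] + 1
    counts = []
    if mask_flat[0] == 1:
        counts.append(0)
    prev = 0
    for idx in change_idx:
        counts.append(int(idx - prev))
        prev = int(idx)
    counts.append(total - prev)
    return counts
```

Each count is the length of one constant run, alternating between zeros and ones.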
Why do we need to add .numpy()?
Because TensorFlow tensors don't have a direct item() method; to access the scalar value, we need to call numpy().item().
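The pattern being discussed can be sketched as a small helper (hypothetical name; the actual code simply calls .numpy().item() inline):

```python
import numpy as np

def to_scalar(t):
    """Hypothetical helper: extract a Python scalar from a framework tensor.

    TensorFlow eager tensors have no direct .item() method, so the code
    above routes through .numpy() first; NumPy 0-d arrays and scalars
    then expose .item().
    """
    if hasattr(t, "numpy"):  # e.g. a TF eager tensor
        t = t.numpy()
    return t.item()
```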
ok! just wondering if it was broken before
There are a bunch of other TensorFlow bugs that, for reasons I don't understand, did not appear in the workflows before, except in #36493. I have checked different TensorFlow versions, and they seem to be genuine bugs across all of them.
ok, not sure if there is any usage of TF Sam actually, so there might be some bugs indeed!
run-slow: sam
This comment contains run-slow, running the specified jobs: models: ['models/sam']
@qubvel the custom tests have passed now; the failing ones seem unrelated
Nice! Thanks for fixing
Let's wait for those tests to be fixed on main, and then merge.
cc @ArthurZucker for approval: changing
@ArthurZucker a soft ping, since this blocks the failing tests in other PRs: #36248, #36493
Hi @qubvel, would it be possible to ping someone else for this? It seems that @ArthurZucker is quite busy and might not have had a chance to review it. Since this is blocking other PRs (#36248 and #36493), it would be great if we could get it merged. Otherwise, if this will take some time, I can merge this branch into the other PRs and continue working from there.
cc @molbap or @zucchini-nlp if you have bandwidth |
This is breaking, no? We had add_decomposed_rel_pos as a public method and now we're removing it completely.
I am okay with breaking it; I don't think anyone was actually calling it separately. But we can add a 🔴 to the PR title.
else:
    kwarg_value = "__empty__"
- if kwarg_value != "__empty__":
+ if not isinstance(kwarg_value, str) or kwarg_value != "__empty__":
I wonder why this was needed? Even if kwarg_value is a float, checking kwarg_value != "__empty__" is enough, isn't it?
This change was necessary because kwarg_value cannot be directly compared to "__empty__" when it is a TensorFlow tensor. Attempting such a comparison raises a TypeError:

tf.Variable([1, 2, 3, 4]) == "__empty__"
# TypeError: Cannot convert '__empty__' to EagerTensor of dtype int32

This issue occurred in the test suite test_modeling_tf_sam.py::TFSamModelIntegrationTest (slow tests for sam) within this workflow.
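The fixed check can be sketched as a standalone predicate (hypothetical name; the actual code inlines the condition). Guarding with isinstance means Python's `or` short-circuits before the string comparison is ever attempted on a tensor:

```python
def is_provided(kwarg_value):
    """Hypothetical sketch of the fixed sentinel check.

    Comparing a TensorFlow tensor to a string raises TypeError, so the
    string comparison is only performed when the value actually is a str;
    any non-str value (tensor, float, list, ...) counts as provided.
    """
    return not isinstance(kwarg_value, str) or kwarg_value != "__empty__"
```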
interesting, thanks for the explanation
Oh yeah, sorry, meant to hit approve! LGTM as long as @qubvel is okay with the changes.
Added a section to the PR's initial message with a description of the breaking change; merging
…gface#36422)

* fall back to eager if output_attentions
* improve relative position embeddings
* run modular on got_ocr2
* run-slow: sam
* fix run-length encoding
* fix tf processor errors
* update tf_sam
* fix compile error
* re-run tests

Signed-off-by: Mehant Kammakomati <[email protected]>
What does this PR do?

This PR:
- In SamVisionSdpaAttention, when output_attentions=True, falls back to the eager implementation
- Applies corresponding changes to TFSamVisionSdpaAttention and GotOcr2VisionAttention

Previously discussed here: #36248 (comment)
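The reason for the fallback can be illustrated with a minimal NumPy sketch (not the transformers implementation): fused SDPA kernels never materialize the attention-weight matrix, so when a caller requests output_attentions the weights must be computed with the eager formulation.

```python
import numpy as np

def eager_attention(q, k, v):
    # Eager attention explicitly materializes the attention weights,
    # which fused SDPA kernels do not expose; this is why
    # output_attentions=True has to take this slower path.
    scale = q.shape[-1] ** -0.5
    scores = (q @ np.swapaxes(k, -2, -1)) * scale
    # Numerically stable softmax over the last axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```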
🚨 Breaking changes

The add_decomposed_rel_pos public method of the *Attention module was replaced with the get_decomposed_rel_pos method, which returns the positional embedding instead of adding it to the attention weights.

Who can review?
@amyeroberts, @qubvel, @zucchini-nlp
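The breaking change described above amounts to a shift of responsibility from the helper to its caller, which can be sketched as follows (toy shapes and stand-in values; names follow the PR, the real helpers take query and relative-position tables as arguments):

```python
import numpy as np

# Toy stand-ins for illustration only
attn = np.zeros((1, 4, 4))           # attention weights before the positional term
rel_pos_term = np.ones((1, 4, 4))    # decomposed relative position embedding term

# Old API (removed): the helper added the term internally, roughly
#   attn = add_decomposed_rel_pos(attn, q, rel_pos_h, rel_pos_w, ...)
#
# New API: get_decomposed_rel_pos returns the term and the caller adds it:
#   rel_pos_term = get_decomposed_rel_pos(q, rel_pos_h, rel_pos_w, ...)
attn = attn + rel_pos_term
```

Returning the term instead of mutating the weights lets the SDPA path reuse the same embedding computation without ever materializing the full attention matrix.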