[Models] Use in-place adds in Idefics2Vision #23932
Merged
+3
−3
Purpose
This changes the `Idefics2VisionTransformer` model to use in-place adds where possible. This is relevant because vision encoders currently run outside of torch.compile, so patterns like this are not automatically optimised away. See also #18922. Related to: #23884
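As a minimal sketch of the optimisation (function and variable names here are illustrative, not the actual vLLM code): outside of torch.compile, every out-of-place `a + b` in a residual connection allocates a fresh tensor, whereas an in-place `a += b` (i.e. `torch.Tensor.add_`) reuses the existing buffer. This is only safe when the tensor being overwritten is a pure intermediate that autograd and other code paths do not need, which holds for inference-only encoder activations.

```python
import torch


def encoder_layer_out_of_place(hidden, attn_out, mlp_out):
    # Original pattern: each `+` allocates a new tensor for the result.
    hidden = hidden + attn_out
    hidden = hidden + mlp_out
    return hidden


def encoder_layer_in_place(hidden, attn_out, mlp_out):
    # In-place pattern: `+=` lowers to Tensor.add_, writing the sum back
    # into `hidden`'s existing storage instead of allocating.
    hidden += attn_out
    hidden += mlp_out
    return hidden


x = torch.randn(2, 4)
attn = torch.randn(2, 4)
mlp = torch.randn(2, 4)

ref = encoder_layer_out_of_place(x.clone(), attn, mlp)
out = encoder_layer_in_place(x.clone(), attn, mlp)
assert torch.allclose(ref, out)  # numerically identical results
```

The two versions produce identical outputs; the in-place variant simply avoids two intermediate allocations per layer, which adds up when the encoder runs in eager mode over many image tiles.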
Test Plan
Test Result
In an internal benchmark with many image inputs, this change has a surprisingly large performance impact: I'm seeing a 5.5% to 6.2% increase in throughput with the `openbmb/MiniCPM-V-4_5` model running on an L40S GPU.

@DarkLight1337, let me know if you'd like me to do a bit of search-and-replace and apply similar changes in the other model definitions as well, which might also benefit text-only models when run in eager mode.