Updates
Per further discussion, the difference is intentional but undocumented. It is a deviation from the reference implementation in Google Big Vision.
Original Report
Fix location:
This causes the default to be "bicubic":
pos_embed_interp_mode: str = 'bicubic'  # Interpolation mode for position embedding resizing
Reference code showing "bilinear" interpolation:
https://github.com/google-research/big_vision/blob/0127fb6b337ee2a27bf4e54dea79cff176527356/big_vision/models/proj/image_text/naflex_vit.py#L67
After making this change, TIMM is able to forward SigLIP2 NaFlex with cosine similarity above 0.9999 at each intermediate.
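To illustrate why the interpolation mode matters, here is a minimal sketch that resizes the same position-embedding grid with "bilinear" and "bicubic" and compares the results. It is not the TIMM or Big Vision resizing code: `resize_pos_embed` is a hypothetical helper, the grid is random data, and `align_corners=False` with no antialiasing is an assumption made for the example.

```python
import torch
import torch.nn.functional as F


def resize_pos_embed(pos_embed: torch.Tensor, new_hw: tuple, mode: str) -> torch.Tensor:
    """Hypothetical helper: resize an (H, W, C) position-embedding grid to new_hw."""
    # F.interpolate expects (N, C, H, W), so move channels first and add a batch dim.
    x = pos_embed.permute(2, 0, 1).unsqueeze(0)
    # align_corners=False is an assumption for this sketch, not a confirmed setting.
    x = F.interpolate(x, size=new_hw, mode=mode, align_corners=False)
    # Back to (H, W, C).
    return x.squeeze(0).permute(1, 2, 0)


torch.manual_seed(0)
grid = torch.randn(16, 16, 64)  # random stand-in for a learned position embedding

bilinear = resize_pos_embed(grid, (24, 20), "bilinear")
bicubic = resize_pos_embed(grid, (24, 20), "bicubic")

# Cosine similarity between the two resized grids, flattened to vectors.
cos = F.cosine_similarity(bilinear.flatten(), bicubic.flatten(), dim=0)
print(f"cosine similarity (bilinear vs bicubic): {cos.item():.6f}")
print(f"max abs difference: {(bilinear - bicubic).abs().max().item():.6f}")
```

The two modes produce measurably different embeddings for the same target size, so a mismatch in this one setting is enough to explain a drop in similarity against the reference outputs.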