
Conversation

@pass-lin (Contributor) commented on Sep 6, 2025

Mainly two simple modifications (a usage sketch follows below):

  1. Document the ModelScope download source in the preset documentation
  2. Add bf16 inference support to the SAM model
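
In practice, the two changes combine roughly like this — a minimal sketch assuming KerasHub's preset API; the ModelScope handle shown is a placeholder, not a real preset path:

    import keras_hub

    # Load a preset from the ModelScope download source (the
    # "modelscope://<owner>/<model>" handle is a placeholder) and
    # request bfloat16 compute for inference.
    backbone = keras_hub.models.Backbone.from_preset(
        "modelscope://<owner>/<model>",
        dtype="bfloat16",
    )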

@gemini-code-assist (bot) left a comment

Summary of Changes

Hello @pass-lin, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the model loading capabilities by documenting ModelScope as a valid preset source and improves the computational efficiency of the SAM model by enabling bfloat16 inference. Additionally, it refines the core tensor padding utility and related preprocessing layers, ensuring more consistent and predictable data handling across the library.

Highlights

  • ModelScope Integration: Added ModelScope as a supported download source for presets in the Backbone.from_preset method documentation, expanding the range of accessible models.
  • BF16 Inference for SAM Model: Enabled bfloat16 (bf16) inference support for the SAM (Segment Anything Model) by passing the dtype to the RandomFrequencyPositionalEmbeddings layer in the prompt encoder (a sketch of this pattern follows the list).
  • Improved Tensor Padding Logic: Refactored the pad utility function to provide more robust and explicit handling of left-side padding, ensuring correct tensor shapes and values. This includes changes to how padding is applied and the introduction of an axis parameter.
  • Preprocessing Layer Refinements: Made minor adjustments to MultiSegmentPacker by simplifying the shape definition and to StartEndPacker by ensuring pad_value defaults to 0 if not provided.
  • Test Case Updates: Updated various test cases for MultiSegmentPacker and StartEndPacker to align with the refined padding logic, ensuring the expected outputs match the new behavior.
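
The bf16 change follows the usual Keras pattern: a parent layer forwards its own dtype policy to the sub-layers it constructs, so the whole stack computes in the requested precision. A minimal sketch of the pattern (toy layer, not the actual SAM code):

    import keras

    class PromptEncoderSketch(keras.layers.Layer):
        """Toy layer illustrating dtype propagation to a sub-layer."""

        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            # Passing `dtype=self.dtype_policy` makes the sub-layer
            # compute in bf16 when the parent is constructed with
            # dtype="bfloat16"; omitting it leaves the sub-layer on
            # the global default policy (usually float32).
            self.embedding = keras.layers.Dense(8, dtype=self.dtype_policy)

        def call(self, x):
            return self.embedding(x)

    layer = PromptEncoderSketch(dtype="bfloat16")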
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; below are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature              | Command             | Description
-------------------- | ------------------- | ----------------------------------------------------------------------------------------
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@pass-lin closed this on Sep 6, 2025
@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces two main changes: adding ModelScope as a download source and enabling bfloat16 inference for the SAM model. Both features are well implemented. However, there are significant issues with the modifications to the padding logic in tensor_utils.py. The new implementation for left-padding is incorrect and effectively performs right-padding, which is a critical bug. Several tests have been updated to match this incorrect behavior, and they should be reverted along with the fix. Additionally, a related change in start_end_packer.py introduces a bug for string data types.

Comment on lines +24 to 42
    def pad(x, shape, padding_side, pad_value, axis=-1):
        if padding_side == "left" and pad_value is not None:
            x = x[..., ::-1]

        outputs = x.to_tensor(
            default_value=pad_value,
            shape=shape,
        )

        if padding_side == "left":
            outputs = x.to_tensor(
                default_value=pad_value,
            )
            outputs = outputs[..., ::-1]
            padding_shape = [tf.shape(outputs)[0]] + [1] * (len(outputs.shape) - 1)
            padding_shape[axis] = shape[axis] - tf.shape(outputs)[axis]
            padding_shape = tf.cast(padding_shape, "int64")
            padding = tf.fill(padding_shape, pad_value)
            padding = tf.cast(padding, outputs.dtype)
            outputs = tf.concat([outputs, padding], axis=axis)
        else:
            outputs = x.to_tensor(
                default_value=pad_value,
                shape=tf.cast(shape, "int64"),
            )
        return outputs


critical

The updated implementation of pad for padding_side="left" is incorrect. It appears to perform right-padding instead of left-padding. The logic first left-pads the tensor to the maximum sequence length within the batch, and then concatenates additional padding to the right to match the target shape. This results in mixed padding, not pure left-padding.
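
To make the failure concrete, here is a small reproduction of the mixed padding (example values chosen for illustration; the steps mirror the new logic above):

    import tensorflow as tf

    x = tf.ragged.constant([[1, 2, 3], [4]])
    # Step 1 of the new logic: left-pad only to the batch max length (3).
    dense = x[..., ::-1].to_tensor(default_value=0)[..., ::-1]
    # Step 2: fill the remaining width on the *right* to reach the target (5).
    mixed = tf.concat([dense, tf.fill([2, 2], 0)], axis=-1)
    print(mixed.numpy())
    # [[1 2 3 0 0]
    #  [0 0 4 0 0]]  <- the second row is padded on both sides

Pure left padding would instead produce [[0, 0, 1, 2, 3], [0, 0, 0, 0, 4]].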

This bug causes tests in multi_segment_packer_test.py and start_end_packer_test.py to fail, and they have been modified to expect this incorrect right-padding behavior. The tests should be reverted to their original state once this padding logic is corrected.

To fix this, the previous implementation strategy of reversing the tensor, padding, and then reversing back should be restored. This is a standard and correct way to achieve left-padding.

Here is a suggested implementation that restores the correct behavior:

def pad(x, shape, padding_side, pad_value, axis=-1):
    shape = tf.cast(shape, "int64")
    if padding_side == "left":
        # Reverse, pad, and reverse back is a standard way to do left padding.
        x = x[..., ::-1]
        outputs = x.to_tensor(
            default_value=pad_value,
            shape=shape,
        )
        outputs = outputs[..., ::-1]
    else:  # padding_side == "right"
        outputs = x.to_tensor(
            default_value=pad_value,
            shape=shape,
        )
    return outputs
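
A quick check (again with illustrative values) that the reverse-pad-reverse strategy yields pure left padding:

    import tensorflow as tf

    x = tf.ragged.constant([[1, 2, 3], [4]])
    padded = x[..., ::-1].to_tensor(default_value=0, shape=[2, 5])[..., ::-1]
    print(padded.numpy())
    # [[0 0 1 2 3]
    #  [0 0 0 0 4]]  <- all padding on the left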

@@ -141,7 +141,7 @@ def check_special_value_type(value, value_name):
         self.start_value = start_value
         self.end_value = end_value

-        self.pad_value = pad_value
+        self.pad_value = pad_value or 0


high

This change introduces a bug for string-based tensors. When pad_value is None, it will be set to 0. If the layer is then used with string inputs, this will cause a TypeError during padding, as you cannot pad a string tensor with an integer 0.

The previous implementation was correct. The default value for padding should be handled where the tensor's dtype is known, or be left as None for the underlying padding function to handle (which tf.ragged.to_tensor does correctly, using 0 for numeric types and "" for strings).

Suggested change

-        self.pad_value = pad_value or 0
+        self.pad_value = pad_value
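
For reference, a short demonstration of the dtype-aware default behavior described above:

    import tensorflow as tf

    words = tf.ragged.constant([["a", "b"], ["c"]])
    # With default_value=None, to_tensor picks a dtype-appropriate zero:
    # b"" for strings, 0 for numeric dtypes.
    print(words.to_tensor().numpy())  # [[b'a' b'b'] [b'c' b'']]
    # An integer default on a string tensor fails with a dtype error:
    # words.to_tensor(default_value=0)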
