Skip to content

Lack of Selective Adapter Loading in Phi-4-multimodal-instruct-onnx #1821

@adolfdaniel

Description

@adolfdaniel

Summary:
The model currently loads all multimodal adapters (image/audio/text) regardless of prompt content. This leads to unnecessary resource usage when the prompt contains only text.

Steps to Reproduce:

  • Initialize Phi-4-multimodal-instruct-onnx.
  • Submit a prompt containing only text (no image/audio).
  • Observe that all adapters are loaded during inference.

Expected Behavior:
Only the relevant adapter(s) should be loaded based on prompt content. For text-only prompts, image/audio adapters should be skipped.

Actual Behavior:
All adapters are loaded, even when not required, increasing memory footprint and initialization time.

Impact:

  • Inefficient resource utilization
  • Slower startup for text-only use cases
  • Potential compatibility issues on constrained runtimes

Suggested Fix:
Introduce a selective loading mechanism that inspects the prompt and loads only the necessary adapters.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions