Open
Description
Summary:
The model currently loads all multimodal adapters (image/audio/text) regardless of prompt content. This leads to unnecessary resource usage when the prompt contains only text.
Steps to Reproduce:
- Initialize Phi-4-multimodal-instruct-onnx.
- Submit a prompt containing only text (no image/audio).
- Observe that all adapters are loaded during inference.
Expected Behavior:
Only the relevant adapter(s) should be loaded based on prompt content. For text-only prompts, image/audio adapters should be skipped.
Actual Behavior:
All adapters are loaded, even when not required, increasing memory footprint and initialization time.
Impact:
- Inefficient resource utilization
- Slower startup for text-only use cases
- Potential compatibility issues on constrained runtimes
Suggested Fix:
Introduce a selective loading mechanism that inspects the prompt and loads only the necessary adapters.
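A minimal sketch of what such a mechanism could look like, written in plain Python and independent of any runtime API. It assumes that image and audio inputs are marked in the prompt by placeholders of the form `<|image_1|>` and `<|audio_1|>`, and that the text path needs no separate adapter. The names `required_adapters` and `LazyAdapterRegistry`, and the loader callables, are hypothetical and only illustrate the idea:

```python
import re
from typing import Callable, Dict, Iterable, Set

# Assumption: modalities are signalled by placeholders such as
# <|image_1|> or <|audio_2|> inside the prompt text.
_PLACEHOLDER = re.compile(r"<\|(image|audio)_\d+\|>")


def required_adapters(prompt: str) -> Set[str]:
    """Return the adapter names ("image", "audio") the prompt refers to."""
    return {match.group(1) for match in _PLACEHOLDER.finditer(prompt)}


class LazyAdapterRegistry:
    """Hypothetical registry that loads each adapter on first use only."""

    def __init__(self, loaders: Dict[str, Callable[[], object]]):
        self._loaders = loaders            # name -> function that loads the adapter
        self._loaded: Dict[str, object] = {}

    def ensure_loaded(self, names: Iterable[str]) -> Dict[str, object]:
        """Load any requested adapters that are not loaded yet and return them."""
        requested = sorted(set(names))
        for name in requested:
            if name not in self._loaded:
                self._loaded[name] = self._loaders[name]()
        return {name: self._loaded[name] for name in requested}

    @property
    def loaded_names(self) -> Set[str]:
        return set(self._loaded)


# Placeholder loaders; a real integration would load the adapter weights here.
registry = LazyAdapterRegistry({
    "image": lambda: "image-adapter",
    "audio": lambda: "audio-adapter",
})

# A text-only prompt triggers no adapter loading.
registry.ensure_loaded(required_adapters("Summarize this paragraph."))
```

With this approach a text-only session never pays for the image or audio adapters, while a later prompt containing `<|image_1|>` would load the image adapter on demand and reuse it afterwards.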