Lack of Selective Adapter Loading in Phi-4-multimodal-instruct-onnx #1821

**Summary:**
The model currently loads every multimodal adapter (image/audio/text) regardless of prompt content, so a text-only prompt still incurs the memory and load-time cost of the image and audio adapters.

**Steps to Reproduce:**
- Initialize Phi-4-multimodal-instruct-onnx.
- Submit a prompt containing only text (no image/audio).
- Observe that all adapters are loaded during inference.

**Expected Behavior:**
Only the relevant adapter(s) should be loaded based on prompt content. For text-only prompts, image/audio adapters should be skipped.

**Actual Behavior:**
All adapters are loaded, even when not required, increasing memory footprint and initialization time.

**Impact:**
- Inefficient resource utilization: memory is held by adapters that a text-only prompt never uses
- Slower startup for text-only use cases
- Potential compatibility issues on resource-constrained runtimes

**Suggested Fix:**
Introduce a selective loading mechanism that inspects the prompt and loads only the necessary adapters.
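
A minimal sketch of what such a mechanism could look like, written in plain Python rather than against the actual runtime API. It assumes that multimodal inputs are marked in the prompt with placeholders of the form `<|image_1|>` and `<|audio_1|>`. The adapter names `vision` and `speech`, the `required_adapters` helper, the `LazyAdapterCache` class, and the `loader` callback are all hypothetical and stand in for whatever the runtime really uses to load an adapter.

```python
import re
from typing import Callable, Dict, Set

# Assumption: multimodal inputs appear in the prompt as <|image_N|> / <|audio_N|>.
_PLACEHOLDER = re.compile(r"<\|(image|audio)_\d+\|>")

# Hypothetical mapping from detected modality to adapter name.
ADAPTER_FOR_MODALITY = {"image": "vision", "audio": "speech"}


def required_adapters(prompt: str) -> Set[str]:
    """Return the adapter names the prompt needs (empty set for text-only)."""
    return {ADAPTER_FOR_MODALITY[m] for m in _PLACEHOLDER.findall(prompt)}


class LazyAdapterCache:
    """Loads each adapter at most once, and only when a prompt first needs it."""

    def __init__(self, loader: Callable[[str], object]):
        self._loader = loader  # hypothetical: wraps the runtime's real load call
        self._loaded: Dict[str, object] = {}

    def ensure(self, prompt: str) -> Set[str]:
        """Load any missing adapters for this prompt; return the names it needs."""
        needed = required_adapters(prompt)
        for name in sorted(needed):
            if name not in self._loaded:
                self._loaded[name] = self._loader(name)
        return needed

    @property
    def loaded_names(self) -> Set[str]:
        return set(self._loaded)


if __name__ == "__main__":
    cache = LazyAdapterCache(loader=lambda name: f"<adapter {name}>")
    cache.ensure("Summarize this paragraph.")      # loads nothing
    cache.ensure("<|image_1|>Describe the image.")  # loads only "vision"
    print(cache.loaded_names)
```

Deferring the load to the first prompt that needs a given adapter keeps text-only sessions free of image/audio overhead, at the cost of a one-time delay on the first multimodal prompt.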
