Closed
Labels
ep:QNN (issues related to QNN execution provider), feature request (request for unsupported feature or enhancement), quantization (issues related to quantization)
Description
Describe the feature request
DequantizeLinear's current implementation is naive: it is single-threaded and uses only scalar instructions.
Could we prioritize a multithreaded, vectorized implementation of this code path to match the MlasQuantizeLinearKernel implementation?
@fajin-corp already made some comments about this.
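To illustrate the gap, here is a minimal sketch of per-tensor uint8 DequantizeLinear, following the ONNX formula y = (x - x_zero_point) * x_scale. `DequantizeScalar` mirrors the naive single-threaded loop; `DequantizeParallel` is a hypothetical variant that splits the tensor into contiguous chunks, one per thread. Both function names are illustrative assumptions, not ONNX Runtime or MLAS APIs, and `std::thread` stands in for the ORT intra-op thread pool. A real MLAS kernel would also use explicit SIMD intrinsics rather than relying on the compiler to auto-vectorize the inner loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Naive reference (per-tensor, uint8 input): y[i] = (x[i] - zero_point) * scale.
// The loop body is branch-free, so compilers can often auto-vectorize it at -O3.
void DequantizeScalar(const uint8_t* x, float* y, std::size_t n,
                      float scale, uint8_t zero_point) {
  const int32_t zp = static_cast<int32_t>(zero_point);
  for (std::size_t i = 0; i < n; ++i) {
    y[i] = static_cast<float>(static_cast<int32_t>(x[i]) - zp) * scale;
  }
}

// Hypothetical multithreaded variant: partitions [0, n) into contiguous chunks
// and dequantizes each chunk on its own thread. Results are bit-identical to
// DequantizeScalar because each element is computed the same way.
void DequantizeParallel(const uint8_t* x, float* y, std::size_t n,
                        float scale, uint8_t zero_point, unsigned num_threads) {
  num_threads = std::max(1u, num_threads);
  const std::size_t chunk = (n + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    const std::size_t begin = t * chunk;
    if (begin >= n) break;  // more threads than elements: stop early
    const std::size_t count = std::min(chunk, n - begin);
    workers.emplace_back(DequantizeScalar, x + begin, y + begin, count,
                         scale, zero_point);
  }
  for (auto& w : workers) w.join();  // wait for every chunk to finish
}
```

Splitting into contiguous chunks keeps each thread's reads and writes sequential, which is the access pattern SIMD loads and stores favor. Threading a real kernel would also need a minimum chunk size so that small tensors do not pay thread-dispatch overhead.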
Describe scenario use case
This should improve performance for many workloads that run quantized models through this code path.
Recently, the QNN-EP also made this code path its default, for performance reasons, so vectorization would likely help that effort even more.
https://github.com/microsoft/onnxruntime/releases/tag/v1.20.2
Another user recently commented about performance gains with the Qwen 2.5 0.5B model:
That thread has since gone stale, so I cannot add to it.