
Slow GPU inference speed with exported ONNX models #5459


Description

@Doctor-bun

Dear Facebook Development Team,

I saw this statement in the deployment guide: "The converted model is able to run in either Python or C++ without detectron2/torchvision dependency, on CPU or GPUs. It has a runtime optimized for CPU & mobile inference, but not optimized for GPU inference." In my own testing, GPU inference on the exported ONNX model takes roughly three times as long as CPU inference, so adding GPU-side optimization would significantly improve inference speed. Would it be possible to optimize GPU inference for the exported ONNX models? This seems like a fundamental capability that would be broadly useful across the project.
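For reference, here is a minimal sketch of how such a CPU vs. GPU timing comparison can be made with onnxruntime; the model path and input shape below are placeholder assumptions and will differ depending on the exported model:

```python
# Minimal sketch (assumed model path and input shape) for comparing
# CPU vs. GPU latency of an exported ONNX model with onnxruntime.
import time

import numpy as np
import onnxruntime as ort

MODEL_PATH = "model.onnx"                                       # assumed path
dummy_input = np.random.rand(3, 800, 1067).astype(np.float32)   # assumed shape

def benchmark(providers, runs=50):
    sess = ort.InferenceSession(MODEL_PATH, providers=providers)
    input_name = sess.get_inputs()[0].name
    # Warm-up run so provider initialization is not counted.
    sess.run(None, {input_name: dummy_input})
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {input_name: dummy_input})
    return (time.perf_counter() - start) / runs

cpu_ms = benchmark(["CPUExecutionProvider"]) * 1000
gpu_ms = benchmark(["CUDAExecutionProvider", "CPUExecutionProvider"]) * 1000
print(f"CPU: {cpu_ms:.1f} ms/iter, GPU: {gpu_ms:.1f} ms/iter")
```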


Metadata

Labels: enhancement (Improvements or good new features)
Assignees: none
Projects: none
Milestone: none