
Slow GPU inference speed with exported ONNX models #5459


Description

@Doctor-bun

Dear Facebook Development Team,

I saw this statement in the deployment guide: "The converted model is able to run in either Python or C++ without detectron2/torchvision dependency, on CPU or GPUs. It has a runtime optimized for CPU & mobile inference, but not optimized for GPU inference." In my own testing, GPU inference on the exported ONNX model takes roughly three times as long as CPU inference, so adding GPU-side optimization would significantly improve inference speed. Would it be possible to optimize GPU inference for the exported ONNX models? This seems like a fundamental capability that would be broadly useful across the project.
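For reference, here is a minimal sketch of how such a CPU vs. GPU timing comparison can be made with onnxruntime; the model path and input shape below are placeholder assumptions and will differ depending on the exported model:

```python
# Minimal sketch (assumed model path and input shape) for comparing
# CPU vs. GPU latency of an exported ONNX model with onnxruntime.
import time

import numpy as np
import onnxruntime as ort

MODEL_PATH = "model.onnx"                                       # assumed path
dummy_input = np.random.rand(3, 800, 1067).astype(np.float32)   # assumed shape

def benchmark(providers, runs=50):
    sess = ort.InferenceSession(MODEL_PATH, providers=providers)
    input_name = sess.get_inputs()[0].name
    # Warm-up run so provider initialization is not counted.
    sess.run(None, {input_name: dummy_input})
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {input_name: dummy_input})
    return (time.perf_counter() - start) / runs

cpu_ms = benchmark(["CPUExecutionProvider"]) * 1000
gpu_ms = benchmark(["CUDAExecutionProvider", "CPUExecutionProvider"]) * 1000
print(f"CPU: {cpu_ms:.1f} ms/iter, GPU: {gpu_ms:.1f} ms/iter")
```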


Metadata

Labels: enhancement (Improvements or good new features)
Assignees: none
Projects: none
Milestone: none