
Why is the TensorRT input binding set to FP16 for FP16 engines by default? #6777

@DavidBaldsiefen

Description

Search before asking

  • I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

No response

Bug

Right now, when exporting a model to TensorRT using export.py with the --half argument, the input binding is set to datatype "half" while the output is set to datatype "float":

TensorRT: Network Description:
TensorRT:	input "images" with shape (1, 3, 640, 640) and dtype DataType.HALF
TensorRT:	output "output" with shape (1, 25200, 6) and dtype DataType.FLOAT
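
For reference, a network description like the one above comes from an export command along these lines (the weights file is just an example):

python export.py --weights yolov5s.pt --include engine --device 0 --half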

However, this means that whenever the resulting engine is used, its inputs also need to be converted to FP16, as can be seen in detect.py:

im = im.half() if half else im.float() # uint8 to fp16/32
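
For a concrete picture, here is a minimal, self-contained sketch of that preprocessing; the zero-filled array and the device/half flags stand in for detect.py's dataloader and model state:

import numpy as np
import torch

device = torch.device("cuda:0")
half = True  # True when the engine's input binding is DataType.HALF

im = np.zeros((1, 3, 640, 640), dtype=np.uint8)  # placeholder for a letterboxed image batch
im = torch.from_numpy(im).to(device)    # upload to the GPU
im = im.half() if half else im.float()  # host-side cast must match the input binding's dtype
im /= 255                               # 0 - 255 -> 0.0 - 1.0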

Now I was wondering: is this intentional? TensorRT can handle an FP32 input without problems, even when the rest of the engine runs in half precision. Moreover, letting TensorRT handle the conversion means less overhead for the developer, and it is most likely also faster, since the cast can then be performed by the GPU instead of the CPU. The datatype of the input binding can be changed with a single line of code when creating a new engine:

network.get_input(0).dtype = trt.DataType.FLOAT
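
For illustration, here is a rough sketch of where such an override could sit in a generic TensorRT build flow; the variable names, the ONNX path, and the overall structure are illustrative rather than the exact export.py code:

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("yolov5s.onnx", "rb") as f:  # illustrative ONNX model path
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # build the layers in half precision

# Keep the input binding in FP32 so callers can feed float tensors directly;
# TensorRT then performs the FP32 -> FP16 cast on the GPU.
network.get_input(0).dtype = trt.DataType.FLOAT

engine = builder.build_engine(network, config)  # build_serialized_network() in newer TensorRT releases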

Are you willing to submit a PR?

  • Yes, I'd like to help by submitting a PR!
