
TensorRT gives random output values, differs from onnxruntime #2928

@tpoisonooo

Description


I am using the Python TensorRT API to convert an ONNX model; the script does not finish even after 2 hours.

But with trtexec --onnx=model.onnx --fp16, the conversion stops normally and gives me model.engine.

Environment

TensorRT Version: 8.6.1.6 GA (official release download)
NVIDIA GPU: GTX1660
NVIDIA Driver Version: 515.86.01
CUDA Version: 11.7
CUDNN Version: 8.4.1
Operating System: ubuntu20.04
Python Version (if applicable): 3.9
Tensorflow Version (if applicable): -
PyTorch Version (if applicable): torch2.0
Baremetal or Container (if so, version):

Relevant Files

FP16 ONNX model: https://huggingface.co/tpoisonooo/alpaca.onnx/blob/fp16/decoder-merge-0.onnx
Conversion script: https://github.com/tpoisonooo/llama.onnx/blob/add-trt-backend/tools/onnx-to-trt.py

Steps To Reproduce

  1. Download the ONNX model and save it to onnx_model_dir
  2. Install the TensorRT Python package and run the script:

$ python3 onnx-to-trt.py  onnx_model_dir   output_engine_dir

This script does not finish.

  3. trtexec, however, works:

$ trtexec --onnx=/path/to/onnx_models/decoder-merge-0.onnx --fp16
$ ls
.. decoder.engine
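For reference, here is a minimal sketch of what an ONNX-to-TensorRT build script like `onnx-to-trt.py` typically does with the TensorRT 8.x Python API. The input name `past_key_in` and the min profile shape come from the notes below; the opt/max shapes are assumptions for illustration, not values from the actual script.

```python
# Sketch of an ONNX -> TensorRT FP16 build (TensorRT 8.x Python API).
# The input name "past_key_in" and min shape are taken from the issue
# notes; the opt/max shapes are assumed examples.
import sys

try:
    import tensorrt as trt  # requires an NVIDIA GPU environment
except ImportError:
    trt = None


def build_engine(onnx_path: str, engine_path: str) -> bool:
    """Parse an ONNX file and serialize an FP16 TensorRT engine."""
    # Verbose logging helps see where a multi-hour build is stuck.
    logger = trt.Logger(trt.Logger.VERBOSE)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

    parser = trt.OnnxParser(network, logger)
    if not parser.parse_from_file(onnx_path):
        for i in range(parser.num_errors):
            print(parser.get_error(i), file=sys.stderr)
        return False

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)

    # Dynamic-shape profile: min has a zero-length cache axis so the
    # first decoding step can pass an empty KV cache (see Notes).
    profile = builder.create_optimization_profile()
    profile.set_shape("past_key_in",
                      (1, 32, 0, 128),     # min (from the notes)
                      (1, 32, 64, 128),    # opt (assumed)
                      (1, 32, 1024, 128))  # max (assumed)
    config.add_optimization_profile(profile)

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        return False
    with open(engine_path, "wb") as f:
        f.write(serialized)
    return True


if __name__ == "__main__" and trt is not None and len(sys.argv) == 3:
    build_engine(sys.argv[1], sys.argv[2])
```

If the hang is profile-related, narrowing the min/opt/max ranges or building with verbose logging can help localize the tactic that never finishes.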

Notes

This ONNX model is part of the LLaMA Hugging Face format.

Since LLaMA needs a KV cache and there is an `If` operator here, I have to build an empty tensor to hack around it.

So `past_key_in.min_shape` is [1,32,0,128]; this works on onnxruntime.
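The empty-tensor hack can be illustrated with NumPy; the axis layout [batch, heads, seq_len, head_dim] is assumed from the shape above:

```python
import numpy as np

# KV-cache layout assumed from the min_shape in the note:
# [batch=1, heads=32, seq_len, head_dim=128]
empty_past_key = np.zeros((1, 32, 0, 128), dtype=np.float16)  # seq_len = 0

# First decoding step: concatenating the new key with the empty cache
# along the sequence axis simply yields the new key.
new_key = np.random.rand(1, 32, 1, 128).astype(np.float16)
past_key_out = np.concatenate([empty_past_key, new_key], axis=2)

print(past_key_out.shape)  # (1, 32, 1, 128)
```

This is why the optimization profile's min shape must allow a zero-length sequence axis: the very first step runs with an empty cache.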

Labels

triaged — Issue has been triaged by maintainers