Description
I am using the Python TensorRT API to convert an ONNX model, but the script does not finish even after 2 hours.
However, `trtexec --onnx=model.onnx --fp16` stops normally and gives me `model.engine`.
Environment
TensorRT Version: 8.6.1.6 GA
NVIDIA GPU: GTX1660
NVIDIA Driver Version: 515.86.01
CUDA Version: 11.7
CUDNN Version: 8.4.1
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.9
Tensorflow Version (if applicable): -
PyTorch Version (if applicable): 2.0
Baremetal or Container (if so, version):
Relevant Files
fp16 ONNX model, download here: https://huggingface.co/tpoisonooo/alpaca.onnx/blob/fp16/decoder-merge-0.onnx
Conversion script, download here: https://github.com/tpoisonooo/llama.onnx/blob/add-trt-backend/tools/onnx-to-trt.py
Steps To Reproduce
- Download the ONNX model and save it to `onnx_model_dir`
- Install Python TensorRT and run the script:

  ```bash
  $ python3 onnx-to-trt.py onnx_model_dir output_engine_dir
  ```

  This script does not finish.
- But `trtexec` works:

  ```bash
  $ trtexec --onnx=/path/to/onnx_models/decoder-merge-0.onnx --fp16
  $ ls
  .. decoder.engine
  ```
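To confirm that the Python path hangs rather than just being slow, one option is to wrap the conversion in a subprocess with a hard time limit. This is a hypothetical helper, not part of the repository's script:

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout_s):
    """Run a command; return (returncode, stdout), or (None, "") on timeout."""
    try:
        proc = subprocess.run(cmd, timeout=timeout_s,
                              capture_output=True, text=True)
        return proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        return None, ""

# Example: cap the conversion at 2 hours instead of waiting indefinitely
# run_with_timeout(["python3", "onnx-to-trt.py",
#                   "onnx_model_dir", "output_engine_dir"],
#                  timeout_s=2 * 3600)
```

If the helper returns `None`, the build exceeded the limit, which matches the behavior described above.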
Notes
This ONNX model is part of the LLaMa HuggingFace format.
Since LLaMa needs a kv cache and there is an `If` operator here, I have to build an empty tensor to hack it.
So `past_key_in.min_shape` is `[1,32,0,128]`, and this works on onnxruntime.
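The empty-tensor hack above can be sketched as follows. This is a minimal illustration, assuming the first decode step feeds a zero-length past sequence; the name `past_key_in` is taken from the model input mentioned above:

```python
import numpy as np

# Empty past-key tensor for the first decode step:
# batch=1, heads=32, past sequence length=0, head dim=128.
past_key_in = np.zeros((1, 32, 0, 128), dtype=np.float16)

# The sequence axis has length 0, so the tensor holds no elements;
# onnxruntime accepts this, but TensorRT must be given an optimization
# profile whose min shape allows the zero-length dimension.
print(past_key_in.shape)
```

The question is whether TensorRT's builder handles this zero-length minimum shape the same way onnxruntime does.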