TFLite export failure with `--half` and `--int8` options #5446

### Search before asking

- [X] I have searched the YOLOv5 [issues](https://github.com/ultralytics/yolov5/issues) and found no similar bug report.


### YOLOv5 Component

Export

### Bug

### 1. Export fails with the `--half` option on a GPU device

`python export.py --half --weights ./export_tflite/yolov5n/best.pt --include tflite --device 0`

Log:

> export: data=data/coco128.yaml, weights=./export_tflite/yolov5n/best.pt, imgsz=[640, 640], batch_size=1, device=0, half=True, inplace=False, train=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=13, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['tflite']
> YOLOv5 🚀 v6.0-42-g4c0982a torch 1.9.0+cu102 CUDA:0 (Quadro RTX 4000, 7982.3125MB)
> 
> Fusing layers... 
> Model Summary: 213 layers, 1764577 parameters, 0 gradients, 4.2 GFLOPs
> 
> PyTorch: starting from export_tflite/yolov5n/best.pt (14.7 MB)
> 2021-11-02 13:03:19.661502: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
> 
> TensorFlow saved_model: starting export with tensorflow 2.4.1...
> 
>                  from  n    params  module                                  arguments                     
> 
> TensorFlow saved_model: export failure: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
> 
> TensorFlow Lite: starting export with tensorflow 2.4.1...
> 
> TensorFlow Lite: export failure: 'NoneType' object has no attribute 'call'
> 
> 
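
The second error in this log looks like a consequence of the first: the saved_model step fails on the CUDA tensor, so the TFLite step appears to receive `None` where it expects a Keras model. The sketch below only illustrates that cascade in plain Python. Every name in it (`export_saved_model`, `export_tflite`, `KerasLikeModel`) is a hypothetical stand-in, not code from `export.py`.

```python
# Hypothetical stand-ins: none of these names are taken from export.py.

class KerasLikeModel:
    """Minimal object exposing a .call attribute, like a Keras model."""

    def call(self, x):
        return x


def export_saved_model(weights_on_gpu: bool):
    """Return a model, or None when conversion fails (the exception is only logged)."""
    try:
        if weights_on_gpu:
            # Stand-in for the error raised when a CUDA tensor is converted to NumPy.
            raise TypeError("can't convert cuda:0 device type tensor to numpy.")
        return KerasLikeModel()
    except Exception as e:
        print(f"TensorFlow saved_model: export failure: {e}")
        return None


def export_tflite(model):
    """Use model.call; report the failure as a message if model is None."""
    try:
        model.call  # raises AttributeError when model is None
        return "ok"
    except Exception as e:
        msg = f"TensorFlow Lite: export failure: {e}"
        print(msg)
        return msg


gpu_result = export_tflite(export_saved_model(weights_on_gpu=True))
cpu_result = export_tflite(export_saved_model(weights_on_gpu=False))
```

If this reading is right, fixing the CUDA-to-NumPy conversion in the saved_model step should also make the `'NoneType' object has no attribute 'call'` message go away.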

### 2. Export fails with the `--int8` option

`python export.py --int8 --weights ./export_tflite/yolov5n/best.pt --include tflite`

Log:

> export: data=data/coco128.yaml, weights=./export_tflite/yolov5n/best.pt, imgsz=[640, 640], batch_size=1, device=cpu, half=False, inplace=False, train=False, optimize=False, int8=True, dynamic=False, simplify=False, opset=13, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['tflite']
> YOLOv5 🚀 v6.0-42-g4c0982a torch 1.9.0+cu102 CPU
> 
> Fusing layers... 
> Model Summary: 213 layers, 1764577 parameters, 0 gradients, 4.2 GFLOPs
> 
> PyTorch: starting from export_tflite/yolov5n/best.pt (14.7 MB)
> 2021-11-02 13:08:43.302145: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
> 
> TensorFlow saved_model: starting export with tensorflow 2.4.1...
> 
>                  from  n    params  module                                  arguments                     
> 2021-11-02 13:08:44.301976: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
> 2021-11-02 13:08:44.303109: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
> 2021-11-02 13:08:44.334274: E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
> 2021-11-02 13:08:44.334350: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: 19367d16ac94
> 2021-11-02 13:08:44.334358: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: 19367d16ac94
> 2021-11-02 13:08:44.335550: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 470.57.2
> 2021-11-02 13:08:44.335585: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 470.57.2
> 2021-11-02 13:08:44.335591: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 470.57.2
> 2021-11-02 13:08:44.336461: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
> To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
> 2021-11-02 13:08:44.338266: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
>   0                -1  1      1760  models.common.Conv                      [3, 16, 6, 2, 2]              
>   1                -1  1      4672  models.common.Conv                      [16, 32, 3, 2]                
>   2                -1  1      4800  models.common.C3                        [32, 32, 1]                   
>   3                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
>   4                -1  1     29184  models.common.C3                        [64, 64, 2]                   
>   5                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
>   6                -1  1    156928  models.common.C3                        [128, 128, 3]                 
>   7                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
>   8                -1  1    296448  models.common.C3                        [256, 256, 1]                 
>   9                -1  1    164608  models.common.SPPF                      [256, 256, 5]                 
>  10                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
>  11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
>  12           [-1, 6]  1         0  models.common.Concat                    [1]                           
>  13                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
>  14                -1  1      8320  models.common.Conv                      [128, 64, 1, 1]               
>  15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
>  16           [-1, 4]  1         0  models.common.Concat                    [1]                           
>  17                -1  1     22912  models.common.C3                        [128, 64, 1, False]           
>  18                -1  1     36992  models.common.Conv                      [64, 64, 3, 2]                
>  19          [-1, 14]  1         0  models.common.Concat                    [1]                           
>  20                -1  1     74496  models.common.C3                        [128, 128, 1, False]          
>  21                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
>  22          [-1, 10]  1         0  models.common.Concat                    [1]                           
>  23                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
>  24      [17, 20, 23]  1     12177  models.yolo.Detect                      [4, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [64, 128, 256], [640, 640]]
> Model: "model"
> __________________________________________________________________________________________________
> Layer (type)                    Output Shape         Param #     Connected to                     
> ==================================================================================================
> input_1 (InputLayer)            [(1, 640, 640, 3)]   0                                            
> __________________________________________________________________________________________________
> tf_conv (TFConv)                (1, 320, 320, 16)    1744        input_1[0][0]                    
> __________________________________________________________________________________________________
> tf_conv_1 (TFConv)              (1, 160, 160, 32)    4640        tf_conv[0][0]                    
> __________________________________________________________________________________________________
> tf_c3 (TFC3)                    (1, 160, 160, 32)    4704        tf_conv_1[0][0]                  
> __________________________________________________________________________________________________
> tf_conv_7 (TFConv)              (1, 80, 80, 64)      18496       tf_c3[0][0]                      
> __________________________________________________________________________________________________
> tf_c3_1 (TFC3)                  (1, 80, 80, 64)      28928       tf_conv_7[0][0]                  
> __________________________________________________________________________________________________
> tf_conv_15 (TFConv)             (1, 40, 40, 128)     73856       tf_c3_1[0][0]                    
> __________________________________________________________________________________________________
> tf_c3_2 (TFC3)                  (1, 40, 40, 128)     156288      tf_conv_15[0][0]                 
> __________________________________________________________________________________________________
> tf_conv_25 (TFConv)             (1, 20, 20, 256)     295168      tf_c3_2[0][0]                    
> __________________________________________________________________________________________________
> tf_c3_3 (TFC3)                  (1, 20, 20, 256)     295680      tf_conv_25[0][0]                 
> __________________________________________________________________________________________________
> tfsppf (TFSPPF)                 (1, 20, 20, 256)     164224      tf_c3_3[0][0]                    
> __________________________________________________________________________________________________
> tf_conv_33 (TFConv)             (1, 20, 20, 128)     32896       tfsppf[0][0]                     
> __________________________________________________________________________________________________
> tf_upsample (TFUpsample)        (1, 40, 40, 128)     0           tf_conv_33[0][0]                 
> __________________________________________________________________________________________________
> tf_concat (TFConcat)            (1, 40, 40, 256)     0           tf_upsample[0][0]                
>                                                                  tf_c3_2[0][0]                    
> __________________________________________________________________________________________________
> tf_c3_4 (TFC3)                  (1, 40, 40, 128)     90496       tf_concat[0][0]                  
> __________________________________________________________________________________________________
> tf_conv_39 (TFConv)             (1, 40, 40, 64)      8256        tf_c3_4[0][0]                    
> __________________________________________________________________________________________________
> tf_upsample_1 (TFUpsample)      (1, 80, 80, 64)      0           tf_conv_39[0][0]                 
> __________________________________________________________________________________________________
> tf_concat_1 (TFConcat)          (1, 80, 80, 128)     0           tf_upsample_1[0][0]              
>                                                                  tf_c3_1[0][0]                    
> __________________________________________________________________________________________________
> tf_c3_5 (TFC3)                  (1, 80, 80, 64)      22720       tf_concat_1[0][0]                
> __________________________________________________________________________________________________
> tf_conv_45 (TFConv)             (1, 40, 40, 64)      36928       tf_c3_5[0][0]                    
> __________________________________________________________________________________________________
> tf_concat_2 (TFConcat)          (1, 40, 40, 128)     0           tf_conv_45[0][0]                 
>                                                                  tf_conv_39[0][0]                 
> __________________________________________________________________________________________________
> tf_c3_6 (TFC3)                  (1, 40, 40, 128)     74112       tf_concat_2[0][0]                
> __________________________________________________________________________________________________
> tf_conv_51 (TFConv)             (1, 20, 20, 128)     147584      tf_c3_6[0][0]                    
> __________________________________________________________________________________________________
> tf_concat_3 (TFConcat)          (1, 20, 20, 256)     0           tf_conv_51[0][0]                 
>                                                                  tf_conv_33[0][0]                 
> __________________________________________________________________________________________________
> tf_c3_7 (TFC3)                  (1, 20, 20, 256)     295680      tf_concat_3[0][0]                
> __________________________________________________________________________________________________
> tf_detect (TFDetect)            ((1, 25200, 9), [(1, 12177       tf_c3_5[0][0]                    
>                                                                  tf_c3_6[0][0]                    
>                                                                  tf_c3_7[0][0]                    
> ==================================================================================================
> Total params: 1,764,577
> Trainable params: 0
> Non-trainable params: 1,764,577
> __________________________________________________________________________________________________
> 2021-11-02 13:08:48.863933: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
> Found untraced functions such as tf_conv_2_layer_call_and_return_conditional_losses, tf_conv_2_layer_call_fn, tf_conv_3_layer_call_and_return_conditional_losses, tf_conv_3_layer_call_fn, tf_conv_4_layer_call_and_return_conditional_losses while saving (showing 5 of 520). These functions will not be directly callable after loading.
> Found untraced functions such as tf_conv_2_layer_call_and_return_conditional_losses, tf_conv_2_layer_call_fn, tf_conv_3_layer_call_and_return_conditional_losses, tf_conv_3_layer_call_fn, tf_conv_4_layer_call_and_return_conditional_losses while saving (showing 5 of 520). These functions will not be directly callable after loading.
> Assets written to: export_tflite/yolov5n/best_saved_model/assets
> TensorFlow saved_model: export success, saved as export_tflite/yolov5n/best_saved_model (62.2 MB)
> 
> TensorFlow Lite: starting export with tensorflow 2.4.1...
> Found untraced functions such as tf_conv_2_layer_call_and_return_conditional_losses, tf_conv_2_layer_call_fn, tf_conv_3_layer_call_and_return_conditional_losses, tf_conv_3_layer_call_fn, tf_conv_4_layer_call_and_return_conditional_losses while saving (showing 5 of 520). These functions will not be directly callable after loading.
> Found untraced functions such as tf_conv_2_layer_call_and_return_conditional_losses, tf_conv_2_layer_call_fn, tf_conv_3_layer_call_and_return_conditional_losses, tf_conv_3_layer_call_fn, tf_conv_4_layer_call_and_return_conditional_losses while saving (showing 5 of 520). These functions will not be directly callable after loading.
> Assets written to: /tmp/tmpibgq2dv_/assets
> 2021-11-02 13:09:13.579602: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
> 2021-11-02 13:09:13.579786: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
> 2021-11-02 13:09:13.580027: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
> 2021-11-02 13:09:13.605233: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3300000000 Hz
> 2021-11-02 13:09:13.622811: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:928] Optimization results for grappler item: graph_to_optimize
>   function_optimizer: function_optimizer did nothing. time = 0.006ms.
>   function_optimizer: function_optimizer did nothing. time = 0ms.
> 
> 2021-11-02 13:09:14.271508: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:316] Ignored output_format.
> 2021-11-02 13:09:14.271571: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:319] Ignored drop_control_dependency.
> 2021-11-02 13:09:14.311455: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
> 
> TensorFlow Lite: export failure: too many values to unpack (expected 4)
> 
> 
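
The message `too many values to unpack (expected 4)` is what Python raises when a sequence with more than four items is unpacked into exactly four names. The log does not show which statement raised it, so the sketch below is a guess at the class of bug, not a diagnosis. It assumes a dataset iterator that yields five fields per item while the consuming loop expects four. The function names and field names are hypothetical.

```python
# Hypothetical illustration: field names and functions are assumptions, not export.py code.

def five_field_dataset():
    """Yield items with five fields, one more than the consumer below expects."""
    yield ("img.jpg", "img_tensor", "original_img", None, "image 1/1 img.jpg: ")


def strict_consume(dataset):
    """Unpack exactly four fields; return the error text if unpacking fails."""
    try:
        for path, img, im0s, vid_cap in dataset:
            pass
        return "ok"
    except ValueError as e:
        return str(e)


def tolerant_consume(dataset):
    """Take the fields that are needed and collect any extras with a starred name."""
    images = []
    for path, img, *rest in dataset:
        images.append(img)
    return images


strict_result = strict_consume(five_field_dataset())
tolerant_result = tolerant_consume(five_field_dataset())
```

The starred form keeps working if the iterator later gains or loses trailing fields, at the cost of silently ignoring them.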

### Environment

- Case 1 (`--half`, `--device 0`): YOLOv5 🚀 v6.0-42-g4c0982a torch 1.9.0+cu102 CUDA:0 (Quadro RTX 4000, 7982.3125MB)
- Case 2 (`--int8`, CPU): YOLOv5 🚀 v6.0-42-g4c0982a torch 1.9.0+cu102 CPU
- TensorFlow 2.4.1
- Ubuntu 18.04.5 LTS
- Python 3.8.8


### Minimal Reproducible Example

_No response_

### Additional

_No response_

### Are you willing to submit a PR?

- [ ] Yes I'd like to help by submitting a PR!
