Description
I have been unable to export the model to ONNX. I am using a sample image that is 1344 x 1344.
Interestingly, if I set RPN_NMS_THRESHOLD in the config to 0.00, the export succeeds, but the exported model produces garbage output.
aug = T.ResizeShortestEdge(
[1344, 1344], 1344
)
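For reference, here is a minimal sketch of what this resize should do to the image dimensions. It assumes the usual shortest-edge rule (scale so the shorter side hits the target, then shrink again if the longer side exceeds the maximum). `resize_shortest_edge_shape` is a hypothetical helper written for illustration, not detectron2's own API.

```python
def resize_shortest_edge_shape(height, width, short_edge, max_size):
    """Return the (height, width) produced by a shortest-edge resize.

    Assumed rule: scale so the shorter side equals `short_edge`, then
    scale down again if the longer side would exceed `max_size`.
    """
    scale = short_edge / min(height, width)
    if height < width:
        new_h, new_w = float(short_edge), scale * width
    else:
        new_h, new_w = scale * height, float(short_edge)

    longest = max(new_h, new_w)
    if longest > max_size:
        shrink = max_size / longest
        new_h, new_w = new_h * shrink, new_w * shrink

    # Round to the nearest integer pixel count.
    return int(new_h + 0.5), int(new_w + 0.5)


# The 1344 x 1344 sample image is already at the target size.
print(resize_shortest_edge_shape(1344, 1344, 1344, 1344))  # (1344, 1344)
# A non-square image is capped by max_size on its longer side.
print(resize_shortest_edge_shape(480, 640, 1344, 1344))    # (1008, 1344)
```

Under these assumptions the sample image passes through unchanged, so the resize itself should not be what alters the input to the traced model.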
python export_model.py --sample-image ./lvis_sample_1344.jpg --config-file ../../configs/LVISv0.5-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml --export-method tracing --format onnx --output ./
[08/21 12:09:18 detectron2]: Command line arguments: Namespace(format='onnx', export_method='tracing', config_file='../../configs/LVISv0.5-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml', sample_image='./lvis_sample_1344.jpg', run_eval=False, output='./', opts=[])
[W821 12:09:18.102378298 init.cpp:855] Warning: Use _jit_set_fusion_strategy, bailout depth is deprecated. Setting to (STATIC, 1) (function operator())
[08/21 12:09:19 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from detectron2://ImageNetPretrained/MSRA/R-50.pkl ...
[08/21 12:09:19 d2.checkpoint.c2_model_loading]: Renaming Caffe2 weights ......
[08/21 12:09:19 d2.checkpoint.c2_model_loading]: Following weights matched with submodule backbone.bottom_up - Total num: 54
Some model parameters or buffers are not found in the checkpoint:
backbone.fpn_lateral2.{bias, weight}
backbone.fpn_lateral3.{bias, weight}
backbone.fpn_lateral4.{bias, weight}
backbone.fpn_lateral5.{bias, weight}
backbone.fpn_output2.{bias, weight}
backbone.fpn_output3.{bias, weight}
backbone.fpn_output4.{bias, weight}
backbone.fpn_output5.{bias, weight}
proposal_generator.rpn_head.anchor_deltas.{bias, weight}
proposal_generator.rpn_head.conv.{bias, weight}
proposal_generator.rpn_head.objectness_logits.{bias, weight}
roi_heads.box_head.fc1.{bias, weight}
roi_heads.box_head.fc2.{bias, weight}
roi_heads.box_predictor.bbox_pred.{bias, weight}
roi_heads.box_predictor.cls_score.{bias, weight}
roi_heads.mask_head.deconv.{bias, weight}
roi_heads.mask_head.mask_fcn1.{bias, weight}
roi_heads.mask_head.mask_fcn2.{bias, weight}
roi_heads.mask_head.mask_fcn3.{bias, weight}
roi_heads.mask_head.mask_fcn4.{bias, weight}
roi_heads.mask_head.predictor.{bias, weight}
The checkpoint state_dict contains keys that are not used by the model:
fc1000.{bias, weight}
stem.conv1.bias
/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/structures/image_list.py:86: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert t.shape[:-2] == tensors[0].shape[:-2], t.shape
/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/functional.py:554: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /pytorch/aten/src/ATen/native/TensorShape.cpp:4314.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/structures/boxes.py:151: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if tensor.numel() == 0:
/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/structures/boxes.py:155: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert tensor.dim() == 2 and tensor.size(-1) == 4, tensor.size()
/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/structures/boxes.py:151: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if tensor.numel() == 0:
/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/structures/boxes.py:155: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert tensor.dim() == 2 and tensor.size(-1) == 4, tensor.size()
/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/modeling/proposal_generator/proposal_utils.py:106: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if not valid_mask.all():
/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/structures/boxes.py:191: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert torch.isfinite(self.tensor).all(), "Box tensor contains infinite or NaN!"
/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/structures/boxes.py:192: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
h, w = box_size
/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/layers/nms.py:17: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert boxes.shape[-1] == 4
/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/__init__.py:2150: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert condition, message
/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/layers/roi_align.py:55: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert rois.dim() == 2 and rois.size(1) == 5
/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/modeling/roi_heads/fast_rcnn.py:138: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if not valid_mask.all():
/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/structures/boxes.py:151: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if tensor.numel() == 0:
/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/structures/boxes.py:155: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert tensor.dim() == 2 and tensor.size(-1) == 4, tensor.size()
/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/structures/boxes.py:191: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert torch.isfinite(self.tensor).all(), "Box tensor contains infinite or NaN!"
/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/structures/boxes.py:192: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
h, w = box_size
/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/modeling/roi_heads/fast_rcnn.py:155: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if num_bbox_reg_classes == 1:
/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/layers/nms.py:17: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert boxes.shape[-1] == 4
Traceback (most recent call last):
File "/home/***/Development/***/Dependencies/detectron2/tools/deploy/export_model.py", line 247, in <module>
main() # pragma: no cover
^^^^^^
File "/home/***/Development/***/Dependencies/detectron2/tools/deploy/export_model.py", line 228, in main
exported_model = export_tracing(torch_model, sample_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/Development/***/Dependencies/detectron2/tools/deploy/export_model.py", line 134, in export_tracing
torch.onnx.export(traceable_model, (image,), f, opset_version=STABLE_ONNX_OPSET_VERSION)
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/onnx/__init__.py", line 396, in export
export(
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/onnx/utils.py", line 529, in export
_export(
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/onnx/utils.py", line 1467, in _export
graph, params_dict, torch_out = _model_to_graph(
^^^^^^^^^^^^^^^^
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/onnx/utils.py", line 1087, in _model_to_graph
graph, params, torch_out, module = _create_jit_graph(model, args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/onnx/utils.py", line 971, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/onnx/utils.py", line 878, in _trace_and_get_graph_from_model
trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/jit/_trace.py", line 1501, in _get_trace_graph
outs = ONNXTracedModule(
^^^^^^^^^^^^^^^^^
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/jit/_trace.py", line 138, in forward
graph, _out = torch._C._create_graph_by_tracing(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/jit/_trace.py", line 129, in wrapper
outs.append(self.inner(*trace_inputs))
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1741, in _slow_forward
result = self.forward(*input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/export/flatten.py", line 294, in forward
outputs = self.inference_func(self.model, *inputs_orig_format)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/Development/***/Dependencies/detectron2/tools/deploy/export_model.py", line 119, in inference
inst = model.inference(inputs, do_postprocess=False)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/modeling/meta_arch/rcnn.py", line 213, in inference
results, _ = self.roi_heads(images, features, proposals, None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1741, in _slow_forward
result = self.forward(*input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/modeling/roi_heads/roi_heads.py", line 747, in forward
pred_instances = self._forward_box(features, proposals)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/modeling/roi_heads/roi_heads.py", line 815, in _forward_box
pred_instances, _ = self.box_predictor.inference(predictions, proposals)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/modeling/roi_heads/fast_rcnn.py", line 479, in inference
return fast_rcnn_inference(
^^^^^^^^^^^^^^^^^^^^
File "/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/modeling/roi_heads/fast_rcnn.py", line 79, in fast_rcnn_inference
result_per_image = [
^
File "/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/modeling/roi_heads/fast_rcnn.py", line 80, in <listcomp>
fast_rcnn_inference_single_image(
File "/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/modeling/roi_heads/fast_rcnn.py", line 162, in fast_rcnn_inference_single_image
keep = batched_nms(boxes, scores, filter_inds[:, 1], nms_thresh)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/Development/***/Dependencies/detectron2/tools/deploy/../../detectron2/layers/nms.py", line 22, in batched_nms
return box_ops.batched_nms(boxes.float(), scores, idxs, iou_threshold)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torchvision/ops/boxes.py", line 76, in batched_nms
return _batched_nms_coordinate_trick(boxes, scores, idxs, iou_threshold)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/jit/_trace.py", line 1448, in wrapper
return compiled_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torchvision/ops/boxes.py", line 95, in _batched_nms_coordinate_trick
offsets = idxs.to(boxes) * (max_coordinate + torch.tensor(1).to(boxes))
boxes_for_nms = boxes + offsets[:, None]
keep = nms(boxes_for_nms, scores, iou_threshold)
~~~ <--- HERE
return keep
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torchvision/ops/boxes.py", line 41, in nms
_log_api_usage_once(nms)
_assert_has_ops()
return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: Trying to create tensor with negative dimension -2137697952: [-2137697952]
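The negative size in the final RuntimeError looks like a 32-bit integer overflow rather than a real tensor shape. The sketch below is an illustration under stated assumptions, not a confirmed diagnosis. It assumes the NMS kernel sizes a suppression bitmask as `num_boxes * ceil(num_boxes / 64)` using 32-bit arithmetic. The box count 1,229,814 is a hypothetical value, chosen because it reproduces the reported number and is close to 1000 proposals x 1230 LVIS v0.5 classes. The helper names `wrap_int32` and `nms_mask_elements` are illustrative only.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Reinterpret a Python int as a signed 32-bit integer (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def nms_mask_elements(num_boxes: int, bits_per_word: int = 64) -> int:
    """Assumed bitmask size: one 64-bit word per (box, block of 64 boxes)."""
    col_blocks = -(-num_boxes // bits_per_word)  # ceiling division
    return num_boxes * col_blocks


# Hypothetical number of boxes reaching NMS (not taken from the log).
num_boxes = 1_229_814

exact = nms_mask_elements(num_boxes)
print(exact)              # 23632105824
print(wrap_int32(exact))  # -2137697952
```

If this reading is right, nearly every (proposal, class) pair is reaching NMS. That would be consistent with the log above, where the RPN and ROI head weights are reported as missing from the ImageNet-only checkpoint.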
Steps to reproduce:
- Install detectron2 and its dependencies (torch, etc.)
- Run the export command shown above
Expected behavior:
I should be able to export the model to ONNX.
Environment:
Provide your environment information using the following command:
wget -nc -q https://github.com/facebookresearch/detectron2/raw/main/detectron2/utils/collect_env.py && python collect_env.py
sys.platform linux
Python 3.11.13 (main, Jun 5 2025, 13:12:00) [GCC 11.2.0]
numpy 2.3.0
detectron2 0.6 @/home//Development//detectron2/detectron2
Compiler GCC 11.2
CUDA compiler CUDA 12.9
detectron2 arch flags 5.0, 8.9
DETECTRON2_ENV_MODULE
PyTorch 2.7.1+cu126 @/home//miniconda3/envs/DtcService/lib/python3.11/site-packages/torch
PyTorch debug build False
torch._C._GLIBCXX_USE_CXX11_ABI True
GPU available Yes
GPU 0 NVIDIA GeForce RTX 4090 (arch=8.9)
GPU 1 Quadro K620 (arch=5.0)
Driver version 570.124.06
CUDA_HOME /home//miniconda3/envs/DtcService
Pillow 11.2.1
torchvision 0.22.1+cu126 @/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torchvision
torchvision arch flags 5.0, 6.0, 7.0, 7.5, 8.0, 8.6, 9.0
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.11.0
PyTorch built with:
- GCC 11.2
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 12.6
- NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
- CuDNN 90.5.1
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=e2d141dbde55c2a4370fac5165b0561b6af4798b, CUDA_VERSION=12.6, CUDNN_VERSION=9.5.1, CXX_COMPILER=/opt/rh/gcc-toolset-11/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.7.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,
Testing NCCL connectivity ... this should not hang.
W0821 12:11:08.805000 1505018 site-packages/torch/multiprocessing/spawn.py:169] Terminating process 1505054 via signal SIGTERM
Traceback (most recent call last):
File "/home//Development//Dependencies/detectron2/tools/deploy/collect_env.py", line 263, in <module>
main() # pragma: no cover
^^^^^^
File "/home//Development//Dependencies/detectron2/tools/deploy/collect_env.py", line 259, in main
test_nccl_ops()
File "/home//Development//Dependencies/detectron2/tools/deploy/collect_env.py", line 226, in test_nccl_ops
mp.spawn(_test_nccl_worker, nprocs=num_gpu, args=(num_gpu, dist_url), daemon=False)
File "/home//miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 340, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home//miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 296, in start_processes
while not context.join():
^^^^^^^^^^^^^^
File "/home/***/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 215, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home//miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/multiprocessing/spawn.py", line 90, in _wrap
fn(i, *args)
File "/home//Development//Dependencies/detectron2/tools/deploy/collect_env.py", line 234, in _test_nccl_worker
dist.barrier(device_ids=[rank])
File "/home/*/miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home//miniconda3/envs/DtcService/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 4635, in barrier
work = group.barrier(opts=opts)
^^^^^^^^^^^^^^^^^^^^^^^^
torch.distributed.DistBackendError: NCCL error in: /pytorch/torch/csrc/distributed/c10d/NCCLUtils.cpp:77, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.26.2
ncclUnhandledCudaError: Call to CUDA function failed.
Last error:
Cuda failure 'operation not supported'