RuntimeError: CUDA error: device-side assert triggered #2124

@hdnh2006

🐛 Bug

Hi! I am trying to train YOLOv5 on my own dataset.

It apparently runs the first epoch correctly, but when it starts evaluating the validation set it fails with an error that looks CUDA-related; judging from the traceback, though, the problem seems to be in the box handling in general.py.

At first I thought the problem was that I wasn't on the latest commit, so I created a new virtualenv and cloned a fresh copy of the repo, but the error was still there.

Then I reduced the batch size to 2, and the error was the same.

Could you help me fix this issue?

To Reproduce (REQUIRED)

Input:

python train.py --weights yolov5s.pt --cfg models/yolov5s.yaml --data my_dataset/data.yaml --epochs 300 --batch-size 16 --cache-images --workers 12 --project my_project/train/

Output:

Starting training for 300 epochs...

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
     0/299     4.75G   0.07158   0.05279   0.03193    0.1563        69       640: 100%|█| 867/867 [02:49<00:00,  5.11i
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95:   3%| | 3/111 [00:00<00:
Traceback (most recent call last):
  File "train.py", line 522, in <module>
    train(hyp, opt, device, tb_writer, wandb)
  File "train.py", line 340, in train
    results, maps, times = test.test(opt.data,
  File "/home/henry/Projects/yolo/yolov5torch1.7/test.py", line 114, in test
    loss += compute_loss([x.float() for x in train_out], targets)[1][:3]  # box, obj, cls
  File "/home/henry/Projects/yolo/yolov5torch1.7/utils/loss.py", line 133, in __call__
    iou = bbox_iou(pbox.T, tbox[i], x1y1x2y2=False, CIoU=True)  # iou(prediction, target)
  File "/home/henry/Projects/yolo/yolov5torch1.7/utils/general.py", line 272, in bbox_iou
    b1_y1, b1_y2 = box1[1] - box1[3] / 2, box1[1] + box1[3] / 2
RuntimeError: CUDA error: device-side assert triggered
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [100,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [101,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [13,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [50,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [51,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [88,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [89,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.

Environment

  • OS: Ubuntu 20.04
  • GPU: RTX 2070 Super
  • CUDA 11.2
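
From what I understand, CUDA kernels are launched asynchronously, so the Python line blamed in the traceback above (bbox_iou in general.py) is not necessarily where the assert actually fired; setting CUDA_LAUNCH_BLOCKING=1 forces synchronous launches so the traceback points at the real call, which is what I try in the update below. A minimal sketch of setting it from Python instead of the shell, assuming it is set before the first CUDA call:

import os

# Must be set before torch initializes CUDA, otherwise it has no effect.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported after setting the variable so it takes effect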

1st Update

I read that running with CUDA_LAUNCH_BLOCKING="1" before python train.py makes the CUDA error surface at the real call site, and these are the logs I am getting:

Starting training for 300 epochs...

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
     0/299     4.75G   0.07159   0.05279   0.03194    0.1563        69       640: 100%|█| 867/867 [03:52<00:00,  3.74i
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95:   3%| | 3/111 [00:00<00:
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [100,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [101,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [13,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [50,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [51,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [88,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [0,0,0], thread: [89,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95:   3%| | 3/111 [00:00<00:
Traceback (most recent call last):
  File "train.py", line 522, in <module>
    train(hyp, opt, device, tb_writer, wandb)
  File "train.py", line 340, in train
    results, maps, times = test.test(opt.data,
  File "/home/henry/Projects/yolo/yolov5torch1.7/test.py", line 114, in test
    loss += compute_loss([x.float() for x in train_out], targets)[1][:3]  # box, obj, cls
  File "/home/henry/Projects/yolo/yolov5torch1.7/utils/loss.py", line 142, in __call__
    t[range(n), tcls[i]] = self.cp
RuntimeError: CUDA error: device-side assert triggered
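
The failing line, t[range(n), tcls[i]] = self.cp, indexes a tensor with nc columns using the class indices read from my label files, so the out-of-bounds assert suggests that at least one label carries a class index outside [0, nc). A quick sanity check over the dataset (a minimal sketch; the data.yaml path and glob pattern just reflect my layout from the command above):

import glob
import yaml

# Read the declared number of classes from the dataset's data.yaml.
with open("my_dataset/data.yaml") as f:
    nc = yaml.safe_load(f)["nc"]

bad = []
for path in glob.glob("my_dataset/**/labels/*.txt", recursive=True):
    with open(path) as f:
        for line_no, line in enumerate(f, 1):
            if not line.strip():
                continue
            cls = int(float(line.split()[0]))  # first column of a YOLO label is the class index
            if not 0 <= cls < nc:
                bad.append((path, line_no, cls))

print(f"nc = {nc}, out-of-range labels: {len(bad)}")
for path, line_no, cls in bad[:20]:
    print(f"{path}:{line_no}: class {cls}")

Any hit here would mean the dataset's class indices do not match the nc in data.yaml (e.g. labels that are 1-based instead of 0-based, or an nc that is too small).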

2nd Update

Since the problem occurs while evaluating the validation set, I added --notest to the command. Now I don't get any error output and it seems to still be running; my GPU memory usage increased from 4 GB to 6 GB at this step, but the utilization percentage dropped to almost 0:

Starting training for 300 epochs...

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
     0/299     4.75G   0.07158    0.0528   0.03193    0.1563        69       640: 100%|█| 867/867 [03:41<00:00,  3.91i

(screenshot: GPU memory at ~6 GB but utilization near 0%)
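
To confirm that this is really an indexing problem rather than a CUDA or driver issue, the same assignment that fails in utils/loss.py can be reproduced on the CPU, where PyTorch raises a plain IndexError that names the offending index (toy values below, not taken from my run):

import torch

nc = 3                                 # classes declared in data.yaml (toy value)
tcls = torch.tensor([0, 1, 2, 3])      # class 3 is out of range for nc = 3
n = len(tcls)

t = torch.zeros(n, nc)                 # same shape as the classification target in loss.py
t[range(n), tcls] = 1.0                # CPU: IndexError "index 3 is out of bounds for dimension 1 with size 3"

On the GPU the very same assignment trips the IndexKernel.cu assertion shown in the logs above.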
