Description
Search before asking
- I have searched the YOLOv5 issues and found no similar bug report.
YOLOv5 Component
Training
Bug
Because of GPU time restrictions, I train until the session is cut off and then relaunch training with the --resume option.
Until yesterday this worked perfectly, but today I get the error below regardless of which checkpoint I resume from. I have also tried older checkpoints that I know resumed correctly before, and the error is the same. Has anything changed in the optimizer structure?
Environment
Using torch 1.10.0+cu111 (Tesla K80)
Google Colab
Minimal Reproducible Example
!python train.py --img 1280 --batch 16 --epochs 50 --data /content/drive/MyDrive/OIv6/dataset.yaml --project /content/drive/MyDrive/OIv6/runs/train --weights yolov5s6.pt --hyp hyp.VOC.yaml --optimizer AdamW --device 0
!python train.py --resume /content/drive/MyDrive/OIv6/runs/train/exp/weights/last.pt --device 0
Traceback (most recent call last):
  File "train.py", line 667, in <module>
    main(opt)
  File "train.py", line 562, in main
    train(opt.hyp, opt, device, callbacks)
  File "train.py", line 191, in train
    optimizer.load_state_dict(ckpt['optimizer'])
  File "/usr/local/lib/python3.7/dist-packages/torch/optim/optimizer.py", line 146, in load_state_dict
    raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group
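To narrow this down, it may help to compare the parameter-group sizes stored in the checkpoint with those of the freshly built optimizer, since the ValueError above is raised when the lengths of the `params` lists differ between the two. The following is only a minimal sketch: it assumes both inputs follow the `optimizer.state_dict()` layout (a `param_groups` list whose entries each hold a `params` list), the helper names `group_sizes` and `group_size_mismatches` are hypothetical and not part of YOLOv5 or PyTorch, and the toy dicts are made-up data rather than values from my checkpoint.

```python
from typing import Any, Dict, List, Tuple

# Assumed layout: the dict returned by optimizer.state_dict(), i.e.
# {"state": {...}, "param_groups": [{"params": [ids...], ...}, ...]}
StateDict = Dict[str, Any]


def group_sizes(state: StateDict) -> List[int]:
    """Return the number of parameters in each parameter group."""
    return [len(group["params"]) for group in state["param_groups"]]


def group_size_mismatches(saved: StateDict, current: StateDict) -> List[Tuple[int, int, int]]:
    """Return (group index, saved size, current size) for every group whose size differs."""
    saved_sizes = group_sizes(saved)
    current_sizes = group_sizes(current)
    if len(saved_sizes) != len(current_sizes):
        raise ValueError(
            f"different number of parameter groups: "
            f"saved={len(saved_sizes)}, current={len(current_sizes)}"
        )
    return [
        (index, saved_size, current_size)
        for index, (saved_size, current_size) in enumerate(zip(saved_sizes, current_sizes))
        if saved_size != current_size
    ]


# Toy data for illustration only (not taken from a real checkpoint).
saved = {
    "state": {},
    "param_groups": [{"params": [0, 1, 2]}, {"params": [3, 4]}, {"params": [5]}],
}
current = {
    "state": {},
    "param_groups": [{"params": [0, 1]}, {"params": [2, 3]}, {"params": [4, 5]}],
}

for index, saved_size, current_size in group_size_mismatches(saved, current):
    print(f"group {index}: checkpoint has {saved_size} params, optimizer has {current_size}")
```

In an actual run, `saved` would presumably be `ckpt['optimizer']` from the loaded last.pt and `current` would be `optimizer.state_dict()` taken just before the failing `load_state_dict` call. A non-empty result would suggest that the parameters are now being assigned to groups differently than when the checkpoint was written.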
Additional
No response
Are you willing to submit a PR?
- Yes I'd like to help by submitting a PR!