Study 🤔
I did a quick study to examine the effect of varying batch size on YOLOv5 trainings. The study trained YOLOv5s on COCO for 300 epochs with --batch-size at 8 different values: [16, 20, 32, 40, 64, 80, 96, 128].
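For reference, a run matrix like this could be launched with something along the lines of the sketch below. The exact commands, weights, and run names used for the study are my assumptions rather than details from this post; the train.py flags shown (--data, --cfg, --weights, --epochs, --batch-size, --name) are the standard ones.

```python
# Hypothetical sketch of launching the 8 trainings; not the exact commands used in the study.
import subprocess

for bs in [16, 20, 32, 40, 64, 80, 96, 128]:
    subprocess.run(
        [
            "python", "train.py",
            "--data", "coco.yaml",
            "--cfg", "yolov5s.yaml",
            "--weights", "",                # assume training from scratch
            "--epochs", "300",
            "--batch-size", str(bs),
            "--name", f"yolov5s_bs{bs}",    # hypothetical run name
        ],
        check=True,
    )
```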
We've tried to make the training code batch-size agnostic, so that users get similar results at any batch size. This means users on an 11 GB 2080 Ti should be able to produce the same results as users on a 24 GB 3090 or a 40 GB A100, with smaller GPUs simply using smaller batch sizes.
We do this by scaling the loss with batch size, and also by scaling weight decay with batch size. At batch sizes below 64 we accumulate gradients over several batches before optimizing, and at batch sizes of 64 or larger we optimize after every batch.
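As a rough illustration of that logic, here is a minimal self-contained sketch of the batch-size compensation, using a dummy model and loss rather than the real detection model. The names `nbs` and `accumulate` and the 0.0005 base weight decay follow train.py conventions, but this is a simplification, not the actual training code (for example, real training applies weight decay only to selected parameter groups).

```python
# Minimal sketch of YOLOv5-style batch-size compensation, with a dummy model/loss.
import torch
import torch.nn as nn

batch_size = 16
nbs = 64                                                 # nominal batch size
accumulate = max(round(nbs / batch_size), 1)             # accumulate gradients over this many batches
weight_decay = 0.0005 * batch_size * accumulate / nbs    # weight decay scaled to the effective batch size

model = nn.Linear(10, 1)                                 # stand-in for the detection model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=weight_decay)
criterion = nn.MSELoss()                                 # stand-in for the detection loss

for i in range(8):                                       # stand-in for the dataloader loop
    imgs, targets = torch.randn(batch_size, 10), torch.randn(batch_size, 1)
    loss = criterion(model(imgs), targets) * batch_size  # loss scaled by batch size
    loss.backward()
    if (i + 1) % accumulate == 0:                        # optimizer steps once per `accumulate` batches
        optimizer.step()
        optimizer.zero_grad()
```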
Results 😃
Initial results vary significantly with batch size, but final results are nearly identical (good!).
Closeup of mAP@0.5:0.95:
One oddity that stood out is val objectness loss, which did vary with batch size. I'm not sure why, as val box and val cls losses did not vary much, and neither did the 3 train losses. I don't know what this means or whether there's any cause for concern (or room for improvement).