Description
Search before asking
- I have searched the YOLOv5 issues and discussions and found no similar questions.
Question
Hi,
I usually use your COCO-pretrained YOLOv5-x model to test prediction performance on my own datasets. These datasets usually contain images/video frames at high resolution, usually > 4K and even > 8K (sometimes even 240 pixels), and they mostly contain small objects.
When I increase the input resolution towards the native resolution, in some cases it "looks visually" (not measured by AP score) as if the model detects almost all the objects I need, which is good news. But I would like a better understanding: is this a good habit, or should I instead tile the frames down to the model's native training input resolution of 640x640?
My quick thought is that the model's multi-scale prediction grid is expanded and becomes less fine-grained, making it harder to detect small or densely packed objects, so this should not be the way to go, and I should instead tile the input images at the native training input resolution. Whether this is correct or not, could you provide a better (theoretical) explanation?
I am focusing mainly on datasets that contain small objects with sizes in the range of 8-32 pixels (within the huge images). Resizing/subsampling those images to the native training resolution would almost completely remove the small objects. Therefore the images should be processed as close to native resolution as possible (if you agree with this).
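To make that concern concrete, here is a quick back-of-envelope calculation. The numbers are assumptions for illustration: a 7680 px wide "8K" frame, letterbox-resized to YOLOv5's default 640 px input, compared against the 8 px stride of YOLOv5's finest detection grid (P3).

```python
# Back-of-envelope check (assumed numbers: 7680 px wide "8K" frame,
# resized to YOLOv5's default 640 px input; the finest YOLOv5
# detection grid, P3, has a stride of 8 px).
src_width = 7680            # assumed 8K frame width in pixels
model_input = 640           # default YOLOv5 training/inference size
scale = model_input / src_width
for obj_px in (8, 16, 32):  # object sizes mentioned above
    print(f"{obj_px:>2} px object -> {obj_px * scale:.2f} px after resizing")
# Even a 32 px object shrinks to ~2.7 px, well below the 8 px P3
# stride, and an 8 px object shrinks below a single pixel.
```

So after downscaling, these objects occupy far less than one grid cell, which is why they effectively disappear.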
What would be the best training strategies?
- Tiled training (at the native training resolution of 640x640), and tiled prediction?
- Upscale the model input to the dataset's image resolution (e.g. 4K, 8K, etc.), train a model at that resolution (possibly changing the grid settings to get a finer grid?), and predict at full resolution?
- Tiled training, but prediction at the full native dataset resolution?
- Other better suggestions?
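For the tiled-prediction options above, here is a minimal sketch of the tiling geometry I have in mind, so the question is unambiguous. Everything here is a hypothetical illustration, not the YOLOv5 API: 640x640 tiles with an assumed 20% overlap (so an object cut by one tile border appears whole in a neighbouring tile), and a placeholder `detect` callable standing in for any per-tile model call.

```python
# Hypothetical sketch of tiled inference geometry (not the YOLOv5 API).
# Assumptions: 640x640 tiles, 20% overlap; `detect(x0, y0, tile)` is a
# placeholder that would crop the tile at (x0, y0) from the image, run
# the model, and return boxes (x1, y1, x2, y2, conf, cls) in tile coords.

def tile_origins(size, tile=640, overlap=0.2):
    """Top-left coordinates of overlapping tiles covering `size` pixels."""
    step = max(1, int(tile * (1 - overlap)))
    last = max(size - tile, 0)
    origins = list(range(0, last + 1, step))
    if origins[-1] != last:        # make sure the far edge is covered
        origins.append(last)
    return origins

def tiled_predict(image_w, image_h, detect, tile=640, overlap=0.2):
    """Run `detect` on every tile and shift boxes back to image coords.

    A real pipeline would also run a global NMS afterwards to merge
    duplicate detections coming from the overlap regions."""
    boxes = []
    for y0 in tile_origins(image_h, tile, overlap):
        for x0 in tile_origins(image_w, tile, overlap):
            for (x1, y1, x2, y2, conf, cls) in detect(x0, y0, tile):
                boxes.append((x1 + x0, y1 + y0, x2 + x0, y2 + y0, conf, cls))
    return boxes
```

For a 1280x1280 image this yields a 3x3 grid of tiles (origins 0, 512, 640 on each axis); an 8K frame would need on the order of 15x9 tiles, which is the per-image inference cost I am weighing against full-resolution prediction.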