Description
I believe there is a slight error in the current validation code that lowers the reported mAP at IoU=0.5:0.95 a little, so fixing it should give a small boost to that main metric.
Proof of error
After cloning the official repository and running "python val.py --data coco.yaml --img 640 --conf 0.001 --iou 0.65 --weights yolov5s.pt", I added a print statement to output the AP at each of the 10 IoU thresholds, which gives:
[ 0.54585 0.51816 0.49107 0.45745 0.42016 0.3753 0.31443 0.23312 0.13052 0.021785]
Now if I run the exact same command, but this time change the evaluation IoU thresholds to only 6 points, 0.7 - 0.95, I get this output:
[ 0.44055 0.38887 0.32283 0.23714 0.13167 0.021903]
If the code were correct, the last 6 values of the 10-point run should match the values of the 6-point run; instead, the 6-point run reports slightly higher values.
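For reference, the two runs differ only in the IoU threshold vector used for evaluation. val.py builds that vector with torch.linspace, so the 6-point variant I compared against corresponds to something like the sketch below (the variable name iouv comes from val.py, but the exact line may differ between versions):

```python
import torch

iouv = torch.linspace(0.5, 0.95, 10)     # default: mAP@0.5:0.95 over 10 thresholds
iouv_6pt = torch.linspace(0.7, 0.95, 6)  # 6-point variant used for the comparison above
print(iouv_6pt)  # tensor([0.7000, 0.7500, 0.8000, 0.8500, 0.9000, 0.9500])
```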
Explanation of Error
Lines 63-72 of val.py show:
ious, i = box_iou(predictions[pi, 0:4], labels[ti, 1:5]).max(1)  # best ious, indices
detected_set = set()
for j in (ious > iouv[0]).nonzero():
    d = ti[i[j]]  # detected label
    if d.item() not in detected_set:
        detected_set.add(d.item())
        detected.append(d)  # append detections
        correct[pi[j]] = ious[j] > iouv  # iou_thres is 1xn
        if len(detected) == nl:  # all labels already located in image
            break
The code takes all predictions of a given class whose best IoU with a label exceeds the lowest threshold (iouv[0] = 0.5) and iterates through them, recording which label each one matches. While this gives an accurate AP50, matches are assigned in iteration order, which is effectively arbitrary with respect to IoU. So if two predictions with IoUs of, say, 0.6 and 0.7 both best-match the same label, the 0.6 prediction can claim it first; because the label is then in detected_set, the 0.7 prediction never replaces it. At the higher thresholds in mAP @ 0.5:0.95 this occasionally marks a prediction incorrect even though a higher-IoU prediction exists for that specific label, so the reported result is slightly lower than it should be.
I am not certain what the best way to fix this would be, as I'm not too familiar with how a change here would integrate with the rest of the testing code. Also, I'm not sure whether this affects official results, as those seem to be evaluated with the COCO tools?
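One possible direction (just a sketch, not tested against the rest of val.py): collect every candidate prediction-label pair above the lowest threshold, sort the pairs by IoU in descending order, and keep at most one match per prediction and per label, so each label ends up paired with its highest-IoU prediction. The helper below is hypothetical (greedy_match is not a function in the repository) and only illustrates the idea on a per-class IoU matrix:

```python
import torch

def greedy_match(ious, iouv):
    # ious: (num_predictions, num_labels) IoU matrix for one image/class
    # iouv: (num_thresholds,) IoU thresholds, e.g. torch.linspace(0.5, 0.95, 10)
    # returns: (num_predictions, num_thresholds) bool tensor of per-threshold correctness
    correct = torch.zeros(ious.shape[0], iouv.shape[0], dtype=torch.bool)
    pred_idx, lbl_idx = torch.where(ious >= iouv[0])  # candidate pairs above lowest threshold
    if pred_idx.numel():
        pair_iou = ious[pred_idx, lbl_idx]
        order = pair_iou.argsort(descending=True)  # consider highest-IoU pairs first
        used_preds, used_labels = set(), set()
        for k in order.tolist():
            p, l = pred_idx[k].item(), lbl_idx[k].item()
            if p in used_preds or l in used_labels:  # one match per prediction and per label
                continue
            used_preds.add(p)
            used_labels.add(l)
            correct[p] = pair_iou[k] >= iouv  # correct at every threshold the match clears
    return correct

# Toy example of the case described above: two predictions (IoU 0.6 and 0.7) overlap the same label.
iouv = torch.linspace(0.5, 0.95, 10)
ious = torch.tensor([[0.60], [0.70]])
print(greedy_match(ious, iouv))  # the 0.70 prediction claims the label, not the 0.60 one
```

Sorting the candidate pairs once by IoU removes the dependence on prediction order; whether this plays nicely with the per-class bookkeeping in val.py would still need checking.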