Description
I believe there is a slight error in the current validation code that lowers the reported mAP at IoU=0.5:0.95 a little, so fixing it should give a small boost to that main metric.
Proof of error
After cloning the official repository and running "python val.py --data coco.yaml --img 640 --conf 0.001 --iou 0.65 --weights yolov5s.pt", I added a print statement to output the AP at each of the 10 IoU thresholds, which gives:
[ 0.54585 0.51816 0.49107 0.45745 0.42016 0.3753 0.31443 0.23312 0.13052 0.021785]
Now if I run the exact same command, but this time change the evaluation IoU thresholds to only 6 points, 0.7 - 0.95, I get this output:
[ 0.44055 0.38887 0.32283 0.23714 0.13167 0.021903]
If the code were correct, the last 6 values of the 10-point run should match the values of the 6-point run; instead, the 6-point run reports slightly higher values.
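For reference, the two runs differ only in the IoU threshold vector used for evaluation. val.py builds that vector with torch.linspace, so the 6-point variant I compared against corresponds to something like the sketch below (the variable name iouv comes from val.py, but the exact line may differ between versions):

```python
import torch

iouv = torch.linspace(0.5, 0.95, 10)     # default: mAP@0.5:0.95 over 10 thresholds
iouv_6pt = torch.linspace(0.7, 0.95, 6)  # 6-point variant used for the comparison above
print(iouv_6pt)  # tensor([0.7000, 0.7500, 0.8000, 0.8500, 0.9000, 0.9500])
```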
Explanation of Error
Lines 63-72 of val.py show:
ious, i = box_iou(predictions[pi, 0:4], labels[ti, 1:5]).max(1)  # best ious, indices
detected_set = set()
for j in (ious > iouv[0]).nonzero():
    d = ti[i[j]]  # detected label
    if d.item() not in detected_set:
        detected_set.add(d.item())
        detected.append(d)  # append detections
        correct[pi[j]] = ious[j] > iouv  # iou_thres is 1xn
        if len(detected) == nl:  # all labels already located in image
            break
The code takes all predictions of a given class whose best IoU with a label exceeds the lowest threshold (iouv[0] = 0.5) and iterates through them, recording which label each one matches. While this gives an accurate AP50, matches are assigned in iteration order, which is effectively arbitrary with respect to IoU. So if two predictions with IoUs of, say, 0.6 and 0.7 both best-match the same label, the 0.6 prediction can claim it first; because the label is then in detected_set, the 0.7 prediction never replaces it. At the higher thresholds in mAP @ 0.5:0.95 this occasionally marks a prediction incorrect even though a higher-IoU prediction exists for that specific label, so the reported result is slightly lower than it should be.
I am not certain what the best way to fix this would be, as I'm not too familiar with how a change here would integrate with the rest of the testing code. Also, I'm not sure whether this affects official results, as those seem to be evaluated with the COCO tools?
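One possible direction (just a sketch, not tested against the rest of val.py): collect every candidate prediction-label pair above the lowest threshold, sort the pairs by IoU in descending order, and keep at most one match per prediction and per label, so each label ends up paired with its highest-IoU prediction. The helper below is hypothetical (greedy_match is not a function in the repository) and only illustrates the idea on a per-class IoU matrix:

```python
import torch

def greedy_match(ious, iouv):
    # ious: (num_predictions, num_labels) IoU matrix for one image/class
    # iouv: (num_thresholds,) IoU thresholds, e.g. torch.linspace(0.5, 0.95, 10)
    # returns: (num_predictions, num_thresholds) bool tensor of per-threshold correctness
    correct = torch.zeros(ious.shape[0], iouv.shape[0], dtype=torch.bool)
    pred_idx, lbl_idx = torch.where(ious >= iouv[0])  # candidate pairs above lowest threshold
    if pred_idx.numel():
        pair_iou = ious[pred_idx, lbl_idx]
        order = pair_iou.argsort(descending=True)  # consider highest-IoU pairs first
        used_preds, used_labels = set(), set()
        for k in order.tolist():
            p, l = pred_idx[k].item(), lbl_idx[k].item()
            if p in used_preds or l in used_labels:  # one match per prediction and per label
                continue
            used_preds.add(p)
            used_labels.add(l)
            correct[p] = pair_iou[k] >= iouv  # correct at every threshold the match clears
    return correct

# Toy example of the case described above: two predictions (IoU 0.6 and 0.7) overlap the same label.
iouv = torch.linspace(0.5, 0.95, 10)
ious = torch.tensor([[0.60], [0.70]])
print(greedy_match(ious, iouv))  # the 0.70 prediction claims the label, not the 0.60 one
```

Sorting the candidate pairs once by IoU removes the dependence on prediction order; whether this plays nicely with the per-class bookkeeping in val.py would still need checking.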