Reproducibility of two identical YOLOv5 train jobs #31

@valentinitnelav

Description

Hi @stark-t, I ran two identical nano models on the Clara cluster and the results are slightly different.
Below are the confusion matrices on the validation dataset; you will also find the results.csv for each run at the bottom of this comment.

I personally do not like seeing those differences between two identical nano runs (but I can learn to accept it :D ). I am not sure how to set a seed for YOLOv5 so that two runs of the same model are identical, or whether that is even possible with the current configuration. Sadly, we have not yet seen any parameter implemented with argparse that accepts a seed. There is a discussion in ultralytics/yolov5#1222 pointing at the PyTorch reproducibility notes: https://pytorch.org/docs/stable/notes/randomness.html

The main takeaways are:

Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds.

The only method I'm aware of that might guarantee identical results is to train on CPU with --workers 0, but this is naturally impractical, so you simply need to adapt your workflow to accommodate minor variations in final model results.
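
For reference, below is a minimal sketch (following the PyTorch randomness notes linked above, not anything YOLOv5 ships) of how one could seed the usual RNGs and keep multi-worker dataloading reproducible without dropping to --workers 0. The names `set_seed` and `seed_worker` are made up for illustration; in practice this would have to be wired into YOLOv5's train.py, e.g. behind a new --seed argparse flag.

```python
import os
import random

import numpy as np
import torch
from torch.utils.data import DataLoader


def set_seed(seed: int = 0) -> None:
    """Seed every RNG a typical PyTorch training run touches."""
    random.seed(seed)                 # Python RNG (e.g. augmentation choices)
    np.random.seed(seed)              # NumPy RNG (e.g. mosaic/HSV augmentations)
    torch.manual_seed(seed)           # CPU RNG
    torch.cuda.manual_seed_all(seed)  # RNGs on all GPUs
    # Ask cuDNN for deterministic kernels and disable auto-tuning;
    # this trades some speed for run-to-run stability.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Some CUDA >= 10.2 ops additionally require this env var for determinism.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"


def seed_worker(worker_id: int) -> None:
    """Re-seed NumPy/random inside each dataloader worker process."""
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


# Example wiring, assuming `dataset` already exists:
# set_seed(0)
# g = torch.Generator()
# g.manual_seed(0)
# loader = DataLoader(dataset, batch_size=16, shuffle=True,
#                     num_workers=4, worker_init_fn=seed_worker, generator=g)
```

Even with all of that in place, the PyTorch caveat quoted above still holds: results are only reproducible on the same hardware, PyTorch version, and platform, so some variation between identical runs on the Clara GPUs may be unavoidable.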

nano model n1

[confusion matrix image]

nano model n2

[confusion matrix image]

small model s

[confusion matrix image]

Results CSV files

nano model n1: results.csv

nano model n2: results.csv

small model s: results.csv
