Reproducibility of two identical YOLOv5 train jobs #31

@valentinitnelav

Description

Hi @stark-t, I ran two identical nano models on the Clara cluster and the results are slightly different.
Below are the confusion matrices on the validation dataset; you will also find the results.csv for each run at the bottom of this comment.

I personally do not like seeing those differences between two identical nano runs (but I can learn to accept it :D ). I am not sure how to set a seed for YOLOv5 so that two runs of the same model are identical, or whether that is even possible with the current configuration. Sadly, we have not yet seen any parameter implemented with argparse that accepts a seed. There is a discussion in ultralytics/yolov5#1222 pointing at the PyTorch reproducibility notes: https://pytorch.org/docs/stable/notes/randomness.html

The main takeaways are:

Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds.

The only method I'm aware of that might guarantee identical results is to train on CPU with --workers 0, but this is naturally impractical, so you simply need to adapt your workflow to accommodate minor variations in final model results.
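
For reference, below is a minimal sketch (following the PyTorch randomness notes linked above, not anything YOLOv5 ships) of how one could seed the usual RNGs and keep multi-worker dataloading reproducible without dropping to --workers 0. The names `set_seed` and `seed_worker` are made up for illustration; in practice this would have to be wired into YOLOv5's train.py, e.g. behind a new --seed argparse flag.

```python
import os
import random

import numpy as np
import torch
from torch.utils.data import DataLoader


def set_seed(seed: int = 0) -> None:
    """Seed every RNG a typical PyTorch training run touches."""
    random.seed(seed)                 # Python RNG (e.g. augmentation choices)
    np.random.seed(seed)              # NumPy RNG (e.g. mosaic/HSV augmentations)
    torch.manual_seed(seed)           # CPU RNG
    torch.cuda.manual_seed_all(seed)  # RNGs on all GPUs
    # Ask cuDNN for deterministic kernels and disable auto-tuning;
    # this trades some speed for run-to-run stability.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Some CUDA >= 10.2 ops additionally require this env var for determinism.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"


def seed_worker(worker_id: int) -> None:
    """Re-seed NumPy/random inside each dataloader worker process."""
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


# Example wiring, assuming `dataset` already exists:
# set_seed(0)
# g = torch.Generator()
# g.manual_seed(0)
# loader = DataLoader(dataset, batch_size=16, shuffle=True,
#                     num_workers=4, worker_init_fn=seed_worker, generator=g)
```

Even with all of that in place, the PyTorch caveat quoted above still holds: results are only reproducible on the same hardware, PyTorch version, and platform, so some variation between identical runs on the Clara GPUs may be unavoidable.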

nano model n1

[confusion matrix image]

nano model n2

[confusion matrix image]

small model s

[confusion matrix image]

Results CSV files

nano model n1: results.csv

nano model n2: results.csv

small model s: results.csv
