Commit 378bde4

thepycoder, pre-commit-ci[bot], and glenn-jocher authored
ClearML experiment tracking integration (#8620)
* Add titles to matplotlib plots
* Add ClearML Experiment Tracking integration.
* Add ClearML Data Version Management automatic download when requested
* Add ClearML Hyperparameter Optimization
* ClearML save period integration
* Fix wandb breaking when used with ClearML dataset
* Fix wandb breaking when used with ClearML resume and dataset
* Add ClearML documentation
* fixed small bug in clearml integration that misreports epoch number
* Final ClearMl additions before refactor
* Add correct epoch reporting
* Add remote execution and autoscaling docs for ClearML integration
* Added images to clearml integration docs
* fixed logo alignment bug and added hpo screenshot clearml
* Fixed small epoch number bug in clearml integration
* Remove saved model flush clearml
* Cleanup clearml readme section
* Cleaned up clearml logger docstring
* Remove resume readme section clearml
* Clearml integration cleanup
* Updated ClearML documentation
* Added dark vs light icons ClearML Readme
* Clearml Readme styling
* Add better gifs
* Fixed gif file size
* Add better images in tutorial notebook
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Addressed comments in PR #8620
* Fixed circular import
* Fixed circular import
* Update tutorial.ipynb
* Update tutorial.ipynb
* Inline comment
* Restructured tutorial notebook
* Add correct ClearML link to README
* Update tutorial.ipynb
* Update general.py
* Update __init__.py
* Update __init__.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Update __init__.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Update __init__.py
* Update README.md
* Update __init__.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* spelling
* Update tutorial.ipynb
* notebook cutt.ly links
* Update README.md
* Update README.md
* cutt.ly links in tutorial
* Removed labels as they show up on last subplot only

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <[email protected]>
1 parent 2794483 commit 378bde4

13 files changed: +575 -21 lines changed

README.md

Lines changed: 14 additions & 7 deletions
@@ -151,7 +151,8 @@ python train.py --data coco.yaml --cfg yolov5n.yaml --weights '' --batch-size 12
 - [Train Custom Data](https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data)  🚀 RECOMMENDED
 - [Tips for Best Training Results](https://github.com/ultralytics/yolov5/wiki/Tips-for-Best-Training-Results)  ☘️
 RECOMMENDED
-- [Weights & Biases Logging](https://github.com/ultralytics/yolov5/issues/1289)  🌟 NEW
+- [ClearML Logging](https://github.com/ultralytics/yolov5/tree/master/utils/loggers/clearml) 🌟 NEW
+- [Weights & Biases Logging](https://github.com/ultralytics/yolov5/issues/1289)
 - [Roboflow for Datasets, Labeling, and Active Learning](https://github.com/ultralytics/yolov5/issues/4975)  🌟 NEW
 - [Multi-GPU Training](https://github.com/ultralytics/yolov5/issues/475)
 - [PyTorch Hub](https://github.com/ultralytics/yolov5/issues/36)  ⭐ NEW
@@ -190,17 +191,23 @@ Get started in seconds with our verified environments. Click each icon below for
 ## <div align="center">Integrations</div>

 <div align="center">
-<a href="https://wandb.ai/site?utm_campaign=repo_yolo_readme">
-<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-wb-long.png" width="49%"/>
+<a href="https://cutt.ly/yolov5-readme-clearml#gh-light-mode-only">
+<img src="https://github.com/thepycoder/clearml_screenshots/raw/main/banner_github.png#gh-light-mode-only" width="32%" />
+</a>
+<a href="https://cutt.ly/yolov5-readme-clearml#gh-dark-mode-only">
+<img src="https://github.com/thepycoder/clearml_screenshots/raw/main/banner_github_light.png#gh-dark-mode-only" width="32%" />
 </a>
 <a href="https://roboflow.com/?ref=ultralytics">
-<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-roboflow-long.png" width="49%"/>
+<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-roboflow-long.png" width="33%"/>
+</a>
+<a href="https://wandb.ai/site?utm_campaign=repo_yolo_readme">
+<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-wb-long.png" width="33%"/>
 </a>
 </div>

-|Weights and Biases|Roboflow ⭐ NEW|
-|:-:|:-:|
-|Automatically track and visualize all your YOLOv5 training runs in the cloud with [Weights & Biases](https://wandb.ai/site?utm_campaign=repo_yolo_readme)|Label and export your custom datasets directly to YOLOv5 for training with [Roboflow](https://roboflow.com/?ref=ultralytics) |
+|ClearML ⭐ NEW|Roboflow|Weights and Biases
+|:-:|:-:|:-:|
+|Automatically track, visualize and even remotely train YOLOv5 using [ClearML](https://cutt.ly/yolov5-readme-clearml) (open-source!)|Label and export your custom datasets directly to YOLOv5 for training with [Roboflow](https://roboflow.com/?ref=ultralytics) |Automatically track and visualize all your YOLOv5 training runs in the cloud with [Weights & Biases](https://wandb.ai/site?utm_campaign=repo_yolo_readme)

 <!-- ## <div align="center">Compete and Win</div>

requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ protobuf<=3.20.1 # https://github.com/ultralytics/yolov5/issues/8012
 # Logging -------------------------------------
 tensorboard>=2.4.1
 # wandb
+# clearml

 # Plotting ------------------------------------
 pandas>=1.1.4

train.py

Lines changed: 2 additions & 0 deletions
@@ -90,6 +90,8 @@ def train(hyp, opt, device, callbacks): # hyp is path/to/hyp.yaml or hyp dictio
     data_dict = None
     if RANK in {-1, 0}:
         loggers = Loggers(save_dir, weights, opt, hyp, LOGGER)  # loggers instance
+        if loggers.clearml:
+            data_dict = loggers.clearml.data_dict  # None if no ClearML dataset or filled in by ClearML
         if loggers.wandb:
             data_dict = loggers.wandb.data_dict
             if resume:

tutorial.ipynb

Lines changed: 26 additions & 1 deletion
@@ -7,6 +7,7 @@
 "provenance": [],
 "collapsed_sections": [],
 "machine_shape": "hm",
+"toc_visible": true,
 "include_colab_link": true
 },
 "kernelspec": {
@@ -913,6 +914,30 @@
 "# 4. Visualize"
 ]
 },
+{
+"cell_type": "markdown",
+"source": [
+"## ClearML Logging and Automation 🌟 NEW\n",
+"\n",
+"[ClearML](https://cutt.ly/yolov5-notebook-clearml) is completely integrated into YOLOv5 to track your experimentation, manage dataset versions and even remotely execute training runs.\n",
+"\n",
+"To enable ClearML (Check cells above):\n",
+"- `pip install clearml`\n",
+"- run `clearml-init` to connect to a ClearML server (**deploy your own open-source server [here](https://github.com/allegroai/clearml-server)**, or use our free hosted server [here](https://cutt.ly/yolov5-notebook-clearml))\n",
+"\n",
+"You'll get all the great expected features from an experiment manager: live updates, model upload, experiment comparison etc. but ClearML also tracks uncommitted changes and installed packages for example. Thanks to that ClearML Tasks (which is what we call experiments) are also reproducible on different machines! With only 1 extra line, we can schedule a YOLOv5 training task on a queue to be executed by any number of ClearML Agents (workers).\n",
+"\n",
+"You can use ClearML Data to version your dataset and then pass it to YOLOv5 simply using its unique ID. This will help you keep track of your data without adding extra hassle. \n",
+"\n",
+"Explore the [ClearML Tutorial](https://github.com/ultralytics/yolov5/tree/master/utils/loggers/clearml) for more info!\n",
+"\n",
+"<a href=\"https://cutt.ly/yolov5-notebook-clearml\">\n",
+"<img alt=\"ClearML Experiment Management UI\" src=\"https://github.com/thepycoder/clearml_screenshots/raw/main/scalars.jpg\" width=\"1280\"/></a>"
+],
+"metadata": {
+"id": "Lay2WsTjNJzP"
+}
+},
 {
 "cell_type": "markdown",
 "metadata": {
@@ -1105,4 +1130,4 @@
 "outputs": []
 }
 ]
-}
+}

utils/general.py

Lines changed: 4 additions & 0 deletions
@@ -14,6 +14,7 @@
 import re
 import shutil
 import signal
+import sys
 import threading
 import time
 import urllib
@@ -449,6 +450,9 @@ def check_file(file, suffix=''):
             torch.hub.download_url_to_file(url, file)
             assert Path(file).exists() and Path(file).stat().st_size > 0, f'File download failed: {url}'  # check
         return file
+    elif file.startswith('clearml://'):  # ClearML Dataset ID
+        assert 'clearml' in sys.modules, "ClearML is not installed, so cannot use ClearML dataset. Try running 'pip install clearml'."
+        return file
     else:  # search
         files = []
         for d in 'data', 'models', 'utils':  # search directories
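
Note on the `check_file` hunk above: the new branch returns a `clearml://` reference unchanged and only asserts that the `clearml` module is present in `sys.modules`. That check tests "already imported in this process", which the diff appears to use as a proxy for "installed". The following is a minimal sketch of that branch in isolation. The helper name `resolve_dataset_reference` and its `loaded_modules` parameter are hypothetical and are not part of this commit.

```python
import sys

CLEARML_PREFIX = 'clearml://'  # prefix taken from the hunk above


def resolve_dataset_reference(file, loaded_modules=None):
    """Hypothetical stand-alone mirror of the new `elif` branch in check_file()."""
    # Default to the real module table; tests can pass a plain dict instead.
    loaded_modules = sys.modules if loaded_modules is None else loaded_modules
    if file.startswith(CLEARML_PREFIX):
        # Same guard as the diff: membership means "imported", not merely "installed".
        assert 'clearml' in loaded_modules, (
            "ClearML is not installed, so cannot use ClearML dataset. "
            "Try running 'pip install clearml'.")
        return file  # passed through untouched, no local file lookup
    # Other branches of check_file() (local path, URL, search) are out of scope here.
    raise ValueError(f'not a ClearML dataset reference: {file}')
```

Because the string is returned as-is, whatever consumes it later has to recognise the prefix again; in this commit that consumer is presumably the ClearML logger that fills in `data_dict`.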

utils/loggers/__init__.py

Lines changed: 59 additions & 11 deletions
@@ -11,11 +11,12 @@
 from torch.utils.tensorboard import SummaryWriter

 from utils.general import colorstr, cv2, emojis
+from utils.loggers.clearml.clearml_utils import ClearmlLogger
 from utils.loggers.wandb.wandb_utils import WandbLogger
 from utils.plots import plot_images, plot_results
 from utils.torch_utils import de_parallel

-LOGGERS = ('csv', 'tb', 'wandb')  # text-file, TensorBoard, Weights & Biases
+LOGGERS = ('csv', 'tb', 'wandb', 'clearml')  # *.csv, TensorBoard, Weights & Biases, ClearML
 RANK = int(os.getenv('RANK', -1))

 try:
@@ -32,6 +33,13 @@
 except (ImportError, AssertionError):
     wandb = None

+try:
+    import clearml
+
+    assert hasattr(clearml, '__version__')  # verify package import not local dir
+except (ImportError, AssertionError):
+    clearml = None
+

 class Loggers():
     # YOLOv5 Loggers class
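
Note on the import guard in the hunk above: it follows the same shape as the existing wandb block, treating the package as unavailable when the import fails or when the imported object has no `__version__` (the inline comment suggests this catches a local directory that happens to be named `clearml`). A generic sketch of the pattern is shown below; `optional_import` is a hypothetical helper name, not something this commit defines.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is missing or has no __version__."""
    try:
        module = importlib.import_module(name)
        # A bare local directory imports as a namespace package without __version__.
        assert hasattr(module, '__version__')
    except (ImportError, AssertionError):
        return None
    return module
```

Callers can then branch on truthiness, as `Loggers.__init__` does with `if clearml and 'clearml' in self.include`.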
@@ -61,10 +69,14 @@ def __init__(self, save_dir=None, weights=None, opt=None, hyp=None, logger=None,
             setattr(self, k, None)  # init empty logger dictionary
         self.csv = True  # always log to csv

-        # Message
+        # Messages
         if not wandb:
             prefix = colorstr('Weights & Biases: ')
-            s = f"{prefix}run 'pip install wandb' to automatically track and visualize YOLOv5 🚀 runs (RECOMMENDED)"
+            s = f"{prefix}run 'pip install wandb' to automatically track and visualize YOLOv5 🚀 runs in Weights & Biases"
+            self.logger.info(emojis(s))
+        if not clearml:
+            prefix = colorstr('ClearML: ')
+            s = f"{prefix}run 'pip install clearml' to automatically track, visualize and remotely train YOLOv5 🚀 runs in ClearML"
             self.logger.info(emojis(s))

         # TensorBoard
@@ -82,12 +94,17 @@ def __init__(self, save_dir=None, weights=None, opt=None, hyp=None, logger=None,
             self.wandb = WandbLogger(self.opt, run_id)
             # temp warn. because nested artifacts not supported after 0.12.10
             if pkg.parse_version(wandb.__version__) >= pkg.parse_version('0.12.11'):
-                self.logger.warning(
-                    "YOLOv5 temporarily requires wandb version 0.12.10 or below. Some features may not work as expected."
-                )
+                s = "YOLOv5 temporarily requires wandb version 0.12.10 or below. Some features may not work as expected."
+                self.logger.warning(s)
         else:
             self.wandb = None

+        # ClearML
+        if clearml and 'clearml' in self.include:
+            self.clearml = ClearmlLogger(self.opt, self.hyp)
+        else:
+            self.clearml = None
+
     def on_train_start(self):
         # Callback runs on train start
         pass
@@ -97,9 +114,12 @@ def on_pretrain_routine_end(self):
         paths = self.save_dir.glob('*labels*.jpg')  # training labels
         if self.wandb:
             self.wandb.log({"Labels": [wandb.Image(str(x), caption=x.name) for x in paths]})
+        if self.clearml:
+            pass  # ClearML saves these images automatically using hooks

     def on_train_batch_end(self, ni, model, imgs, targets, paths, plots):
         # Callback runs on train batch end
+        # ni: number integrated batches (since train start)
         if plots:
             if ni == 0:
                 if self.tb and not self.opt.sync_bn:  # --sync known issue https://github.com/ultralytics/yolov5/issues/3754
@@ -109,9 +129,12 @@ def on_train_batch_end(self, ni, model, imgs, targets, paths, plots):
             if ni < 3:
                 f = self.save_dir / f'train_batch{ni}.jpg'  # filename
                 plot_images(imgs, targets, paths, f)
-            if self.wandb and ni == 10:
+            if (self.wandb or self.clearml) and ni == 10:
                 files = sorted(self.save_dir.glob('train*.jpg'))
-                self.wandb.log({'Mosaics': [wandb.Image(str(f), caption=f.name) for f in files if f.exists()]})
+                if self.wandb:
+                    self.wandb.log({'Mosaics': [wandb.Image(str(f), caption=f.name) for f in files if f.exists()]})
+                if self.clearml:
+                    self.clearml.log_debug_samples(files, title='Mosaics')

     def on_train_epoch_end(self, epoch):
         # Callback runs on train epoch end
@@ -122,12 +145,17 @@ def on_val_image_end(self, pred, predn, path, names, im):
         # Callback runs on val image end
         if self.wandb:
             self.wandb.val_one_image(pred, predn, path, names, im)
+        if self.clearml:
+            self.clearml.log_image_with_boxes(path, pred, names, im)

     def on_val_end(self):
         # Callback runs on val end
-        if self.wandb:
+        if self.wandb or self.clearml:
             files = sorted(self.save_dir.glob('val*.jpg'))
-            self.wandb.log({"Validation": [wandb.Image(str(f), caption=f.name) for f in files]})
+            if self.wandb:
+                self.wandb.log({"Validation": [wandb.Image(str(f), caption=f.name) for f in files]})
+            if self.clearml:
+                self.clearml.log_debug_samples(files, title='Validation')

     def on_fit_epoch_end(self, vals, epoch, best_fitness, fi):
         # Callback runs at the end of each fit (train+val) epoch
@@ -142,6 +170,10 @@ def on_fit_epoch_end(self, vals, epoch, best_fitness, fi):
         if self.tb:
             for k, v in x.items():
                 self.tb.add_scalar(k, v, epoch)
+        elif self.clearml:  # log to ClearML if TensorBoard not used
+            for k, v in x.items():
+                title, series = k.split('/')
+                self.clearml.task.get_logger().report_scalar(title, series, v, epoch)

         if self.wandb:
             if best_fitness == fi:
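
Note on the scalar-reporting branch in the hunk above: each metric key is split on `/` into a plot title and a series name before being reported, and the tuple unpacking requires exactly one `/` per key. The sketch below illustrates that behaviour with a stand-in logger. `RecordingLogger`, `report_metrics`, and the sample key `train/box_loss` are assumptions made for illustration; only the positional `report_scalar(title, series, value, iteration)` call order is taken from the diff.

```python
class RecordingLogger:
    """Hypothetical stand-in that records report_scalar() calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def report_scalar(self, title, series, value, iteration):
        self.calls.append((title, series, value, iteration))


def report_metrics(logger, metrics, epoch):
    """Split 'title/series' keys and report each value, as in the ClearML branch above."""
    for key, value in metrics.items():
        # Raises ValueError unless the key contains exactly one '/'.
        title, series = key.split('/')
        logger.report_scalar(title, series, value, epoch)


if __name__ == '__main__':
    demo = RecordingLogger()
    report_metrics(demo, {'train/box_loss': 0.05}, epoch=3)
    print(demo.calls)
```

Keys sharing a title end up as separate series on one plot, which is presumably why the split is done rather than passing the full key as the title.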
@@ -151,12 +183,22 @@ def on_fit_epoch_end(self, vals, epoch, best_fitness, fi):
             self.wandb.log(x)
             self.wandb.end_epoch(best_result=best_fitness == fi)

+        if self.clearml:
+            self.clearml.current_epoch_logged_images = set()  # reset epoch image limit
+            self.clearml.current_epoch += 1
+
     def on_model_save(self, last, epoch, final_epoch, best_fitness, fi):
         # Callback runs on model save event
         if self.wandb:
             if ((epoch + 1) % self.opt.save_period == 0 and not final_epoch) and self.opt.save_period != -1:
                 self.wandb.log_model(last.parent, self.opt, epoch, fi, best_model=best_fitness == fi)

+        if self.clearml:
+            if ((epoch + 1) % self.opt.save_period == 0 and not final_epoch) and self.opt.save_period != -1:
+                self.clearml.task.update_output_model(model_path=str(last),
+                                                      model_name='Latest Model',
+                                                      auto_delete_file=False)
+
     def on_train_end(self, last, best, plots, epoch, results):
         # Callback runs on training end
         if plots:
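
Note on the save-period condition in `on_model_save` above: the ClearML branch reuses the wandb condition verbatim. In Python, `n % -1` is `0` for any integer `n`, so with `save_period == -1` the modulo test always passes and it is the trailing `!= -1` clause that disables periodic uploads. The sketch below isolates the predicate; `should_upload_checkpoint` is a hypothetical name, and the expression is copied from the diff rather than reordered.

```python
def should_upload_checkpoint(epoch, save_period, final_epoch):
    """Hypothetical predicate with the same expression as the on_model_save() condition."""
    # epoch is zero-based, so epoch + 1 counts completed epochs.
    return ((epoch + 1) % save_period == 0 and not final_epoch) and save_period != -1


if __name__ == '__main__':
    # With save_period=5, uploads happen after epochs 5, 10, ... except the final one.
    for e in range(10):
        print(e, should_upload_checkpoint(e, 5, final_epoch=(e == 9)))
```

As written in the diff, a `save_period` of `0` would raise `ZeroDivisionError`, because the modulo is evaluated before any other clause.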
@@ -165,7 +207,7 @@ def on_train_end(self, last, best, plots, epoch, results):
         files = [(self.save_dir / f) for f in files if (self.save_dir / f).exists()]  # filter
         self.logger.info(f"Results saved to {colorstr('bold', self.save_dir)}")

-        if self.tb:
+        if self.tb and not self.clearml:  # These images are already captured by ClearML by now, we don't want doubles
             for f in files:
                 self.tb.add_image(f.stem, cv2.imread(str(f))[..., ::-1], epoch, dataformats='HWC')

@@ -180,6 +222,12 @@ def on_train_end(self, last, best, plots, epoch, results):
                                    aliases=['latest', 'best', 'stripped'])
             self.wandb.finish_run()

+        if self.clearml:
+            # Save the best model here
+            if not self.opt.evolve:
+                self.clearml.task.update_output_model(model_path=str(best if best.exists() else last),
+                                                      name='Best Model')
+
     def on_params_update(self, params):
         # Update hyperparams or configs of the experiment
         # params: A dict containing {param: value} pairs
