Open
Labels
🙋 help wanted (extra attention is needed)
Description
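If the network connection drops while the program is running, the program raises an error and exits, which interrupts the experiment. As a result, even after the network recovers, uploading the logs with the sync command cannot produce a complete log, because the training program has already stopped 😭.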
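As a user-side stopgap, the failure described above might be contained by wrapping the logging call so that an exception does not end the training loop. The following is only a minimal sketch. It assumes the network error surfaces as a Python exception raised from the logging call, which has not been verified here; if the process exits through another path (for example, a background upload thread), this will not help. `safe_log` is a hypothetical helper, not part of SwanLab.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("train")


def safe_log(log_fn: Callable[[Dict[str, Any]], Any], metrics: Dict[str, Any]) -> bool:
    """Call log_fn(metrics); return False instead of raising if it fails.

    Hypothetical helper: log_fn is whatever logging callable the script
    uses (e.g. swanlab.log). Metrics from a failed call are dropped.
    """
    try:
        log_fn(metrics)
        return True
    except Exception as exc:  # broad on purpose: the exact exception type is an unknown here
        logger.warning("metric logging failed, continuing training: %s", exc)
        return False
```

In the training loop this would be called as `safe_log(swanlab.log, {"acc": acc, "loss": loss})`. Metrics logged during the outage are still lost, so this only keeps training alive; it does not address the incomplete-log problem itself.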
The test code to reproduce it is as follows:
import swanlab
import random
from time import sleep

# Create a SwanLab project
swanlab.init(
    # Set the project name
    project="my-awesome-project",
    console=True,
    # Set the hyperparameters
    config={
        "learning_rate": 0.02,
        "architecture": "CNN",
        "dataset": "CIFAR-100",
        "epochs": 10
    }
)

# Simulate a training run
epochs = 20
offset = random.random() / 5
for epoch in range(2, epochs):
    acc = 1 - 2 ** -epoch - random.random() / epoch - offset
    loss = 2 ** -epoch + random.random() / epoch + offset
    sleep(2)  # Simulate training time
    # Log the training metrics
    swanlab.log({"acc": acc, "loss": loss})

# [Optional] Finish the run; this is required in notebook environments
swanlab.finish()
Result