I'm trying to view results during training by running `cox-tensorboard --logdir OUTDIR --format-str param-{param}` while training with the robustness repo (`python -m robustness.main ...`). I'm getting the following error, which seems to indicate that cox can't read the store while the logs are being written:
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/pandas/io/pytables.py", line 697, in open
self._handle = tables.open_file(self._path, self._mode, **kwargs)
File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/tables/file.py", line 315, in open_file
return File(filename, mode, title, root_uep, filters, **kwargs)
File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/tables/file.py", line 778, in __init__
self._g_new(filename, mode, **params)
File "tables/hdf5extension.pyx", line 492, in tables.hdf5extension.File._g_new
tables.exceptions.HDF5ExtError: HDF5 error back trace
File "H5F.c", line 509, in H5Fopen
unable to open file
File "H5Fint.c", line 1400, in H5F__open
unable to open file
File "H5Fint.c", line 1615, in H5F_open
unable to lock the file
File "H5FD.c", line 1640, in H5FD_lock
driver lock request failed
File "H5FDsec2.c", line 941, in H5FD_sec2_lock
unable to lock file, errno = 11, error message = 'Resource temporarily unavailable'
End of HDF5 error back trace
Unable to open/create file '/home/ubuntu/logs/a69229d4-56c6-421d-bccb-73cc1d21b0d5/store.h5'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/cox/tensorboard_view.py", line 59, in <module>
main()
File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/cox/tensorboard_view.py", line 24, in main
reader = CollectionReader(args.logdir)
File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/cox/readers.py", line 53, in __init__
raise e
File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/cox/readers.py", line 42, in __init__
store = Store(self.directory, exp_id, new=False, mode='r')
File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/cox/store.py", line 90, in __init__
self.store = pd.HDFStore(os.path.join(exp_path, STORE_BASENAME), mode=mode)
File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/pandas/io/pytables.py", line 553, in __init__
self.open(mode=mode, **kwargs)
File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/pandas/io/pytables.py", line 729, in open
raise IOError(str(err)) from err
OSError: HDF5 error back trace
File "H5F.c", line 509, in H5Fopen
unable to open file
File "H5Fint.c", line 1400, in H5F__open
unable to open file
File "H5Fint.c", line 1615, in H5F_open
unable to lock the file
File "H5FD.c", line 1640, in H5FD_lock
driver lock request failed
File "H5FDsec2.c", line 941, in H5FD_sec2_lock
unable to lock file, errno = 11, error message = 'Resource temporarily unavailable'
End of HDF5 error back trace
Unable to open/create file '/home/ubuntu/logs/a69229d4-56c6-421d-bccb-73cc1d21b0d5/store.h5'
Is this expected behavior? It seems to go against the typical tensorboard use case of monitoring a run while it is in progress. Using tensorboard directly doesn't show which curve corresponds to which parameters. It would be nice to have read-only access to the tables during training, just to change the names of the curves in tensorboard.
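
For reference, here is a rough sketch of two workarounds that might get around the lock error. Neither is confirmed as something cox supports. The first copies the log directory to a temporary location so the viewer opens a file that the training process has not locked. This assumes Linux, where these locks are advisory and do not block a plain file copy. The second sets the HDF5 environment variable `HDF5_USE_FILE_LOCKING` (available in HDF5 1.10.1 and later) to `FALSE`, which I assume has to happen before pandas or PyTables opens the file. The helper names `snapshot_logdir` and `disable_hdf5_file_locking` are hypothetical and not part of cox. In both cases the reader may see a partially written `store.h5`, since the file is still being written.

```python
import os
import shutil
import tempfile
from pathlib import Path


def snapshot_logdir(logdir, dest_root=None):
    """Copy a log directory to a fresh temporary location.

    Hypothetical helper, not part of cox. Returns the path of the copy,
    which can then be passed to the viewer instead of the live directory.
    """
    src = Path(logdir)
    if not src.is_dir():
        raise NotADirectoryError(f"{src} is not a directory")
    # mkdtemp creates a unique parent directory; copytree needs a
    # destination that does not exist yet, so copy into a child of it.
    tmp_root = Path(tempfile.mkdtemp(prefix="cox-snapshot-", dir=dest_root))
    dest = tmp_root / "logs"
    shutil.copytree(src, dest)
    return dest


def disable_hdf5_file_locking():
    """Ask the HDF5 library not to lock files it opens.

    Assumption: this must run before the HDF5 file is opened, and ideally
    before pandas / tables are imported. It only affects this process.
    """
    os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"
```

With the first approach, the viewer would be pointed at the copy, e.g. `cox-tensorboard --logdir <path returned by snapshot_logdir> --format-str param-{param}`, and the snapshot would need to be refreshed to pick up new results. With the second, the same effect could be had from the shell by exporting `HDF5_USE_FILE_LOCKING=FALSE` before running `cox-tensorboard`.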