Description of the bug
As part of iterative development in a Jupyter environment, `apply` may be re-run several times. The developer might need to update candidates or create a new labeling function, for example.
When this happens, the corresponding Postgres table is cleared but not dropped. This means that the definition of the table cannot change to accommodate the updated parameters for `apply`.
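The difference between clearing and dropping a table can be sketched as follows. This is only an illustration, not Fonduer's actual schema or code: it uses Python's built-in `sqlite3` as a stand-in for Postgres, and the table name `label`, the one-column-per-LF layout, and the `lf_*` column names are all assumptions made for the example.

```python
import sqlite3


def column_names(conn, table):
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def create_label_table(conn, lf_names):
    # Hypothetical layout: one integer column per labeling function.
    cols = ", ".join(f"{name} INTEGER" for name in lf_names)
    conn.execute(f"CREATE TABLE label (candidate_id INTEGER, {cols})")


conn = sqlite3.connect(":memory:")

# First run: 6 labeling functions.
first_run_lfs = [f"lf_{i}" for i in range(6)]
create_label_table(conn, first_run_lfs)
conn.execute("INSERT INTO label VALUES (1, 0, 0, 0, 0, 0, 0)")

# Clearing removes the rows but leaves the table definition untouched.
conn.execute("DELETE FROM label")
rows_after_clear = conn.execute("SELECT COUNT(*) FROM label").fetchone()[0]
cleared_columns = column_names(conn, "label")

# Second run: 7 labeling functions. Dropping and recreating the table
# lets the definition follow the new list.
second_run_lfs = first_run_lfs + ["lf_new"]
conn.execute("DROP TABLE label")
create_label_table(conn, second_run_lfs)
recreated_columns = column_names(conn, "label")

print(len(cleared_columns) - 1, "LF columns after clearing")
print(len(recreated_columns) - 1, "LF columns after drop and recreate")
```

After the clear, the table is empty but still has the six LF columns from the first run; only the drop and recreate picks up the seventh.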
To Reproduce
Steps to reproduce the behavior:
- Run the max_storage_temp_tutorial notebook in fonduer-tutorials, up to and including the Labeling Functions section.
- Add a new LF. It doesn't need to do anything in particular (it could return ABSTAIN every time). Add it to the `stg_temp_lfs` list.
- Re-run the remaining cells in the section.
Upon calling `LFAnalysis`, the following exception is thrown:
ValueError: Number of LFs (7) and number of LF matrix columns (6) are different
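While debugging, the mismatch can be detected before `LFAnalysis` is called by comparing the number of LFs with the number of label matrix columns, which is the same comparison the Snorkel check in the stack trace performs. The sketch below is a hypothetical helper (`lf_column_mismatch` is not part of Fonduer or Snorkel) and uses a plain list of rows as a stand-in for the label matrix; in the notebook the column count would presumably come from `L_train[0].shape[1]`.

```python
def lf_column_mismatch(lf_names, n_columns):
    """Return a short description of the mismatch, or None if the counts agree."""
    if len(lf_names) == n_columns:
        return None
    return f"{len(lf_names)} LFs but {n_columns} label matrix columns"


# Seven LFs after adding a new one (names are placeholders).
lf_names = [f"lf_{i}" for i in range(7)]

# Stand-in for a stale label matrix that still has six columns.
stale_matrix = [[0] * 6 for _ in range(3)]
n_columns = len(stale_matrix[0])

message = lf_column_mismatch(lf_names, n_columns)
if message is not None:
    print("Stale label matrix:", message)
```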
Expected behavior
Underlying tables for a re-run of a UDF `apply` method should not only be cleared, but dropped.
Error Logs/Screenshots
Full stack trace:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-62-e005feee6300> in <module>
      5 sorted_lfs = sorted(lfs, key=lambda lf: lf.name)
      6
----> 7 LFAnalysis(L=L_train[0], lfs=sorted_lfs).lf_summary(Y=L_gold_train[0].reshape(-1))

~/.venv/lib/python3.7/site-packages/snorkel/labeling/analysis.py in __init__(self, L, lfs)
     44         if len(lfs) != self._L_sparse.shape[1]:
     45             raise ValueError(
---> 46                 f"Number of LFs ({len(lfs)}) and number of "
     47                 f"LF matrix columns ({self._L_sparse.shape[1]}) are different"
     48             )
ValueError: Number of LFs (7) and number of LF matrix columns (6) are different
Environment (please complete the following information)
- OS: Ubuntu 18.04
- PostgreSQL Version: 12.1
- Poppler Utils Version: 0.71.0-5
- Fonduer Version: 0.8.3
Additional context
A comment on #263 advises restarting Python, but this does not appear to solve the problem.