Tables aren't redefined for re-runs of UDF apply #536

@robbieculkin

Description of the bug

During iterative development in a Jupyter environment, apply may be re-run several times, for example after the developer updates candidates or creates a new labeling function.
When this happens, the corresponding Postgres table is cleared but not dropped, so the table's definition cannot change to accommodate the updated parameters for apply.
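
To illustrate the distinction between clearing and dropping a table, here is a minimal sketch. It uses an in-memory SQLite database from the Python standard library as a stand-in for Postgres, and the table name `label` and its columns are hypothetical; it does not reproduce Fonduer's actual schema or its apply code.

```python
import sqlite3


def column_names(conn, table):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")

# Hypothetical table with one column per labeling function (illustration only).
conn.execute("CREATE TABLE label (candidate_id INTEGER, lf_1 INTEGER, lf_2 INTEGER)")
conn.execute("INSERT INTO label VALUES (1, 0, 1)")

# Clearing removes the rows, but the column definition stays the same.
conn.execute("DELETE FROM label")
cols_after_clear = column_names(conn, "label")

# Dropping and recreating is what lets the definition change.
conn.execute("DROP TABLE label")
conn.execute(
    "CREATE TABLE label "
    "(candidate_id INTEGER, lf_1 INTEGER, lf_2 INTEGER, lf_3 INTEGER)"
)
cols_after_drop = column_names(conn, "label")

print(cols_after_clear)
print(cols_after_drop)
```

After the DELETE the table still has its original three columns; only after the DROP and CREATE does the extra `lf_3` column exist.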

To Reproduce

Steps to reproduce the behavior:

  1. Run the max_storage_temp_tutorial notebook in fonduer-tutorials, up to and including the Labeling Functions section.
  2. Add a new LF. It doesn't need to do anything in particular (it could return ABSTAIN every time). Add it to the stg_temp_lfs list.
  3. Re-run the remainder of cells in the section.
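
The extra LF in step 2 could be as simple as the sketch below. The function name `lf_always_abstain` is hypothetical, `ABSTAIN = -1` is an assumption about the notebook's label convention, and the empty list stands in for the tutorial's existing stg_temp_lfs; if the notebook wraps its LFs in a decorator, the new LF would need the same treatment.

```python
ABSTAIN = -1  # assumed value; use whatever constant the notebook already defines


def lf_always_abstain(c):
    # Ignores the candidate entirely and never votes.
    return ABSTAIN


# Stand-in for the tutorial's existing list of labeling functions.
stg_temp_lfs = []
stg_temp_lfs.append(lf_always_abstain)
```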

Upon calling LFAnalysis, the following exception is thrown:

ValueError: Number of LFs (7) and number of LF matrix columns (6) are different

Expected behavior

On a re-run of a UDF apply method, the underlying tables should be dropped, not just cleared.

Error Logs/Screenshots

Full stack trace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-62-e005feee6300> in <module>
      5 sorted_lfs = sorted(lfs, key=lambda lf: lf.name)
      6 
----> 7 LFAnalysis(L=L_train[0], lfs=sorted_lfs).lf_summary(Y=L_gold_train[0].reshape(-1))

~/.venv/lib/python3.7/site-packages/snorkel/labeling/analysis.py in __init__(self, L, lfs)
     44             if len(lfs) != self._L_sparse.shape[1]:
     45                 raise ValueError(
---> 46                     f"Number of LFs ({len(lfs)}) and number of "
     47                     f"LF matrix columns ({self._L_sparse.shape[1]}) are different"
     48                 )

ValueError: Number of LFs (7) and number of LF matrix columns (6) are different

Environment (please complete the following information)

  • OS: Ubuntu 18.04
  • PostgreSQL Version: 12.1
  • Poppler Utils Version: 0.71.0-5
  • Fonduer Version: 0.8.3

Additional context

#263 (comment) advises restarting Python, but this does not appear to solve the problem.
