Closed
Is your feature request related to a problem? Please describe.
Currently, there is no way to choose the key used for memoization when using preprocessor(memoize=True).
This leads to two issues:
- Memoization cannot be done for unhashable types (typically a group of pandas rows) unless we wrap or subclass them.
- The memoization key cannot be specific to a preprocessing step.
Example: we are trying to evaluate the reliability of a paragraph in a blog post.
We could evaluate the reliability of both the paragraph and the website it comes from.
The preprocessors for those two tasks will share the same memoization key, which is not ideal: a website can have a few thousand paragraphs, so website reliability will be evaluated far more often than necessary (once per paragraph instead of once per website).
Describe the solution you'd like
Being able to set the memoization key through a parameter of the decorator:
@preprocessor(memoize=True, memoize_key=lambda p: p.base_website_url)
def add_website_reliability(paragraph):
    paragraph.website_reliability = evaluate_reliability(paragraph.base_website_url)
    return paragraph
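To illustrate the intended behavior, here is a minimal standalone sketch of memoization with a caller-supplied key. It does not use Snorkel at all: memoize_by, website_reliability, the stub evaluate_reliability and the example URL are all hypothetical names made up for this illustration, not Snorkel's implementation. Because the cache returns whatever the first call produced for a given key, the memoized function in this sketch returns only the website-level score rather than the whole paragraph.

```python
from functools import wraps
from types import SimpleNamespace


def memoize_by(key):
    """Hypothetical decorator: cache results under key(x) instead of x itself."""
    def decorator(func):
        cache = {}

        @wraps(func)
        def wrapper(x):
            k = key(x)  # only the key has to be hashable, not x
            if k not in cache:
                cache[k] = func(x)
            return cache[k]

        return wrapper
    return decorator


calls = []  # records every real (non-cached) evaluation


def evaluate_reliability(url):
    """Stub standing in for an expensive website-level evaluation."""
    calls.append(url)
    return 0.9 if url.endswith(".org") else 0.5


@memoize_by(key=lambda p: p.base_website_url)
def website_reliability(paragraph):
    return evaluate_reliability(paragraph.base_website_url)


# Three paragraphs from the same (made-up) website.
paragraphs = [
    SimpleNamespace(text=f"paragraph {i}", base_website_url="https://example.org")
    for i in range(3)
]
scores = [website_reliability(p) for p in paragraphs]
print(len(calls), scores)
```

With the key set to the website URL, the stub is evaluated once for the three paragraphs; keyed on the paragraph itself, it would run once per paragraph.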
Additional context
Current workaround:
@mock.patch("snorkel.map.core.get_hashable", lambda p: p.base_website_url)
@preprocessor(memoize=True)
def add_website_reliability(paragraph):
    paragraph.website_reliability = evaluate_reliability(paragraph.base_website_url)
    return paragraph