Choose a memoization key in preprocessor(memoize=True) #1561

@Wirg

Is your feature request related to a problem? Please describe.

Currently, there is no way to choose the memoization key when using preprocessor(memoize=True).
This leads to two issues:

  • Memoization cannot be used with unhashable objects (typically a group of pandas rows) unless we wrap or subclass them.
  • The memoization key cannot be specific to a given preprocessor.
    Example: we are trying to evaluate the reliability of a paragraph in a blog post.
    We could evaluate the reliability of both the paragraph and the website.
    The preprocessors for these two tasks share the same memoization key (the paragraph itself), which is not ideal: a website can have a few thousand paragraphs, so website reliability is evaluated once per paragraph instead of once per website.

Describe the solution you'd like

Being able to set the memoization key through a parameter of the decorator:

@preprocessor(memoize=True, memoize_key=lambda p: p.base_website_url)
def add_website_reliability(paragraph):
    paragraph.website_reliability = evaluate_reliability(paragraph.base_website_url)
    return paragraph
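
To make the intended behavior concrete, here is a minimal standalone sketch of key-based memoization in plain Python. It does not use Snorkel and does not claim to show how `preprocessor` is or would be implemented; `memoize_by`, `Paragraph`, `website_reliability`, and the placeholder body of `evaluate_reliability` are all hypothetical names introduced for illustration. The sketch caches the expensive score rather than the returned data point, which is one possible design choice, so every paragraph keeps its own text while sharing the per-website result.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Any, Callable, Dict, Hashable, List, Optional


@dataclass
class Paragraph:
    # Hypothetical data point. Dataclasses with the default eq=True are
    # unhashable, so instances cannot serve as cache keys themselves.
    text: str
    base_website_url: str
    website_reliability: Optional[float] = None


def memoize_by(key: Callable[[Any], Hashable]):
    """Cache results under key(x) instead of under x itself (hypothetical helper)."""
    def decorator(func):
        cache: Dict[Hashable, Any] = {}

        @wraps(func)
        def wrapper(x):
            k = key(x)
            if k not in cache:
                cache[k] = func(x)
            return cache[k]

        return wrapper
    return decorator


calls: List[str] = []  # records each expensive evaluation, for demonstration


def evaluate_reliability(url: str) -> float:
    # Placeholder for the expensive evaluation mentioned in the issue.
    calls.append(url)
    return 0.5


@memoize_by(key=lambda p: p.base_website_url)
def website_reliability(paragraph: Paragraph) -> float:
    return evaluate_reliability(paragraph.base_website_url)


def add_website_reliability(paragraph: Paragraph) -> Paragraph:
    # Only the score is shared between paragraphs of the same website.
    paragraph.website_reliability = website_reliability(paragraph)
    return paragraph


paragraphs = [
    Paragraph("first", "https://example.com"),
    Paragraph("second", "https://example.com"),
]
processed = [add_website_reliability(p) for p in paragraphs]
```

With two paragraphs from the same website, `evaluate_reliability` runs once and both paragraphs receive the score. Caching the whole returned object under the website key would instead hand back the first paragraph for every later one, which is worth keeping in mind when choosing a key.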

Additional context

Current workaround:

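from unittest import mock

from snorkel.preprocess import preprocessor

# Patch the function that builds the memoization key so it uses the website URL.
@mock.patch("snorkel.map.core.get_hashable", lambda p: p.base_website_url)
@preprocessor(memoize=True)
def add_website_reliability(paragraph):
    paragraph.website_reliability = evaluate_reliability(paragraph.base_website_url)
    return paragraph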
