Closed
Description of the bug
NumberMatcher matches spans if their ner_tag is either NUMBER or QUANTITY.
fonduer/src/fonduer/candidates/matchers.py
Lines 453 to 465 in 7f5c663
class NumberMatcher(RegexMatchEach):
    """
    Match Spans that are numbers, as identified by spaCy.

    A convenience class for setting up a RegexMatchEach to match spans
    for which each token was tagged as a number (NUMBER or QUANTITY).
    """

    def __init__(self, *children, **kwargs):  # type: ignore
        """Initialize number matcher."""
        kwargs["attrib"] = "ner_tags"
        kwargs["rgx"] = "NUMBER|QUANTITY"
        super().__init__(*children, **kwargs)
However, NUMBER is not an entity type supported by spaCy (https://spacy.io/api/annotation).
To Reproduce
N/A
Expected behavior
NumberMatcher should match spans whose ner_tag is either CARDINAL or QUANTITY.
This is an example of spaCy's output as of v2.2:
>>> import spacy
>>> nlp = spacy.load("en")
>>> doc = nlp("He sold 100 million of iPhone.")
>>> for token in doc:
...     print(token.text, token.ent_type_)
...
He
sold
100 CARDINAL
million CARDINAL
of
iPhone ORG
.
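To make the mismatch concrete, here is a minimal sketch that applies both patterns to the tags from the output above. It uses plain `re` rather than Fonduer's matcher, `matching_tokens` is a hypothetical helper, and the full-match semantics are an assumption about how the pattern is applied to each tag.

```python
import re

# (token, entity tag) pairs copied from the spaCy v2.2 output above;
# an empty string means the token has no entity type.
tokens = [
    ("He", ""),
    ("sold", ""),
    ("100", "CARDINAL"),
    ("million", "CARDINAL"),
    ("of", ""),
    ("iPhone", "ORG"),
    (".", ""),
]


def matching_tokens(rgx, tagged_tokens):
    """Return the tokens whose tag fully matches rgx (hypothetical helper)."""
    pattern = re.compile(rgx)
    return [text for text, tag in tagged_tokens if pattern.fullmatch(tag)]


# Pattern currently used by NumberMatcher.
current = matching_tokens("NUMBER|QUANTITY", tokens)
# Pattern proposed in this issue.
proposed = matching_tokens("CARDINAL|QUANTITY", tokens)

print(current)   # []
print(proposed)  # ['100', 'million']
```

With the current pattern, neither "100" nor "million" is selected, because spaCy tags both as CARDINAL.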
Error Logs/Screenshots
N/A
Environment (please complete the following information)
- Fonduer Version: 0.8.2
Additional context
The code was ported from Snorkel, where the docstring reads "Matches Spans that are numbers, as identified by CoreNLP."
CoreNLP uses NUMBER (https://stanfordnlp.github.io/CoreNLP/ner.html).