NUMBER is not a supported entity type in spaCy #473

@HiromuHota

Description of the bug

NumberMatcher matches spans if their ner_tag is either NUMBER or QUANTITY.

class NumberMatcher(RegexMatchEach):
    """
    Match Spans that are numbers, as identified by spaCy.
    A convenience class for setting up a RegexMatchEach to match spans
    for which each token was tagged as a number (NUMBER or QUANTITY).
    """
    def __init__(self, *children, **kwargs):  # type: ignore
        """Initialize number matcher."""
        kwargs["attrib"] = "ner_tags"
        kwargs["rgx"] = "NUMBER|QUANTITY"
        super().__init__(*children, **kwargs)

However, NUMBER is not an entity type supported by spaCy, so the NUMBER branch of the pattern can never match a spaCy-generated tag.
https://spacy.io/api/annotation

To Reproduce

N/A

Expected behavior

NumberMatcher should match spans whose ner_tag is either CARDINAL or QUANTITY.

For example, spaCy v2.2 tags the tokens of a sentence as follows:

>>> import spacy
>>> nlp = spacy.load("en")
>>> doc = nlp("He sold 100 million of iPhone.")
>>> for token in doc:
...     print(token.text, token.ent_type_)
... 
He 
sold 
100 CARDINAL
million CARDINAL
of 
iPhone ORG
. 
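
The effect of the two patterns on the tags above can be checked without Fonduer or spaCy installed. The following is only a minimal sketch: `each_tag_matches` is a hypothetical helper that assumes a span matches when every token's tag fully matches the regex, which is how the NumberMatcher docstring describes the behavior. It is not Fonduer's actual implementation.

```python
import re

# Token-level entity tags copied from the spaCy v2.2 output above.
tokens = ["He", "sold", "100", "million", "of", "iPhone", "."]
ner_tags = ["", "", "CARDINAL", "CARDINAL", "", "ORG", ""]


def each_tag_matches(pattern, tags):
    """Hypothetical helper: True if every tag in the span fully matches pattern.

    This is an assumption about per-token matching, not Fonduer's own code.
    """
    return bool(tags) and all(re.fullmatch(pattern, tag) for tag in tags)


current_rgx = "NUMBER|QUANTITY"     # pattern currently set by NumberMatcher
proposed_rgx = "CARDINAL|QUANTITY"  # pattern proposed in this issue

span_tags = ner_tags[2:4]  # tags for the span "100 million"

print(each_tag_matches(current_rgx, span_tags))   # False
print(each_tag_matches(proposed_rgx, span_tags))  # True
```

Under that assumption, the current pattern rejects "100 million" because CARDINAL matches neither NUMBER nor QUANTITY, while the proposed pattern accepts it.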

Error Logs/Screenshots

N/A

Environment (please complete the following information)

  • Fonduer Version: 0.8.2

Additional context

The code was ported from Snorkel,

https://github.com/snorkel-team/snorkel-extraction/blob/d7faea18e4cbe0d26f3d0aef67839c6669556606/snorkel/matchers.py#L306-L316

where the matcher is described as follows:

Matches Spans that are numbers, as identified by CoreNLP.

CoreNLP uses NUMBER (https://stanfordnlp.github.io/CoreNLP/ner.html).
