Closed
Labels: bug (Something isn't working)
Description
Describe the bug
The sentence "123 456 789" is parsed into three words: "123", "456", and "789".
I'd like to match it as a single nine-digit number with
RegexMatchSpan(rgx=r"\d{9}", sep="")
but sep="" has no effect.
To Reproduce
Steps to reproduce the behavior:
- Have a sentence "123 456 789"
- Parse it
- Try to match it with
RegexMatchSpan(rgx=r"\d{9}", sep="")
Expected behavior
RegexMatchSpan(rgx=r"\d{9}", sep="")
should match the sentence "123 456 789".
Environment (please complete the following information):
- Fonduer Version: 0.6.2
Additional context
I think the root cause of this issue is the following implementation.
fonduer/src/fonduer/candidates/models/span_mention.py
Lines 140 to 159 in f866248
def get_attrib_span(self, a, sep=" "):
    """Get the span of sentence attribute *a*.

    Intuitively, like calling::

        sep.join(span.a)

    :param a: The attribute to get a span for.
    :type a: str
    :param sep: The separator to use for the join.
    :type sep: str
    :return: The joined tokens, or text if a="words".
    :rtype: str
    """
    # NOTE: Special behavior for words currently (due to correspondence
    # with char_offsets)
    if a == "words":
        return self.sentence.text[self.char_start : self.char_end + 1]
    else:
        return sep.join(self.get_attrib_tokens(a))
where a is "words" by default, so the first branch is taken and sep is never used.
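To illustrate the suspected cause without a Fonduer installation, here is a minimal sketch. FakeSpan is a hypothetical stand-in, not Fonduer's real span class: it only copies the branching of get_attrib_span quoted above. get_attrib_span_joined is likewise a hypothetical variant that always honours sep; it is shown for comparison, not as the proposed fix.

```python
import re


class FakeSpan:
    """Hypothetical stand-in for a Fonduer span (not the real class)."""

    def __init__(self, text, words, char_start, char_end):
        self.text = text              # raw sentence text
        self.words = words            # tokens produced by the parser
        self.char_start = char_start
        self.char_end = char_end      # inclusive, as in the quoted snippet

    def get_attrib_tokens(self, a):
        # Simplified: return the whole token list for attribute *a*.
        return list(getattr(self, a))

    def get_attrib_span(self, a, sep=" "):
        # Same branching as the quoted implementation:
        # for a == "words" the raw text is sliced and sep is ignored.
        if a == "words":
            return self.text[self.char_start : self.char_end + 1]
        return sep.join(self.get_attrib_tokens(a))

    def get_attrib_span_joined(self, a, sep=" "):
        # Hypothetical variant that always joins tokens with sep.
        return sep.join(self.get_attrib_tokens(a))


text = "123 456 789"
span = FakeSpan(text, ["123", "456", "789"], 0, len(text) - 1)
pattern = re.compile(r"\d{9}")

current = span.get_attrib_span("words", sep="")
joined = span.get_attrib_span_joined("words", sep="")

print(repr(current), bool(pattern.fullmatch(current)))  # '123 456 789' False
print(repr(joined), bool(pattern.fullmatch(joined)))    # '123456789' True
```

With the "words" special case, the string handed to the regex still contains the spaces from the sentence text, so \d{9} cannot match whatever value sep has.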