@YasushiMiyata (Contributor) commented Aug 1, 2020

Description of the problems or issues

Is your pull request related to a problem? Please describe.

The sentence "123 456 789" is parsed into three words: "123", "456", and "789".
I'd like to match the whole number with a matcher like
RegexMatchSpan(rgx=r"\d{9}", sep=" ")

but sep=" " has no effect.
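To illustrate the mismatch outside of Fonduer, here is a minimal sketch using only Python's `re` module. It assumes the span text is the three words joined by single spaces; that joining step is an assumption for illustration, not Fonduer's actual code path.

```python
import re

# Assumption: the span text is the parsed words joined by single spaces.
words = ["123", "456", "789"]
span_text = " ".join(words)

# Nine consecutive digits cannot match while the spaces are still present.
pattern = re.compile(r"\d{9}")
matched = pattern.fullmatch(span_text) is not None
print(matched)  # False
```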

Does your pull request fix any issue?
Fix #270

Description of the proposed changes

Enable the sep="(separator)" option in RegexMatchSpan.
It concatenates the words of a mention span into a single string and applies the regex match without considering the separator.
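The intended behavior can be sketched roughly as follows. `span_matches` is a hypothetical helper written for this illustration only; it is not the code changed in this pull request, and the real matcher may handle options differently.

```python
import re


def span_matches(span_text: str, rgx: str, sep: str = "") -> bool:
    """Hypothetical helper: drop `sep` from the span text, then full-match."""
    # With an empty separator there is nothing to remove.
    collapsed = span_text.replace(sep, "") if sep else span_text
    return re.fullmatch(rgx, collapsed) is not None


print(span_matches("123 456 789", r"\d{9}", sep=" "))  # True
print(span_matches("123 456 789", r"\d{9}"))           # False
```

Under these assumptions, the pattern is written as if the separator were not there, which is how the examples in this pull request are phrased.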

Test plan

Add test code to 'fonduer/tests/candidates/test_matchers.py'.
The sentence "This is apple" is parsed into two 2-grams: "This is" and "is apple".
We can match "is apple" with the following rgx and the sep="(space)" option:
RegexMatchSpan(rgx=r"isapple", sep=" ")
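The test scenario can be approximated without Fonduer as shown here. The `ngrams` helper and the space-stripping step are assumptions made for this sketch; the actual test in test_matchers.py uses Fonduer's own mention and matcher classes.

```python
import re


def ngrams(words, n):
    """Hypothetical helper: contiguous n-grams, joined by single spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


words = "This is apple".split()
bigrams = ngrams(words, 2)

# Remove the separator before matching, mirroring sep=" ".
matched = [g for g in bigrams if re.fullmatch(r"isapple", g.replace(" ", ""))]
print(bigrams)  # ['This is', 'is apple']
print(matched)  # ['is apple']
```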

Checklist

  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • I have updated the CHANGELOG.rst accordingly.

…tion

Fix #270
Enable RegexMatchSpan with sep="(separator)" option.
It concatenates mention spans to one word and does RegexMatch without consideration of the separator.

@YasushiMiyata (Contributor, Author) commented

Some code may have been updated while I was creating #492. I'm re-checking now.

@codecov-commenter commented Aug 3, 2020

Codecov Report

Merging #492 into master will not change coverage.
The diff coverage is 71.42%.


@@           Coverage Diff           @@
##           master     #492   +/-   ##
=======================================
  Coverage   85.85%   85.85%           
=======================================
  Files          88       88           
  Lines        4568     4568           
  Branches      851      853    +2     
=======================================
  Hits         3922     3922           
  Misses        464      464           
  Partials      182      182           
Flag         Coverage Δ
#unittests   85.85% <71.42%> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files                                          Coverage Δ
...fonduer/candidates/models/implicit_span_mention.py   81.96% <66.66%> (ø)
src/fonduer/candidates/models/span_mention.py           82.24% <66.66%> (ø)
src/fonduer/candidates/matchers.py                      97.31% <100.00%> (ø)

@YasushiMiyata YasushiMiyata marked this pull request as ready for review August 3, 2020 21:24
@YasushiMiyata (Contributor, Author) commented

Something failed during installation on Ubuntu. I don't think there is anything more I can do on my side.

@senwu (Collaborator) commented Aug 5, 2020

Thanks for making this clear!

@senwu senwu merged commit 01e0d93 into HazyResearch:master Aug 5, 2020
Development

Successfully merging this pull request may close these issues.

RegexMatchSpan with sep="" concatenates words with sep="(space)"

3 participants