Add get_bbox() to Sentence and (Temporary)SpanMention, and deprecate bbox_from_span and bbox_from_sentence #429

HiromuHota · 2020-05-27T19:02:08Z

This PR has two benefits:

Type hints become shorter and therefore easier to read: Tuple[int, int, int, int, int] becomes Bbox.
There is no need to import bbox_from_sentence and bbox_from_span; just call sentence.get_bbox() and span.get_bbox(). This is one of the benefits of using OOP.

P.S. There are many more places where OOP (object-oriented programming) would be a better fit.
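To make the two benefits concrete, here is a minimal sketch, not Fonduer's actual implementation. The field names and order of Bbox, and the simplified Sentence class with per-word coordinate lists, are assumptions made for illustration only.

```python
from typing import List, NamedTuple


class Bbox(NamedTuple):
    """Typed bounding box. Field names and order are assumed for this sketch."""

    page: int
    top: int
    bottom: int
    left: int
    right: int


class Sentence:
    """Hypothetical stand-in for a parsed sentence with per-word coordinates."""

    def __init__(
        self,
        page: List[int],
        top: List[int],
        bottom: List[int],
        left: List[int],
        right: List[int],
    ) -> None:
        self.page = page
        self.top = top
        self.bottom = bottom
        self.left = left
        self.right = right

    def get_bbox(self) -> Bbox:
        # The box enclosing every word; the page is taken from the first word.
        return Bbox(
            self.page[0],
            min(self.top),
            max(self.bottom),
            min(self.left),
            max(self.right),
        )


sentence = Sentence(
    page=[1, 1], top=[10, 12], bottom=[20, 22], left=[5, 30], right=[25, 50]
)
bbox = sentence.get_bbox()
```

Callers only need the object itself (sentence.get_bbox()) rather than an imported helper, and the return annotation reads as Bbox instead of a five-element Tuple.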

lukehsiao

LGTM, but looks like tests fail now.

HiromuHota · 2020-05-27T20:31:06Z

I think I fixed the issue.

codecov-commenter · 2020-05-27T20:52:30Z

Codecov Report

Merging #429 into master will decrease coverage by 0.15%.
The diff coverage is 75.60%.

@@            Coverage Diff             @@
##           master     #429      +/-   ##
==========================================
- Coverage   82.59%   82.44%   -0.16%     
==========================================
  Files          86       86              
  Lines        4366     4385      +19     
  Branches      810      812       +2     
==========================================
+ Hits         3606     3615       +9     
- Misses        572      582      +10     
  Partials      188      188

| Flag | Coverage Δ | |
|---|---|---|
| #unittests | `82.44% <75.60%> (-0.16%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/fonduer/candidates/models/span_mention.py | `74.76% <60.00%> (-0.73%)` | ⬇️ |
| src/fonduer/parser/models/sentence.py | `93.38% <60.00%> (-1.44%)` | ⬇️ |
| src/fonduer/utils/utils_visual.py | `58.33% <71.42%> (-6.67%)` | ⬇️ |
| src/fonduer/utils/data_model_utils/visual.py | `88.23% <77.77%> (ø)` | |
| src/fonduer/parser/visual_linker.py | `83.57% <100.00%> (+0.07%)` | ⬆️ |
| src/fonduer/utils/visualizer.py | `79.16% <100.00%> (ø)` | |

senwu · 2020-05-28T09:08:00Z

Thanks for this PR!

What do you think about having a fonduer.typing module to store all Fonduer specific typings? Bbox is a good example here.

HiromuHota · 2020-05-28T15:54:03Z

Having a fonduer.typing module is a good idea.
IMO, it should hold type aliases that make the code more readable.
For example, the following type hints are very lengthy and hard to read.

fonduer/src/fonduer/parser/visual_linker.py

Lines 31 to 33 in bd79688

    
self.pdf_word_list: Optional[List[Tuple[Tuple[int, int], str]]] = None
self.html_word_list: Optional[List[Tuple[Tuple[str, int], str]]] = None
self.links: Optional[OrderedDict[Tuple[str, int], Tuple[int, int]]] = None

By defining type aliases like below:

Alias1 = List[Tuple[Tuple[int, int], str]]
Alias2 = List[Tuple[Tuple[str, int], str]]
Alias3 = OrderedDict[Tuple[str, int], Tuple[int, int]]

This could become

self.pdf_word_list: Optional[Alias1] = None
self.html_word_list: Optional[Alias2] = None
self.links: Optional[Alias3] = None
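For reference, a self-contained sketch of the same idea with descriptive alias names. All alias names and the Linker class are hypothetical; only the underlying types come from the visual_linker.py lines quoted above. Note that the PDF and HTML word lists use different key types, (int, int) versus (str, int), so each needs its own alias.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple
from typing import OrderedDict as OrderedDictType  # generic form, Python 3.7.2+

# Hypothetical alias names; the underlying types mirror the quoted hints.
PdfWordId = Tuple[int, int]
HtmlWordId = Tuple[str, int]
PdfWordList = List[Tuple[PdfWordId, str]]
HtmlWordList = List[Tuple[HtmlWordId, str]]
WordLinks = OrderedDictType[HtmlWordId, PdfWordId]


class Linker:
    """Hypothetical stand-in showing only the three annotated attributes."""

    def __init__(self) -> None:
        self.pdf_word_list: Optional[PdfWordList] = None
        self.html_word_list: Optional[HtmlWordList] = None
        self.links: Optional[WordLinks] = None


linker = Linker()
linker.pdf_word_list = [((1, 0), "Hello")]
linker.html_word_list = [(("s1", 0), "Hello")]
linker.links = OrderedDict([(("s1", 0), (1, 0))])
```

Aliases are plain assignments, so they add no runtime behavior; a type checker treats each alias exactly like the type it stands for.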

However, Bbox is not an alias, so IMO it does not belong in fonduer.typing.
Moreover, I'll be adding methods to Bbox like Bbox.horz_aligned (superseding bbox_horz_aligned) and Bbox.vert_aligned (superseding bbox_vert_aligned), which makes Bbox less suitable in fonduer.typing.
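The distinction can be sketched as follows: a NamedTuple subclass is a real class that can carry methods, which a pure type alias cannot. The method names horz_aligned and vert_aligned come from the comment above; the field names and the alignment rule (same page and overlapping extents) are assumptions for illustration and may differ from what bbox_horz_aligned and bbox_vert_aligned actually check.

```python
from typing import NamedTuple


class Bbox(NamedTuple):
    """Bounding box with behavior attached. Fields are assumed for this sketch."""

    page: int
    top: int
    bottom: int
    left: int
    right: int

    def horz_aligned(self, other: "Bbox") -> bool:
        # Assumed rule: same page and overlapping vertical extents.
        return (
            self.page == other.page
            and self.top <= other.bottom
            and other.top <= self.bottom
        )

    def vert_aligned(self, other: "Bbox") -> bool:
        # Assumed rule: same page and overlapping horizontal extents.
        return (
            self.page == other.page
            and self.left <= other.right
            and other.left <= self.right
        )


a = Bbox(page=1, top=10, bottom=20, left=0, right=50)
b = Bbox(page=1, top=15, bottom=25, left=60, right=100)
c = Bbox(page=2, top=10, bottom=20, left=0, right=50)
```

Here a and b share a row but not a column, and c sits on a different page, so it aligns with neither.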

HiromuHota · 2020-05-28T21:26:47Z

A good example would be an alias for Throttler.

Currently,

fonduer/src/fonduer/candidates/candidates.py

Line 65 in bd79688

throttlers: Optional[List[Callable[[Tuple[Mention, ...]], bool]]] = None,

With the alias

Throttler = Callable[[Tuple[Mention, ...]], bool]

this becomes

throttlers: Optional[List[Throttler]] = None,
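A runnable sketch of how such an alias reads in practice. The Mention class here is a stand-in, and distinct_text and apply_throttlers are hypothetical helpers written for this example; none of them is Fonduer's API.

```python
from typing import Callable, List, Optional, Sequence, Tuple


class Mention:
    """Stand-in for a mention; only carries its text in this sketch."""

    def __init__(self, text: str) -> None:
        self.text = text


# The alias proposed above: a throttler decides whether to keep a candidate.
Throttler = Callable[[Tuple[Mention, ...]], bool]


def distinct_text(candidate: Tuple[Mention, ...]) -> bool:
    """Hypothetical throttler: keep candidates whose mentions all differ in text."""
    texts = [mention.text for mention in candidate]
    return len(set(texts)) == len(texts)


def apply_throttlers(
    candidates: Sequence[Tuple[Mention, ...]],
    throttlers: Optional[List[Throttler]] = None,
) -> List[Tuple[Mention, ...]]:
    """Hypothetical helper: keep candidates accepted by every throttler."""
    if not throttlers:
        return list(candidates)
    return [c for c in candidates if all(t(c) for t in throttlers)]


x, y = Mention("x"), Mention("y")
candidates = [(x, y), (x, x)]
kept = apply_throttlers(candidates, [distinct_text])
```

The signature of apply_throttlers stays on one readable line per parameter, and any function with the matching shape satisfies Throttler without subclassing.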

Hiromu Hota added 6 commits May 27, 2020 11:12

Use NamedTuple, a typed version of namedtuple, for Bbox

be6f30f

Add get_bbox() to Sentence and (Temporary)SpanMention, and deprecate …

63431e4

…bbox_from_span and bbox_from_sentence

Use Bbox instead of Tuple[int, int, int, int, int]

a641ac3

list index should be int

69f3dd4

More use of get_bbox()

efcf037

Update CHANGELOG

0f7d619

HiromuHota marked this pull request as ready for review May 27, 2020 19:08

lukehsiao approved these changes May 27, 2020

lukehsiao added the clean-up Cleaning up the code or refactoring label May 27, 2020

lukehsiao added this to the v0.8.3 milestone May 27, 2020

Hiromu Hota added 2 commits May 27, 2020 13:28

Revert "list index should be int"

3e19077

This reverts commit 69f3dd4.

Correct type hints

cc3cca8

lukehsiao merged commit 3ccf672 into HazyResearch:master May 27, 2020

HiromuHota deleted the feature/bbox_as_typed_namedtuple branch May 28, 2020 15:54

This was referenced Jun 12, 2020

Bbox field order is inconsistent between how it is defined and how it is used #443

Closed

visualizer.get_box is a duplicate of utils_visual.bbox_from_span #445

Closed

Sort by block top, block left, top, then left #449

Merged
