Extract LTChar even if they are not children of LTTextLine #79
Parsing CentralSemiconductorCorp_2N4013.pdf at 936b1c9 gives me:
<?xml version="1.0" ?>
<html>
<head>
<meta content="Converted from PDF by pdftotree 0.5.0+dev" name="ocr-system"/>
<meta content="ocr_page ocr_table ocrx_block ocrx_word" name="ocr-capabilities"/>
<meta content="10" name="ocr-number-of-pages"/>
</head>
<body>
<div class="ocr_page" id="page_1" title="bbox 0 0 612 792; ppageno 0"/>
<div class="ocr_page" id="page_2" title="bbox 0 0 612 792; ppageno 1"/>
<div class="ocr_page" id="page_3" title="bbox 0 0 612 792; ppageno 2"/>
<div class="ocr_page" id="page_4" title="bbox 0 0 612 792; ppageno 3"/>
<div class="ocr_page" id="page_5" title="bbox 0 0 612 792; ppageno 4"/>
<div class="ocr_page" id="page_6" title="bbox 0 0 612 792; ppageno 5"/>
<div class="ocr_page" id="page_7" title="bbox 0 0 612 792; ppageno 6"/>
<div class="ocr_page" id="page_8" title="bbox 0 0 612 792; ppageno 7"/>
<div class="ocr_page" id="page_9" title="bbox 0 0 612 792; ppageno 8"/>
<div class="ocr_page" id="page_10" title="bbox 0 0 612 792; ppageno 9"/>
</body>
</html>
The output contains no text.
Codecov Report
@@ Coverage Diff @@
## master #79 +/- ##
=========================================
Coverage ? 65.62%
=========================================
Files ? 21
Lines ? 2508
Branches ? 0
=========================================
Hits ? 1646
Misses ? 862
Partials ? 0
Description of the problems or issues
Is your pull request related to a problem? Please describe.
See #72
Does your pull request fix any issue?
Fix #72
Description of the proposed changes
Retrieve the page dimensions correctly from the layout.
Test plan
Test against the PDF that causes #72.
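One way to make this check repeatable is to assert that no page in the generated hOCR is empty. The following is only a minimal sketch, not part of this pull request: the helper name `pages_without_text` is hypothetical, and it assumes the output is well-formed XML with `ocr_page` divs, as in the sample quoted in the conversation.

```python
import xml.etree.ElementTree as ET


def pages_without_text(hocr):
    """Return the ids of ocr_page divs that contain no text at all.

    Hypothetical helper for illustration; it only inspects the hOCR
    string and does not call pdftotree itself.
    """
    root = ET.fromstring(hocr)
    empty = []
    for element in root.iter():
        # Drop a namespace prefix such as "{http://...}div" if one is present.
        tag = element.tag.rsplit("}", 1)[-1]
        if tag != "div" or element.get("class") != "ocr_page":
            continue
        # itertext() yields the text of the element and of all its descendants.
        if not "".join(element.itertext()).strip():
            empty.append(element.get("id"))
    return empty
```

Running the helper on the output reported in #72 would list all ten page ids; after the fix, the expectation is that pages carrying text are no longer listed.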