Scrambles the contents of some outline/hierarchical PDF documents #77

@steveisakson

Generate a PDF of this page: https://www.ecfr.gov/current/title-14/chapter-I/subchapter-G/part-139

Convert to MD with pdf-to-markdown.

Compare the PDF with the MD output. In the MD, headings appear several lines before the paragraph text that follows them in the PDF. Start at the end of the document, where the differences are more pronounced.

I haven't examined the PDF contents, so this might have more to do with the PDFs themselves or with how the doc-to-PDF conversion is configured on eCFR.gov. OTOH, they are generated automatically by a (presumably) commercial package, and eCFR has millions of users.

PS: It's not all bad. Your PDF parsing knocks the socks off a lot of other online tools, and the translation to MD is great. Thanks!