After rotation, text is not extracted #1

@eikek

Original: eikek/docspell#554 (comment)

I did a few more tests, and here are the results:

  1. First, I uploaded the original document (a jpg file with incorrect rotation). OCR couldn't recognize the text properly.
  2. I used the rotate addon. The converted PDF got rotated, but the extracted text didn't change.
  3. I also made a copy of the jpg file, rotated it with the Windows Photos app (to be sure, I checked in Paint that it was rotated properly), and uploaded it. The "original" jpg file was shown rotated properly, but the converted PDF was rotated incorrectly (the same as in point 1).
  4. However, when I took the properly rotated jpg file from point 3, opened it in Paint, added a single dot anywhere, and uploaded it, both the original file and the converted PDF were rotated properly (the rotation was not lost as it was in point 3), and OCR recognized the text properly.
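
One possible explanation for the difference between points 3 and 4, which is only a guess and not confirmed anywhere in this report, is that the rotated copy stores its rotation only as an EXIF Orientation tag, while re-saving from Paint writes already-rotated pixels without that tag. To check this on the test files, a minimal Python sketch (standard library only) can read the tag from each jpg. The function names below are hypothetical and not part of docspell or the rotate addon, and the parser only handles the common case of an Exif APP1 segment that appears before the image data.

```python
import struct
from typing import Optional

ORIENTATION_TAG = 0x0112  # EXIF tag id for "Orientation"


def read_exif_orientation(data: bytes) -> Optional[int]:
    """Return the EXIF Orientation value from JPEG bytes, or None if absent."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:  # not at a segment marker: give up
            return None
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: metadata segments are over
            return None
        # Segment length is big-endian and includes its own two bytes.
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + seg_len]
        if marker == 0xE1 and payload[:6] == b"Exif\x00\x00":
            return _orientation_from_tiff(payload[6:])
        pos += 2 + seg_len
    return None


def _orientation_from_tiff(tiff: bytes) -> Optional[int]:
    """Scan the first IFD of an EXIF TIFF block for the Orientation tag."""
    if len(tiff) < 8:
        return None
    if tiff[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif tiff[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        return None
    (ifd_offset,) = struct.unpack(endian + "I", tiff[4:8])
    if ifd_offset + 2 > len(tiff):
        return None
    (count,) = struct.unpack(endian + "H", tiff[ifd_offset:ifd_offset + 2])
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        entry = tiff[start:start + 12]  # each IFD entry is 12 bytes
        if len(entry) < 12:
            return None
        tag, _type, _n = struct.unpack(endian + "HHI", entry[:8])
        if tag == ORIENTATION_TAG:
            # Orientation is a SHORT stored inline in the value field.
            (value,) = struct.unpack(endian + "H", entry[8:10])
            return value
    return None


# Example usage (the file name is a placeholder):
# with open("rotated-copy.jpg", "rb") as f:
#     print(read_exif_orientation(f.read()))
```

A result of `None` or `1` means the file carries no rotation hint in its metadata. Any other value means viewers are expected to rotate or flip the stored pixels for display. If the file from point 3 reports such a value and the file from point 4 does not, that would be consistent with the conversion step ignoring the tag.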
