Description
I have been trialing an unmodified Docker install with about 1,200 scanned newspaper clippings.
All files were uploaded through the web interface in batches, and I tested functionality after the first two small batches.
The last batch of 1,000 files included a saved web page folder (renamed in the logs below as "some _ web page"; the real folder name is 69 characters long).
This hung the conversion and logged the following:
Thu, July 24th, 2025, 16:03: Converting file Some(some_webpage.html) (text/html; charset="ISO-8859-1") into a PDF
Thu, July 24th, 2025, 16:03: Running external command: weasyprint --encoding ISO-8859-1 - /tmp/docspell-weasyprint/docspell-weasyprint11682172552415435536/out.pdf
Thu, July 24th, 2025, 16:03: Waiting for command to terminate…
Thu, July 24th, 2025, 16:03: [weasyprint (err)]: ERROR: Failed to load stylesheet at file:///tmp/docspell-weasyprint/docspell-weasyprint11682172552415435536/some_webpage_files/main.4dd8da19.chunk.css: URLError: <urlopen error [Errno 2] No such file or directory: '/tmp/docspell-weasyprint/docspell-weasyprint11682172552415435536/some_webpage_files/main.4dd8da19.chunk.css'>
Thu, July 24th, 2025, 16:03: [weasyprint (err)]: ERROR: Failed to load stylesheet at file:///tmp/docspell-weasyprint/docspell-weasyprint11682172552415435536/some%20_%20webpage_files/51aab41ed2181e2490a43420f093a654.css: URLError: <urlopen error [Errno 2] No such file or directory: '/tmp/docspell-weasyprint/docspell-weasyprint11682172552415435536/some _ webpage_files/51aab41ed2181e2490a43420f093a654.css'>
Thu, July 24th, 2025, 16:03: [weasyprint (err)]: WARNING: Font-face 'Guardian Sans Web' cannot be loaded
Thu, July 24th, 2025, 16:03: [weasyprint (err)]: WARNING: Font-face 'Guardian Sans Web' cannot be loaded
Thu, July 24th, 2025, 16:03: [weasyprint (err)]: WARNING: Font-face 'Guardian Sans Web' cannot be loaded
Thu, July 24th, 2025, 16:03: [weasyprint (err)]: WARNING: Font-face 'Guardian Sans Web' cannot be loaded
Thu, July 24th, 2025, 16:03: [weasyprint (err)]: WARNING: Font-face 'Guardian Text Egyptian Web' cannot be loaded
Thu, July 24th, 2025, 16:03: [weasyprint (err)]: WARNING: Font-face 'Guardian Text Egyptian Web' cannot be loaded
Thu, July 24th, 2025, 16:03: [weasyprint (err)]: WARNING: Font-face 'Guardian Text Egyptian Web' cannot be loaded
Thu, July 24th, 2025, 16:03: [weasyprint (err)]: WARNING: Font-face 'Guardian Text Egyptian Web' cannot be loaded
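The stylesheet errors above show weasyprint resolving the page's relative links (for example `some_webpage_files/main.4dd8da19.chunk.css`) against the temporary directory `/tmp/docspell-weasyprint/...`. The HTML itself was passed on stdin (the `-` argument), and the companion `_files` folder apparently did not exist in that temporary directory. Below is a minimal sketch for checking a saved page before upload. It assumes the page links its CSS through relative `<link rel="stylesheet">` tags. The helper `missing_local_stylesheets` is hypothetical and is not part of Docspell or weasyprint.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class StylesheetCollector(HTMLParser):
    """Collects the href values of <link rel="stylesheet"> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, or be missing (None).
        rels = (attr.get("rel") or "").lower().split()
        if "stylesheet" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def missing_local_stylesheets(html_text, base_dir):
    """Return relative stylesheet paths that do not exist under base_dir (hypothetical helper)."""
    parser = StylesheetCollector()
    parser.feed(html_text)
    missing = []
    for href in parser.hrefs:
        parsed = urlparse(href)
        # Skip absolute URLs (http:, https:, data:, ...); only local files are checked.
        if parsed.scheme or parsed.netloc:
            continue
        # Percent-decode, e.g. "some%20_%20webpage_files" -> "some _ webpage_files".
        rel_path = unquote(parsed.path)
        if not os.path.isfile(os.path.join(base_dir, rel_path)):
            missing.append(rel_path)
    return missing
```

For example, if you run this on the saved HTML with `base_dir` set to a directory that holds only the HTML file, it should list the same kinds of paths that weasyprint reported as "No such file or directory".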
I cancelled the job (using the cancel button). After a few minutes the remaining files finished processing, but the processing queue kept showing one job as running.
Two items were created: "some _ web page.docx" and "some _ web page.pdf".
I opened the detail view of the HTML item, and it displayed the source correctly.
When I selected "view extracted file", I got the message "Authentication failed."
The detail view and "open the file in new tab" now also show "Authentication failed."
All other items in the database now have no tile (thumbnail) image, and their attachments display "Authentication failed.", although they looked fine when I inspected them during processing.
I have tried stopping and restarting Docker, and repeating this after deleting the docx and HTML items.
It seems that either the HTML conversion errors or cancelling the job mid-process has broken something. Any advice on how to recover, and on how to avoid this issue, would be appreciated.
UPDATE: This only affects pages viewed in the Edge browser.