Description
I'm the guy from here and followed the call, but I'm still having issues with an encrypted PDF. I'm trying to extract metadata from this file. The advantage of PyPDF2 over PyPDF3 is that the cover can be extracted from the problematic files without problems.
The file can be opened in a "normal" PDF reader application, and at least some of the metadata is visible there.

Environment
Which environment were you using when you encountered the problem?
$ python -m platform
Linux-5.15.0-46-generic-x86_64-with-glibc2.35
$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.10.3
PyCryptodome-3.15.0 is also installed.

Code + PDF
(with the PDF linked below downloaded and renamed to encrypt.pdf)
from PyPDF2 import PdfFileReader

with open('encrypt.pdf', 'rb') as f:
    pdf_file = PdfFileReader(f)
    doc_info = pdf_file.getDocumentInfo()
Share here the PDF file(s) that cause the issue. The smaller they are, the
better. Let us know if we may add them to our tests!
https://cloud.3dissue.net/24308/24333/24567/65779/Position_4.21-211104-DE-web-20211203082446.pdf
I'm not the owner/creator of the PDF, so I recommend not using it for automated tests.
Traceback
This is the complete Traceback I see:
    doc_info = pdf_file.getDocumentInfo()
  File "/home/ozzie/Development/calibre-web/venv/lib/python3.10/site-packages/PyPDF2/_reader.py", line 339, in getDocumentInfo
    return self.metadata
  File "/home/ozzie/Development/calibre-web/venv/lib/python3.10/site-packages/PyPDF2/_reader.py", line 327, in metadata
    obj = self.trailer[TK.INFO]
  File "/home/ozzie/Development/calibre-web/venv/lib/python3.10/site-packages/PyPDF2/generic/_data_structures.py", line 150, in __getitem__
    return dict.__getitem__(self, key).get_object()
  File "/home/ozzie/Development/calibre-web/venv/lib/python3.10/site-packages/PyPDF2/generic/_base.py", line 163, in get_object
    obj = self.pdf.get_object(self)
  File "/home/ozzie/Development/calibre-web/venv/lib/python3.10/site-packages/PyPDF2/_reader.py", line 1121, in get_object
    retval = self._get_object_from_stream(indirect_reference) # type: ignore
  File "/home/ozzie/Development/calibre-web/venv/lib/python3.10/site-packages/PyPDF2/_reader.py", line 1072, in _get_object_from_stream
    objnum = NumberObject.read_from_stream(stream_data)
  File "/home/ozzie/Development/calibre-web/venv/lib/python3.10/site-packages/PyPDF2/generic/_base.py", line 296, in read_from_stream
    num = read_until_regex(stream, NumberObject.NumberPattern)
  File "/home/ozzie/Development/calibre-web/venv/lib/python3.10/site-packages/PyPDF2/_utils.py", line 158, in read_until_regex
    raise PdfStreamError(STREAM_TRUNCATED_PREMATURELY)
PyPDF2.errors.PdfStreamError: Stream has ended unexpectedly
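
The traceback shows the error being raised while the Info dictionary is resolved from an object stream (`metadata` -> `trailer[TK.INFO]` -> `_get_object_from_stream`). Below is a minimal sketch of a more defensive variant of the snippet above. It is untested against this file, and it assumes that the PDF is encrypted with an empty user password, which I have not confirmed. `read_metadata` and `read_metadata_from_path` are hypothetical helper names, not part of PyPDF2; the only PyPDF2 names used are `PdfReader`, `is_encrypted`, `decrypt()` and `metadata`.

```python
def read_metadata(reader, password=""):
    """Return reader.metadata, decrypting first if the document is encrypted.

    `reader` is anything that behaves like PyPDF2.PdfReader, i.e. it has
    `is_encrypted`, `decrypt()` and `metadata`. Returns None if reading fails.
    """
    if reader.is_encrypted:
        # Assumption: the empty string is the user password for this file.
        reader.decrypt(password)
    try:
        return reader.metadata
    except Exception as exc:  # e.g. PyPDF2.errors.PdfStreamError
        print(f"could not read metadata: {exc!r}")
        return None


def read_metadata_from_path(path, password=""):
    """Open `path` with PyPDF2 and try to read its document info."""
    # Imported lazily so that read_metadata() can be used without PyPDF2.
    from PyPDF2 import PdfReader

    # Passing a path lets PdfReader read the whole file itself.
    return read_metadata(PdfReader(path), password)
```

With the file from above, this would be called as `read_metadata_from_path('encrypt.pdf')`. Whether decrypting first avoids the `PdfStreamError` for this particular document is exactly the open question here; the sketch only makes the failure non-fatal for the caller.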