Error "Stream has ended unexpectedly" on getDocumentInfo with certain pdf file(s) #1288

@OzzieIsaacs

Description

I'm the guy from here; I followed the call and am still having issues with an encrypted PDF. I'm trying to extract metadata from this file. The advantage of PyPDF2 over PyPDF3 is that the cover can be extracted from the problematic files without any problem.
The file can be opened in a "normal" PDF reader application, and at least some of the metadata is visible there.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.15.0-46-generic-x86_64-with-glibc2.35

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.10.3

PyCryptodome 3.15.0 is also installed.

Code + PDF

(With the PDF linked below downloaded and renamed to encrypt.pdf)

from PyPDF2 import PdfFileReader
with open('encrypt.pdf', 'rb') as f:
    pdf_file = PdfFileReader(f)
    doc_info = pdf_file.getDocumentInfo()

Share here the PDF file(s) that cause the issue. The smaller they are, the
better. Let us know if we may add them to our tests!

https://cloud.3dissue.net/24308/24333/24567/65779/Position_4.21-211104-DE-web-20211203082446.pdf

I'm not the owner or creator of the PDF, so I recommend not using it for automated tests.

Traceback

This is the complete Traceback I see:

    doc_info = pdf_file.getDocumentInfo()
  File "/home/ozzie/Development/calibre-web/venv/lib/python3.10/site-packages/PyPDF2/_reader.py", line 339, in getDocumentInfo
    return self.metadata
  File "/home/ozzie/Development/calibre-web/venv/lib/python3.10/site-packages/PyPDF2/_reader.py", line 327, in metadata
    obj = self.trailer[TK.INFO]
  File "/home/ozzie/Development/calibre-web/venv/lib/python3.10/site-packages/PyPDF2/generic/_data_structures.py", line 150, in __getitem__
    return dict.__getitem__(self, key).get_object()
  File "/home/ozzie/Development/calibre-web/venv/lib/python3.10/site-packages/PyPDF2/generic/_base.py", line 163, in get_object
    obj = self.pdf.get_object(self)
  File "/home/ozzie/Development/calibre-web/venv/lib/python3.10/site-packages/PyPDF2/_reader.py", line 1121, in get_object
    retval = self._get_object_from_stream(indirect_reference)  # type: ignore
  File "/home/ozzie/Development/calibre-web/venv/lib/python3.10/site-packages/PyPDF2/_reader.py", line 1072, in _get_object_from_stream
    objnum = NumberObject.read_from_stream(stream_data)
  File "/home/ozzie/Development/calibre-web/venv/lib/python3.10/site-packages/PyPDF2/generic/_base.py", line 296, in read_from_stream
    num = read_until_regex(stream, NumberObject.NumberPattern)
  File "/home/ozzie/Development/calibre-web/venv/lib/python3.10/site-packages/PyPDF2/_utils.py", line 158, in read_until_regex
    raise PdfStreamError(STREAM_TRUNCATED_PREMATURELY)
PyPDF2.errors.PdfStreamError: Stream has ended unexpectedly
