ModuleNotFoundError: No module named 'pdfminer.layout' #32360

@xxz7909

Checked other resources

  • This is a bug, not a usage question. For questions, please use the LangChain Forum (https://forum.langchain.com/).
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • I read what a minimal reproducible example is (https://stackoverflow.com/help/minimal-reproducible-example).
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Example Code

PS C:\Users\xzw65\Desktop\RAG-test> python pdf-read.py
Traceback (most recent call last):
  File "C:\Users\xzw65\Desktop\RAG-test\pdf-read.py", line 6, in <module>
    docs = loader.load()
  File "C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages\langchain_core\document_loaders\base.py", line 32, in load
    return list(self.lazy_load())
  File "C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages\langchain_community\document_loaders\unstructured.py", line 107, in lazy_load
    elements = self._get_elements()
  File "C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages\langchain_community\document_loaders\pdf.py", line 92, in _get_elements
    from unstructured.partition.pdf import partition_pdf
  File "C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages\unstructured\partition\pdf.py", line 14, in <module>
    from pdfminer.layout import LTContainer, LTImage, LTItem, LTTextBox
ModuleNotFoundError: No module named 'pdfminer.layout'

This is my code:

# pdf-read.py
from langchain_community.document_loaders import UnstructuredPDFLoader

# Load the PDF file
loader = UnstructuredPDFLoader("常见问题及处理方法.pdf", mode="elements")  # mode can be "elements" or "single"
docs = loader.load()

I have installed pdfminer.six.
PS C:\Users\xzw65\Desktop\RAG-test> pip show pdfminer.six
Name: pdfminer.six
Version: 20221105
Summary: PDF parser and analyzer
Home-page: https://github.com/pdfminer/pdfminer.six
Author: Yusuke Shinyama + Philippe Guglielmetti
Author-email: [email protected]
License: MIT/X
Location: C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages
Requires: charset-normalizer, cryptography
Required-by: unstructured-inference
PS C:\Users\xzw65\Desktop\RAG-test>
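
One thing worth ruling out, offered as a sketch rather than a diagnosis: `pip show` describes the environment of whichever `pip` is first on the PATH, which is not necessarily the interpreter that runs `pdf-read.py`. The snippet below asks the running interpreter which versions it sees, using the standard-library `importlib.metadata`. The helper name `installed_versions` is hypothetical, and the list of distribution names is an assumption based on the packages mentioned in this report.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed for the interpreter running this script.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names assumed from the report above.
    wanted = ["pdfminer", "pdfminer.six", "unstructured", "unstructured-inference"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

Running it with the same `python` command used for `pdf-read.py` keeps the check tied to the interpreter that produced the traceback.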

I also tried pdfminer, and it reports:
PS C:\Users\xzw65\Desktop\RAG-test> python pdf-read.py
Traceback (most recent call last):
  File "C:\Users\xzw65\Desktop\RAG-test\pdf-read.py", line 6, in <module>
    docs = loader.load()
  File "C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages\langchain_core\document_loaders\base.py", line 32, in load
    return list(self.lazy_load())
  File "C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages\langchain_community\document_loaders\unstructured.py", line 107, in lazy_load
    elements = self._get_elements()
  File "C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages\langchain_community\document_loaders\pdf.py", line 92, in _get_elements
    from unstructured.partition.pdf import partition_pdf
  File "C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages\unstructured\partition\pdf.py", line 15, in <module>
    from pdfminer.utils import open_filename
ImportError: cannot import name 'open_filename' from 'pdfminer.utils' (C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages\pdfminer\utils.py)
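
Both tracebacks fail on names that `unstructured` imports from the `pdfminer` package. A minimal sketch, using only the standard library, for checking where the name `pdfminer` resolves and which of those names are missing. `probe_package` is a hypothetical helper, not part of any library; the submodule and attribute names are copied from the tracebacks.

```python
import importlib
import importlib.util


def probe_package(package, required):
    """Report where `package` resolves and which required names are missing.

    `required` maps a submodule name (e.g. "layout") to the names expected in it.
    """
    report = {"location": None, "search_paths": [], "missing": []}

    spec = importlib.util.find_spec(package)
    if spec is None:
        # The top-level package cannot be found at all.
        report["missing"].append(package)
        return report

    # `origin` is None for a namespace package (a directory without __init__.py).
    report["location"] = spec.origin
    report["search_paths"] = list(spec.submodule_search_locations or [])

    for submodule, names in required.items():
        full_name = f"{package}.{submodule}"
        try:
            module = importlib.import_module(full_name)
        except ImportError:
            report["missing"].append(full_name)
            continue
        # Record each expected attribute the submodule does not define.
        report["missing"] += [f"{full_name}.{n}" for n in names if not hasattr(module, n)]
    return report


if __name__ == "__main__":
    print(probe_package("pdfminer", {
        "layout": ["LTContainer", "LTImage", "LTItem", "LTTextBox"],
        "utils": ["open_filename"],
    }))
```

Both the `pdfminer` and `pdfminer.six` distributions install a top-level package named `pdfminer`. If the reported location or the missing list changes after installing or removing one of them, that would suggest the two are overwriting each other's files; this is an assumption to verify, not something the output above proves.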
