Description
Checked other resources
- This is a bug, not a usage question. For questions, please use the LangChain Forum (https://forum.langchain.com/).
- I added a clear and descriptive title that summarizes this issue.
- I used the GitHub search to find a similar question and didn't find it.
- I am sure that this is a bug in LangChain rather than my code.
- The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
- I read what a minimal reproducible example is (https://stackoverflow.com/help/minimal-reproducible-example).
- I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.
Example Code
PS C:\Users\xzw65\Desktop\RAG-test> python pdf-read.py
Traceback (most recent call last):
  File "C:\Users\xzw65\Desktop\RAG-test\pdf-read.py", line 6, in <module>
    docs = loader.load()
  File "C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages\langchain_core\document_loaders\base.py", line 32, in load
    return list(self.lazy_load())
  File "C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages\langchain_community\document_loaders\unstructured.py", line 107, in lazy_load
    elements = self._get_elements()
  File "C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages\langchain_community\document_loaders\pdf.py", line 92, in _get_elements
    from unstructured.partition.pdf import partition_pdf
  File "C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages\unstructured\partition\pdf.py", line 14, in <module>
    from pdfminer.layout import LTContainer, LTImage, LTItem, LTTextBox
ModuleNotFoundError: No module named 'pdfminer.layout'
This is my code:
# pdf-read.py
from langchain_community.document_loaders import UnstructuredPDFLoader

# Load the PDF file
loader = UnstructuredPDFLoader("常见问题及处理方法.pdf", mode="elements")  # mode can be "elements" or "single"
docs = loader.load()
I have installed pdfminer.six.
PS C:\Users\xzw65\Desktop\RAG-test> pip show pdfminer.six
Name: pdfminer.six
Version: 20221105
Summary: PDF parser and analyzer
Home-page: https://github.com/pdfminer/pdfminer.six
Author: Yusuke Shinyama + Philippe Guglielmetti
Author-email: [email protected]
License: MIT/X
Location: C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages
Requires: charset-normalizer, cryptography
Required-by: unstructured-inference
PS C:\Users\xzw65\Desktop\RAG-test>
I also tried pdfminer. It reports:
PS C:\Users\xzw65\Desktop\RAG-test> python pdf-read.py
Traceback (most recent call last):
  File "C:\Users\xzw65\Desktop\RAG-test\pdf-read.py", line 6, in <module>
    docs = loader.load()
  File "C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages\langchain_core\document_loaders\base.py", line 32, in load
    return list(self.lazy_load())
  File "C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages\langchain_community\document_loaders\unstructured.py", line 107, in lazy_load
    elements = self._get_elements()
  File "C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages\langchain_community\document_loaders\pdf.py", line 92, in _get_elements
    from unstructured.partition.pdf import partition_pdf
  File "C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages\unstructured\partition\pdf.py", line 15, in <module>
    from pdfminer.utils import open_filename
ImportError: cannot import name 'open_filename' from 'pdfminer.utils' (C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages\pdfminer\utils.py)
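The two failing lines in unstructured\partition\pdf.py import specific names from pdfminer.layout and pdfminer.utils. A minimal sketch for checking those names directly, without going through LangChain, might look like this. The helper missing_names and the REQUIRED table are hypothetical; the module and attribute names are copied from the tracebacks in this report.

```python
import importlib
from typing import Dict, List


def missing_names(module_name: str, names: List[str]) -> List[str]:
    """Return the names that cannot be found in module_name.

    If the module itself cannot be imported, every name is reported as missing.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return list(names)
    return [name for name in names if not hasattr(module, name)]


# Names taken from the two failing import statements in the tracebacks.
REQUIRED: Dict[str, List[str]] = {
    "pdfminer.layout": ["LTContainer", "LTImage", "LTItem", "LTTextBox"],
    "pdfminer.utils": ["open_filename"],
}

if __name__ == "__main__":
    for module_name, names in REQUIRED.items():
        print(module_name, "missing:", missing_names(module_name, names))
```

An empty list for both modules would suggest the installed pdfminer package exposes everything these two import statements ask for.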
Error Message and Stack Trace (if applicable)
With pdfminer.six 20221105 installed:
ModuleNotFoundError: No module named 'pdfminer.layout'
After also trying pdfminer:
ImportError: cannot import name 'open_filename' from 'pdfminer.utils' (C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages\pdfminer\utils.py)
The full tracebacks, the script, and the pip show output are given under Example Code.
Description
Calling loader.load() on an UnstructuredPDFLoader fails while importing unstructured.partition.pdf. With pdfminer.six 20221105 installed, the import stops at line 14 of unstructured\partition\pdf.py with ModuleNotFoundError: No module named 'pdfminer.layout'. After I also tried pdfminer, it stops at line 15 with ImportError: cannot import name 'open_filename' from 'pdfminer.utils'.
System Info
OS: Windows (PowerShell)
Python: 3.13 (packages under C:\Users\xzw65\AppData\Roaming\Python\Python313\site-packages)
pdfminer.six: 20221105 (required by unstructured-inference)
langchain_core, langchain_community, unstructured: installed in the same site-packages (versions not listed)