bariscant
I am having an issue when I try to upload a CSV file with ASCII encoding. The API produces the error below.

Error

rag_api           | 2025-07-18 09:10:39,302 - root - INFO - Request POST http://rag_api:8000/embed - 200
rag_api           | 2025-07-18 09:10:41,828 - root - INFO - Request DELETE http://rag_api:8000/documents - 200
rag_api           | 2025-07-18 09:10:45,241 - root - ERROR - Error during file processing: Error loading ./uploads/68776f659b93d757edbfce49/Software_Overview__1_.csv
rag_api           | Traceback: Traceback (most recent call last):
rag_api           |   File "/usr/local/lib/python3.10/site-packages/langchain_community/document_loaders/csv_loader.py", line 135, in lazy_load
rag_api           |     yield from self.__read_file(csvfile)
rag_api           |   File "/usr/local/lib/python3.10/site-packages/langchain_community/document_loaders/csv_loader.py", line 155, in __read_file
rag_api           |     for i, row in enumerate(csv_reader):
rag_api           |   File "/usr/local/lib/python3.10/csv.py", line 111, in __next__
rag_api           |     row = next(self.reader)
rag_api           |   File "/usr/local/lib/python3.10/codecs.py", line 322, in decode
rag_api           |     (result, consumed) = self._buffer_decode(data, self.errors, final)
rag_api           | UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 2682: invalid start byte
rag_api           | 
rag_api           | The above exception was the direct cause of the following exception:
LibreChat         | 2025-07-18 09:10:45 error: Error uploading vectors The server responded with status 400: Request failed with status code 400
rag_api           | 
LibreChat         | 2025-07-18 09:10:45 error: [/files] Error processing file: Request failed with status code 400
rag_api           | Traceback (most recent call last):
rag_api           |   File "/app/app/routes/document_routes.py", line 414, in embed_file
rag_api           |     data = await run_in_executor(request.app.state.thread_pool, loader.load)
rag_api           |   File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/config.py", line 593, in run_in_executor
rag_api           |     return await asyncio.get_running_loop().run_in_executor(executor_or_config, wrapper)
rag_api           |   File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
rag_api           |     result = self.fn(*self.args, **self.kwargs)
rag_api           |   File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/config.py", line 579, in wrapper
rag_api           |     return func(*args, **kwargs)
rag_api           |   File "/usr/local/lib/python3.10/site-packages/langchain_core/document_loaders/base.py", line 31, in load
rag_api           |     return list(self.lazy_load())
rag_api           |   File "/usr/local/lib/python3.10/site-packages/langchain_community/document_loaders/csv_loader.py", line 149, in lazy_load
rag_api           |     raise RuntimeError(f"Error loading {self.file_path}") from e
rag_api           | RuntimeError: Error loading ./uploads/68776f659b93d757edbfce49/Software_Overview__1_.csv
rag_api           | 
rag_api           | 2025-07-18 09:10:45,242 - root - INFO - Request POST http://rag_api:8000/embed - 400
  • Enhanced file encoding detection for CSV files using BOM markers and chardet, improving support for non-UTF-8 encodings.
  • Added the errors='replace' parameter to the open call, so that files can be read without crashing even if they contain bytes that are invalid or corrupt for the detected encoding.
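
For readers who want a concrete picture of the two changes above, here is a minimal sketch of the general approach. It is not the code from this PR: `detect_encoding` and `load_csv_rows` are hypothetical names, the 64 KiB sample size and the Latin-1 fallback are assumptions, and chardet is treated as an optional dependency so the sketch still runs without it.

```python
import codecs
import csv

try:
    import chardet  # third-party detector; optional in this sketch
except ImportError:
    chardet = None

# UTF-32 LE must be checked before UTF-16 LE because its BOM starts with the same two bytes.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_encoding(path, sample_size=64 * 1024):
    """Guess a text encoding for `path` (hypothetical helper, not the PR's code)."""
    with open(path, "rb") as fh:
        raw = fh.read(sample_size)

    # 1. A byte order mark is the most reliable signal.
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name

    # 2. Strict UTF-8 check. final=False tolerates a multi-byte character
    #    that was cut off at the end of the sample.
    try:
        codecs.getincrementaldecoder("utf-8")().decode(raw, final=False)
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # 3. Statistical guess, accepted only if Python has a codec for it.
    if chardet is not None:
        guess = chardet.detect(raw).get("encoding")
        if guess:
            try:
                codecs.lookup(guess)
                return guess
            except LookupError:
                pass

    # 4. Assumed fallback: Latin-1 maps every byte to a character, so decoding cannot fail.
    return "latin-1"


def load_csv_rows(path):
    """Read all rows, replacing undecodable bytes instead of raising."""
    encoding = detect_encoding(path)
    with open(path, newline="", encoding=encoding, errors="replace") as fh:
        return list(csv.reader(fh))
```

Only the first `sample_size` bytes are inspected, so a bad byte later in the file can still slip past detection. That is the case `errors="replace"` covers: the byte becomes U+FFFD instead of raising `UnicodeDecodeError` as in the traceback above.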

Testing

  • Image builds locally
  • CSV import is working

@bariscant bariscant changed the title feat: improve csv loading and character encoding detection fix: improve csv loading and character encoding detection Jul 18, 2025
@danny-avila danny-avila changed the title fix: improve csv loading and character encoding detection 🏓 fix: improve csv loading and character encoding detection Aug 17, 2025