loks666/ragflow-fix-ocr-gpu-memory

🛠️ ragflow-fix-ocr-gpu-memory

This repository documents a tested fix for a GPU memory issue in RAGFlow's OCR stage. It records the environment, configuration, and code changes that resolved the problem, so that PDF parsing and indexing stay stable even for large scanned medical textbooks.

✅ Background

When RAGFlow ran with GPU acceleration, OCR parsing of large files (e.g., a 177 MB scanned medical textbook) could terminate abnormally or fail with memory errors. Troubleshooting led to changes in three places: the system configuration, the Docker runtime, and RAGFlow's OCR code (ocr.py). With these changes in place, the system parses large PDFs reliably and without exceptions.
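This README does not spell out the exact change made to ocr.py; the modified file in this repository is the authoritative version. As a hedged illustration of one common way to bound GPU memory: RAGFlow's OCR runs its models through ONNX Runtime, and ONNX Runtime's CUDA execution provider documents the options `gpu_mem_limit` (a byte cap on the provider's memory arena) and `arena_extend_strategy`. The sketch below builds such a provider list. The helper name `build_ocr_providers` and the 4 GB default are assumptions for illustration, not values taken from this fix.

```python
from typing import Any, Union

# A provider entry is either a bare provider name or a (name, options) tuple,
# the two forms onnxruntime.InferenceSession(..., providers=...) accepts.
ProviderEntry = Union[str, tuple[str, dict[str, Any]]]


def build_ocr_providers(device_id: int = 0, gpu_mem_limit_gb: float = 4.0) -> list[ProviderEntry]:
    """Hypothetical helper: CUDA provider with a capped arena, CPU as fallback.

    The 4 GB default is an illustrative assumption, not the value used in ocr.py.
    """
    if gpu_mem_limit_gb <= 0:
        raise ValueError("gpu_mem_limit_gb must be positive")
    cuda_options = {
        "device_id": device_id,
        # Upper bound, in bytes, for the CUDA execution provider's memory arena.
        "gpu_mem_limit": int(gpu_mem_limit_gb * 1024**3),
        # Grow the arena by exactly what is requested instead of doubling it.
        "arena_extend_strategy": "kSameAsRequested",
    }
    return [("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"]


if __name__ == "__main__":
    print(build_ocr_providers())
```

With onnxruntime installed, the list would be passed as `ort.InferenceSession(model_path, providers=build_ocr_providers())`, where `model_path` is a placeholder. Note that `gpu_mem_limit` caps only the provider's arena; total device memory use (cuDNN workspaces, other processes such as Ollama on the same GPU) can still be higher.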

🔬 Testing Environment

  • CPU: AMD Ryzen 9 5900X

  • Memory: DDR4, 64 GB

  • GPU: NVIDIA RTX 3090, 24 GB

  • File tested: 177 MB medical textbook (scanned PDF)

  • Image used: v0.20.5-slim

  • Launch method: docker-compose-gpu.yml

  • Model backend: Ollama

  • Nginx adjustment:

    client_max_body_size 256M;

    This was critical for allowing larger PDF uploads, such as the 177 MB test file.

After these adjustments, the PDF was re-sliced, processed, and parsed without any anomalies or error messages.
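Because `client_max_body_size` limits the whole request body (nginx answers 413 Request Entity Too Large when it is exceeded, and its documented default is 1m), a quick pre-upload check can catch an oversized file early. This is a minimal sketch; the helper names `nginx_size_to_bytes` and `upload_fits` are hypothetical and not part of RAGFlow or nginx.

```python
import re

# Multipliers for the size suffixes nginx accepts (k/K, m/M, g/G); no suffix means bytes.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def nginx_size_to_bytes(value: str) -> int:
    """Convert an nginx size such as '256M' or '1m;' to a byte count."""
    text = value.strip().rstrip(";").strip()
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", text)
    if match is None:
        raise ValueError(f"not an nginx size: {value!r}")
    number, suffix = match.groups()
    return int(number) * _UNITS[suffix.lower()]


def upload_fits(file_size_bytes: int, client_max_body_size: str) -> bool:
    """Return True if a file of this size fits under the configured limit.

    The multipart request adds some overhead on top of the file itself,
    so a file sitting right at the limit may still be rejected.
    """
    limit = nginx_size_to_bytes(client_max_body_size)
    # A value of 0 disables nginx's body-size check entirely.
    return limit == 0 or file_size_bytes <= limit
```

For example, `upload_fits(os.path.getsize(pdf_path), "256M")` would check the sample textbook against the setting above, with `pdf_path` standing in for the file's location.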


📂 Repository Structure

ragflow-fix-ocr-gpu-memory/
│
├── ragflow-logs/                 # Logs captured during test runs
│   ├── ragflow_server.log        # Main server log
│   └── task_executor_*.log       # Task execution logs
│
├── system_image/                 # Screenshots of system setup & configurations
│   ├── explorer.png              # Windows file explorer showing file layout
│   ├── ollama-settings.png       # Ollama model configuration
│   ├── rag-flow.png              # RAGFlow runtime interface
│   └── slice_func.png            # Demonstration of PDF slicing function
│
├── .env_example                  # Example .env file for environment variables
├── .gitignore                    # Git ignore rules
├── docker-compose-gpu.yml        # Docker Compose file with GPU support
├── ocr.py                        # Modified OCR script to fix GPU memory usage
├── README.md                     # This documentation
├── service_conf.yaml.template    # Template for service configuration
└── 《内科学》(第10版).pdf        # Sample large medical textbook, Internal Medicine (10th ed.), 177 MB, for testing

📸 Images & References

  • Directory structure (system_image/explorer.png)
  • Ollama model configuration (system_image/ollama-settings.png)
  • RAGFlow runtime environment (system_image/rag-flow.png)
  • Demonstration of PDF slicing (system_image/slice_func.png)

🔗 Additional Resources

All supporting files, logs, and screenshots are included in this repository. You can review them to reproduce the results or verify the fix.
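To check the logs in ragflow-logs/ for leftover failures when reproducing the fix, a small scan like the one below can help. It is a sketch: the function name `scan_logs` is hypothetical, and the error markers in `ERROR_PATTERNS` are assumptions, not strings quoted from these logs, so adjust them to whatever the failing runs actually printed.

```python
import re
from pathlib import Path

# Assumed error markers; tune these to the messages seen in real failing runs.
ERROR_PATTERNS = re.compile(r"Traceback|\bERROR\b|out of memory|CUDA_ERROR")


def scan_logs(log_dir: str, pattern: re.Pattern = ERROR_PATTERNS) -> dict[str, list[tuple[int, str]]]:
    """Map each *.log file name to the (line number, line) pairs matching `pattern`."""
    hits: dict[str, list[tuple[int, str]]] = {}
    for path in sorted(Path(log_dir).glob("*.log")):
        matches = []
        # errors="replace" keeps the scan going if a log contains stray bytes.
        with path.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if pattern.search(line):
                    matches.append((lineno, line.rstrip("\n")))
        if matches:
            hits[path.name] = matches
    return hits


if __name__ == "__main__":
    for name, matches in scan_logs("ragflow-logs").items():
        for lineno, line in matches:
            print(f"{name}:{lineno}: {line}")
```

An empty result means none of the assumed markers appear; it does not by itself prove that parsing succeeded.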

👉 Repository link: https://github.com/loks666/ragflow-fix-ocr-gpu-memory

If you encounter further issues, feel free to open an issue or start a discussion. Collaboration is welcome to refine and extend the fix.
