A robust, production-ready RAG (Retrieval-Augmented Generation) pipeline that uses Vision Language Models to process and query multimodal documents through OpenWebUI.
- Vision-Language Document Processing: Uses ColPali models to understand both text and visual elements in documents
- PDF to Image Conversion: Automatically converts PDF documents to high-quality images for processing
- Vector Database Storage: Efficient storage and retrieval using Qdrant with optimized configurations
- OpenWebUI Integration: Seamless integration as a pipeline with OpenWebUI
- Background Initialization: Non-blocking startup process for better user experience
- State Persistence: Intelligent state management to avoid redundant initialization
- Multi-threaded Processing: Optimized for performance with concurrent processing
graph TB
A[Knowledge PDF Documents] --> B[PDF to Image Conversion]
B --> C[ColPali Vision Model]
C --> D[Vector Embeddings]
D --> E[Qdrant Vector DB]
F[User Query] --> G[Query Processing]
G --> H[Vector Search]
H --> E
E --> I[Retrieved Results]
I --> J[Response Generation]
- Python 3.11+
- CUDA-capable GPU (recommended)
- Poppler: Required for PDF processing
- Windows: Download from Poppler for Windows
- Linux: sudo apt-get install poppler-utils
- macOS: brew install poppler
pdf2image>=1.16.0
qdrant-client>=1.10.0
colpali-engine>=0.2.0
Pillow>=10.0.0
torch>=2.0.0
transformers>=4.35.0
requests>=2.31.0
- Download Poppler for Windows
- Extract to a folder (e.g., C:\poppler-23.11.0)
- Add the bin directory to your system PATH: C:\poppler-23.11.0\Library\bin
- Restart your terminal/IDE
sudo apt-get update
sudo apt-get install poppler-utils
brew install poppler
pip install pdf2image qdrant-client colpali-engine Pillow torch transformers requests
- Copy the pipeline file to your OpenWebUI pipelines directory
- Update the configuration variables in the pipeline:
BASE_URL = "http://your-openwebui-host:port/api/v1"
API_KEY = "your-api-key"  # Optional
- Restart OpenWebUI
Update these variables in the pipeline file:
# OpenWebUI API Configuration
BASE_URL = "http://10.1.42.88:8080/api/v1"
API_KEY = "your-api-key"  # Optional; never commit a real key
# Model Configuration
model_name = "vidore/colqwen2-v1.0" # or "vidore/colqwen2.5-v0.1"
# Processing Configuration
downloads_dir = "downloads" # Directory for downloaded files
dpi = 200 # Image conversion quality
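Hard-coding the API key in the pipeline file makes it easy to leak. As a minimal sketch, the same settings could be read from environment variables instead. The variable names (`OPENWEBUI_BASE_URL`, `OPENWEBUI_API_KEY`, `COLPALI_MODEL`, `DOWNLOADS_DIR`, `PDF_DPI`) and the `Settings` / `load_settings` helpers are assumptions made for this example; the pipeline does not define them.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    base_url: str
    api_key: Optional[str]
    model_name: str
    downloads_dir: str
    dpi: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables, falling back to the defaults shown above.

    All variable names here are hypothetical; rename them to suit your deployment.
    """
    return Settings(
        # Strip a trailing slash so later URL joins do not produce "//".
        base_url=env.get("OPENWEBUI_BASE_URL", "http://your-openwebui-host:port/api/v1").rstrip("/"),
        # The key is optional; an unset or empty variable becomes None.
        api_key=env.get("OPENWEBUI_API_KEY") or None,
        model_name=env.get("COLPALI_MODEL", "vidore/colqwen2-v1.0"),
        downloads_dir=env.get("DOWNLOADS_DIR", "downloads"),
        dpi=int(env.get("PDF_DPI", "200")),
    )
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise in tests.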
The pipeline automatically configures Qdrant with optimized settings:
- Storage: On-disk payload storage for large datasets
- Quantization: INT8 scalar quantization for memory efficiency
- Multi-vector: MAX_SIM comparator for optimal retrieval
- Distance: Cosine similarity for semantic matching
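To make the MAX_SIM comparator concrete: ColPali-style models produce many vectors per page and per query rather than one, and late-interaction scoring sums, over the query vectors, the best similarity each one achieves against any vector of the page. The pure-Python sketch below illustrates that rule with cosine similarity. It illustrates the idea only and is not Qdrant's implementation; the function names are made up for this example.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def max_sim(query_vectors: Sequence[Vector], doc_vectors: Sequence[Vector]) -> float:
    """Sum, over query vectors, of the maximum cosine similarity to any document vector."""
    if not doc_vectors:
        raise ValueError("doc_vectors must not be empty")
    return sum(max(cosine(q, d) for d in doc_vectors) for q in query_vectors)


# Toy example: two query vectors scored against two candidate pages.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 2.0]]  # has a close match for each query vector
page_b = [[1.0, 1.0]]              # one vector that only partly matches both
```

Here `page_a` outranks `page_b` because each query vector finds its own near-exact match, which is the behaviour a single pooled vector per page cannot capture.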
Documents are automatically ingested from OpenWebUI's knowledge base:
- Upload PDF documents to OpenWebUI knowledge collections
- The pipeline will automatically process them during initialization
- Documents are converted to images and embedded using ColPali models
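The conversion step can be sketched with pdf2image's `convert_from_path`, which accepts `dpi` and `thread_count` arguments. The helper names `conversion_threads` and `pdf_to_images` are hypothetical; the 200 DPI default and the use of half the CPU cores follow the values stated elsewhere in this README, but the pipeline's actual code may differ.

```python
import os
from pathlib import Path
from typing import Optional


def conversion_threads(cpu_count: Optional[int] = None) -> int:
    """Use half of the available CPU cores, but never fewer than one."""
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, cores // 2)


def pdf_to_images(pdf_path: str, dpi: int = 200) -> list:
    """Render each page of a PDF to a PIL image (requires Poppler on PATH)."""
    # Imported lazily so conversion_threads() works even without pdf2image installed.
    from pdf2image import convert_from_path

    if not Path(pdf_path).is_file():
        raise FileNotFoundError(pdf_path)
    return convert_from_path(pdf_path, dpi=dpi, thread_count=conversion_threads())
```

Each returned image corresponds to one page, which is the unit the ColPali model embeds.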
Simply ask questions through the OpenWebUI chat interface:
"What does the financial report say about Q3 revenue?"
"Show me the architectural diagram from the technical documentation"
"Find information about the company's sustainability initiatives"
If you need to force re-initialization:
pipeline = Pipeline()
pipeline.reset_initialization()
results = pipeline.query(
question="Your question here",
top_k=10 # Number of results to return
)
Error: Poppler's 'pdftotext.exe' was not found in the PATH
Solution: Ensure Poppler is installed and added to your system PATH.
Error: CUDA out of memory
Solutions:
- Reduce batch size in processing
- Use CPU processing: set device_map="cpu"
- Process fewer documents at once
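One way to reduce the batch size is to feed page images to the model in small fixed-size chunks so that only one chunk is on the GPU at a time. The helper below is a generic sketch; the name `batched` and any specific batch size are illustrative and not taken from the pipeline.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

If a batch still runs out of memory, halving `batch_size` and retrying is usually simpler than switching to CPU.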
Error: Request error: Connection refused
Solution: Verify OpenWebUI is running and the BASE_URL is correct.
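A quick way to tell a wrong BASE_URL from a server that is simply down is to probe it before starting the pipeline. The sketch below uses only the standard library; the function name `openwebui_reachable` is hypothetical, and sending the key as a Bearer token is an assumption about how your instance authenticates. Any HTTP response, even an error status, counts as reachable, since it proves the host and port are correct.

```python
import urllib.error
import urllib.request
from typing import Optional


def openwebui_reachable(base_url: str, api_key: Optional[str] = None, timeout: float = 5.0) -> bool:
    """Return True if any HTTP response comes back from base_url."""
    request = urllib.request.Request(base_url)
    if api_key:
        request.add_header("Authorization", f"Bearer {api_key}")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered (for example 401 or 404), so host and port are right.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False
```

A `False` result points at networking (host, port, firewall) rather than at the API key.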
Error: Cannot load colpali models
Solutions:
- Check internet connection for model download
- Verify CUDA installation if using GPU
- Try CPU mode if GPU issues persist
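The GPU-or-CPU fallback can be sketched as below. It assumes the `ColQwen2` and `ColQwen2Processor` classes from colpali-engine and the `is_flash_attn_2_available` helper from transformers; `choose_runtime` and `load_model` are hypothetical names, and the pipeline's own loading code may be organized differently.

```python
from typing import Optional, Tuple


def choose_runtime(cuda_available: bool, flash_attn_available: bool) -> Tuple[str, Optional[str]]:
    """Pick a device_map and attention implementation; CPU never uses Flash Attention."""
    if not cuda_available:
        return "cpu", None
    return "cuda:0", ("flash_attention_2" if flash_attn_available else None)


def load_model(model_name: str = "vidore/colqwen2-v1.0"):
    """Load the model and processor, falling back to CPU when CUDA is unavailable."""
    # Heavy imports are deferred so choose_runtime() works without a GPU stack installed.
    import torch
    from colpali_engine.models import ColQwen2, ColQwen2Processor
    from transformers.utils.import_utils import is_flash_attn_2_available

    device_map, attn = choose_runtime(torch.cuda.is_available(), is_flash_attn_2_available())
    model = ColQwen2.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,  # matches the bfloat16 precision noted in this README
        device_map=device_map,
        attn_implementation=attn,
    ).eval()
    processor = ColQwen2Processor.from_pretrained(model_name)
    return model, processor
```

CPU mode is far slower, so it is mainly useful for confirming that a failure is GPU-related.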
Enable detailed logging by modifying the pipeline:
import logging
logging.basicConfig(level=logging.DEBUG)
The pipeline manages several states:
- __init__: Basic setup and dependency checks
- on_startup: Lightweight initialization, schedules background work
- Background Init: Heavy model loading and document processing
- Ready: Fully initialized and ready for queries
State is persisted in pipeline_state.json to avoid redundant initialization.
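A minimal sketch of this kind of persistence follows. Only the file name `pipeline_state.json` comes from this README; the keys `initialized` and `processed_files` and the helper names are assumptions for illustration, and the real state file may hold different fields.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict

STATE_FILE = Path("pipeline_state.json")


def load_state(path: Path = STATE_FILE) -> Dict[str, Any]:
    """Return the saved state, or a fresh 'not initialized' state if missing or corrupt."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {"initialized": False, "processed_files": []}


def save_state(state: Dict[str, Any], path: Path = STATE_FILE) -> None:
    """Write to a temp file and rename it, so a crash cannot leave a truncated state file."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(state, tmp, indent=2)
    os.replace(tmp_name, path)
```

On startup, a pipeline could call `load_state()` and skip the heavy background work when `initialized` is already true.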
- API Keys: Store API keys securely; prefer environment variables over hard-coded values
- File Access: Pipeline only accesses files through OpenWebUI API
- Network: Ensure secure connections to OpenWebUI instance
- Model Downloads: Models are downloaded from Hugging Face Hub
- Model Precision: Uses bfloat16 for memory efficiency
- Flash Attention: Automatic detection and usage when available
- Quantization: INT8 quantization reduces memory footprint
- Multi-threading: PDF conversion uses half of available CPU cores
- Batch Processing: Processes multiple images efficiently
- Caching: Reuses initialized models and database connections
- On-disk Payload: Large metadata stored on disk
- Vector Compression: Quantized vectors for reduced storage
- Incremental Updates: Only processes new or changed documents
- Multiple Instances: Can run multiple pipeline instances
- Load Balancing: Distribute queries across instances
- Shared Storage: Use external Qdrant instance for shared vector storage
- GPU Scaling: Supports multi-GPU setups
- Memory Scaling: Configurable batch sizes and caching
- Storage Scaling: Qdrant supports distributed deployments
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Commit changes: git commit -m 'Add amazing feature'
- Push to branch: git push origin feature/amazing-feature
- Open a Pull Request
git clone https://github.com/yourusername/colpali-rag-pipeline.git
cd colpali-rag-pipeline
pip install -r requirements.txt
This project is licensed under the MIT License - see the LICENSE file for details.
- ColPali Team: For the excellent vision-language models
- Qdrant: For the high-performance vector database
- OpenWebUI: For the intuitive chat interface
- Hugging Face: For model hosting and transformers library
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- ✨ Added background initialization for faster startup
- 🔧 Improved error handling and state management
- 📈 Performance optimizations for large document sets
- 🐛 Fixed OpenWebUI integration issues
- 🎯 Multi-vector support for better retrieval
- 🗜️ Vector quantization for memory efficiency
- 🔄 Automatic document synchronization
- 🖼️ PDF to image conversion pipeline
- 💾 Persistent state management
- 🚀 OpenWebUI integration
- 🎉 Initial release with basic RAG functionality
Made with ❤️ for the AI community