This repository contains the results of my bachelor thesis conducted for AbiGlobalHealth, focusing on extracting information from images sent by users of their telemedicine service.
IMPORTANT: All images used in this thesis are sourced from Google and public datasets to ensure GDPR compliance. No patient images are included.
The pipeline processes images and automatically routes them to specialized engines based on their type. It combines classification, captioning, segmentation, and OCR to handle a wide variety of image types, including clinical photos, medical documents, and screenshots.
A fine-tuned ResNet50 classifier categorizes images into seven classes: body-structure, face-features, medical documents, fluids, medicine packaging, paper documents, and screenshots. The dataset was labeled using a custom-built annotation platform.
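As a minimal sketch of this step, the following PyTorch snippet shows how a pretrained ResNet50 can be adapted to the seven classes; the freezing strategy, learning rate, and training schedule here are illustrative assumptions, not the exact settings used in the thesis:

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 7  # body-structure, face-features, medical documents, fluids,
                 # medicine packaging, paper documents, screenshots

# Load an ImageNet-pretrained ResNet50 and swap the final layer for a
# seven-way classification head.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Illustrative choice: freeze the backbone and train only the new head.
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```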
Validation curves for different classifiers investigated in this step are shown below:
Decision thresholds for the classifier were determined based on model confidence:
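A minimal sketch of confidence-based thresholding at inference time; the 0.8 cutoff and the manual-review fallback are illustrative placeholders rather than the tuned values from the thesis:

```python
import torch
import torch.nn.functional as F

CONFIDENCE_THRESHOLD = 0.8  # illustrative; the thesis tunes this empirically

@torch.no_grad()
def classify_with_threshold(model, image_tensor):
    """Return (class_index, confidence), or (None, confidence) when the top
    softmax probability is too low to route the image automatically."""
    logits = model(image_tensor.unsqueeze(0))          # add batch dimension
    confidence, label = F.softmax(logits, dim=1).max(dim=1)
    if confidence.item() < CONFIDENCE_THRESHOLD:
        return None, confidence.item()                 # defer, e.g. to review
    return label.item(), confidence.item()
```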
Results for the classification step:
Once classified, images are routed to specialized backends (a minimal dispatch sketch follows this list):

- Captioning models (e.g., ViT-GPT2, BLIP) generate free-text descriptions of visible signs in clinical photos.
- A secondary classifier identifies subtypes of medical documents (e.g., radiographs, MRIs).
- Text segmentation models (e.g., CRAFT, EAST) isolate regions of interest, which OCR engines (e.g., Tesseract, TrOCR) then transcribe.
- Direct OCR extracts text from printed or handwritten paper documents.
- OCR extracts on-screen text from digital screenshots.
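As a minimal sketch of how two of these backends could be wired together, the snippet below uses BLIP (via Hugging Face transformers) for captioning and Tesseract (via pytesseract) for OCR; the class-label strings in the dispatch table are illustrative, not the labels used in the thesis:

```python
from PIL import Image
import pytesseract
from transformers import BlipProcessor, BlipForConditionalGeneration

# Captioning backend: BLIP generates a free-text description of the photo.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def caption(image: Image.Image) -> str:
    inputs = processor(images=image.convert("RGB"), return_tensors="pt")
    out = blip.generate(**inputs, max_new_tokens=30)
    return processor.decode(out[0], skip_special_tokens=True)

# OCR backend: Tesseract extracts raw text from documents and screenshots.
def extract_text(image: Image.Image) -> str:
    return pytesseract.image_to_string(image)

# Dispatch table mapping predicted classes to backends (labels illustrative).
BACKENDS = {
    "body-structure": caption,
    "face-features": caption,
    "paper-documents": extract_text,
    "screenshots": extract_text,
}

def process(image: Image.Image, predicted_class: str) -> str:
    return BACKENDS[predicted_class](image)
```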
This pipeline combines classification, captioning, segmentation, and OCR to cover the image types a typical telemedicine platform receives: it describes clinical photos, digitizes medical documents, and extracts text from screenshots.
For more details, refer to the thesis PDF: *Information Extraction from Telemedicine Consultation Images*.