This repository presents an end-to-end pipeline for predicting 30-day hospital readmissions among diabetic patients using modern AutoML techniques. The study leverages AutoGluon, a state-of-the-art ensemble-based AutoML framework, and evaluates it against traditional machine learning, deep learning, and transformer-based tabular models.
This study validates the AutoGluon ensemble model for 30-day readmission prediction in diabetic patients, benchmarks it against a diverse set of models, and highlights risk factors to support early intervention.
Hospital readmissions within 30 days are a major quality metric and financial burden, particularly for diabetic patients. This project builds and evaluates predictive models to assess readmission risk using structured clinical data from electronic health records (EHRs). The core contributions of this work include:
- Development of an AutoML pipeline based on AutoGluon
- Comparison of ensemble methods with traditional ML, DL, and foundation models
- Exploration of preprocessing techniques, feature importance, and subgroup performance
The results consistently show that ensemble learning via AutoGluon outperforms other models, with LightGBM and CatBoost being strong individual contenders. Deep neural networks and transformer-based models (e.g., TabPFNMix) are competitive but underperform in this static tabular setting.
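To make the subgroup comparison mentioned above concrete, the following is a minimal sketch of computing AUROC separately per patient subgroup, using only the standard library. It is not the evaluation code from this repository: the function names `auroc` and `auroc_by_subgroup`, the subgroup labels, and the toy scores are all hypothetical, and the metric used in the paper may differ.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity; tied scores share an average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to cover every entry tied with pairs[i] on score.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        return None  # AUROC is undefined when only one class is present
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def auroc_by_subgroup(rows):
    """rows: iterable of (subgroup, label, score) with label in {0, 1}."""
    groups = defaultdict(lambda: ([], []))
    for subgroup, label, score in rows:
        groups[subgroup][0].append(label)
        groups[subgroup][1].append(score)
    return {g: auroc(labels, scores) for g, (labels, scores) in groups.items()}


# Toy data with hypothetical subgroup names (not results from the study).
rows = [
    ("under_60", 1, 0.9), ("under_60", 0, 0.2), ("under_60", 0, 0.4),
    ("60_plus", 1, 0.3), ("60_plus", 0, 0.6), ("60_plus", 1, 0.8), ("60_plus", 0, 0.1),
]
print(auroc_by_subgroup(rows))  # {'under_60': 1.0, '60_plus': 0.75}
```

Reporting the metric per subgroup rather than only overall makes it easier to spot groups for which a model ranks patients poorly even when its aggregate score looks strong.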
📄 For more details, see the full paper (paper/AutoGluon_Readmission_Predictions.pdf) and the presentation slides.
This project uses the publicly available dataset from the UCI Machine Learning Repository:
Diabetes 130-US hospitals for years 1999–2008
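In this dataset the outcome column `readmitted` takes the values `<30`, `>30`, and `NO`. A 30-day readmission task is commonly framed as binary by treating `<30` as the positive class. The sketch below illustrates that mapping with the standard library only; it assumes this convention rather than reproducing the logic in `src/Prep.py`, and the helper names `binarize_readmission` and `load_labels` are hypothetical.

```python
import csv
import io


def binarize_readmission(value):
    """Return 1 for a readmission within 30 days ('<30'), else 0 ('>30' or 'NO')."""
    return 1 if value.strip() == "<30" else 0


def load_labels(csv_file):
    """Read the 'readmitted' column from an open CSV file and binarize it."""
    reader = csv.DictReader(csv_file)
    return [binarize_readmission(row["readmitted"]) for row in reader]


# Tiny in-memory stand-in for the real CSV (made-up rows, two columns only).
sample = io.StringIO("encounter_id,readmitted\n1,NO\n2,<30\n3,>30\n")
labels = load_labels(sample)
print(labels)  # [0, 1, 0]
```

For the real data, pass an open handle to the downloaded CSV instead of the in-memory sample.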
.
├── src/ # Source code modules
│ ├── Config.py # Global configuration and constants
│ ├── Prep.py # Data cleaning, preprocessing, clustering
│ ├── Train_model.py # Training logic using AutoGluon
│ ├── Utils.py # Utility functions
│ └── Vis.py # Visualizations (e.g., SHAP, performance plots)
│
├── notebooks/ # Jupyter notebooks
│ └── train.ipynb # Training and evaluation
│
├── paper/ # Research paper and supplementary material
│ └── AutoGluon_Readmission_Predictions.pdf
│
├── ag.yaml # Conda environment file
└── README.md # This file
We recommend using a Conda environment for reproducibility:
conda env create -f ag.yaml
conda activate ag
You can run the full workflow interactively inside the Jupyter notebook:
- Launch the notebook:

      jupyter notebook notebooks/train.ipynb

- Run the full pipeline:

      # Inside train.ipynb
      from src.Train_model import TrainAutoGluon
      trainer = TrainAutoGluon(...)
      trainer.run_pipeline()

- Visualize results:

      from src.Vis import plot_feature_importance, shap_summary_plot, ...
This approach allows for step-by-step inspection, debugging, and comparison.
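For orientation, the sketch below shows what a bare AutoGluon training call looks like without the `TrainAutoGluon` wrapper. It is an illustration under stated assumptions, not the contents of `src/Train_model.py`: the function name `fit_and_rank`, the label column `readmitted_30d`, the preset, and the time limit are all hypothetical choices. `train_df` and `test_df` are expected to be pandas DataFrames that include the label column.

```python
def fit_and_rank(train_df, test_df, label="readmitted_30d", time_limit=600):
    """Fit AutoGluon on train_df and summarize the fitted models on test_df.

    The label name and time limit are placeholder assumptions, not the
    settings used in this repository.
    """
    # Imported lazily so this module can be loaded without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=label, eval_metric="roc_auc").fit(
        train_data=train_df,
        presets="medium_quality",  # assumed preset; the paper's setting may differ
        time_limit=time_limit,     # seconds
    )
    # Per-model scores on the held-out data, including the weighted ensemble.
    leaderboard = predictor.leaderboard(test_df)
    # Permutation-based feature importance on the held-out data.
    importance = predictor.feature_importance(test_df)
    return predictor, leaderboard, importance
```

The leaderboard is the natural place to compare the ensemble against individual models such as LightGBM and CatBoost, since AutoGluon reports each fitted model as its own row.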
If you use this work, please cite:
@article{yuan2025readmission,
title={Predicting 30-Day Readmissions in Diabetic Patients Using Ensemble Learning with AutoGluon},
author={Yuan, Baijiang},
year={2025},
note={University of Toronto, Institute of Medical Science and University Health Network}
}