This repository contains the code, methodology, and scripts required for hydrological analysis of streamflow using machine learning techniques. The approach combines reanalysis climate data (ERA5/ERA5-Land) and streamflow data from the CEMS GloFAS v4.0 global forecasting system. The main goal is to reconstruct and model discharge time series in poorly gauged regions, enabling robust water resource assessment.
This repository is organized into logical directories to support the full workflow of streamflow modeling using reanalysis climate and hydrological data.
⚠️ Important: The data/ directory is not included in this repository due to storage limitations.
To run the complete modeling pipeline, you must manually download the required datasets from the official sources listed below and organize them as follows:
- data/ - Raw and preprocessed datasets
  - climate/ - ERA5-Land reanalysis variables: precipitation (tp), temperature (t2m), solar radiation (ssrd), wind speed (sfcWind), etc.
    - 📥 Source: Copernicus Climate Data Store (CDS)
  - terrain/ - Flow direction and flow accumulation rasters for watershed delineation
    - 📥 Source: HydroSHEDS
  - discharge/ - GloFAS v4.0 streamflow time series and shapefiles for basin or point-based extraction
    - 📥 Source: Copernicus EWDS Portal
🗂️ Ensure the directory structure matches this format so that scripts and notebooks can locate the data correctly.
- notebooks/ - Interactive notebooks for exploratory data analysis and model development
  - Extract_Basins.ipynb - Watershed delineation using flow direction and accumulation rasters
  - Create_Regression_Models.ipynb - Training and evaluation of discharge prediction models based on climate variables
- src/ - Modular Python functions
  - SWIM.py - Core logic for climate-discharge modeling, including preprocessing, model training, and prediction
- environment.yml - Conda environment definition for dependency management
- README.md - Project overview and usage instructions (this file)
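Because the data/ directory has to be assembled by hand, a quick check before running the notebooks can save a failed run. The snippet below is a minimal sketch that only looks for the three sub-directories named above; the helper name `missing_data_dirs` is hypothetical and is not part of SWIM.py.

```python
from pathlib import Path

# Sub-directories named in the repository layout (climate, terrain, discharge).
EXPECTED_SUBDIRS = ("climate", "terrain", "discharge")


def missing_data_dirs(root="data"):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs("data")
    if missing:
        print("Missing data sub-directories:", ", ".join(missing))
    else:
        print("data/ layout looks complete.")
```

Run it from the repository root; it checks directory names only, not the files inside them.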
ERA5-Land (Copernicus Climate Data Store):
- Variables used: precipitation, air temperature, potential evaporation, etc.
- Resolution: 0.1° (~9 km), daily
- Access: via CDS API
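As an illustration of CDS API access, the sketch below builds a request for the ERA5-Land monthly averaged product cited in the references and passes it to `cdsapi.Client().retrieve(...)`. Several things here are assumptions rather than the repository's own configuration: the request keys follow the CDS download form as commonly documented and should be checked against the form for the dataset, the variable names and bounding box are placeholders, and the target path is hypothetical. The README does not state whether the pipeline requests this monthly product or a finer-resolution one.

```python
def build_era5_land_request(years, variables, area=None):
    """Build a request dict for ERA5-Land monthly means.

    The keys mirror the CDS download form; treat them as assumptions and
    compare them with the form shown for the dataset before use.
    """
    request = {
        "product_type": ["monthly_averaged_reanalysis"],
        "variable": list(variables),
        "year": [str(y) for y in years],
        "month": [f"{m:02d}" for m in range(1, 13)],
        "time": ["00:00"],
        "data_format": "netcdf",
        "download_format": "unarchived",
    }
    if area is not None:
        # Bounding box as [north, west, south, east] in degrees.
        request["area"] = list(area)
    return request


def download_era5_land(request, target):
    """Submit the request; needs the cdsapi package and configured CDS credentials."""
    import cdsapi  # imported lazily so the request can be built without it

    client = cdsapi.Client()
    client.retrieve("reanalysis-era5-land-monthly-means", request, target)


request = build_era5_land_request(
    years=range(1990, 1993),
    variables=["total_precipitation", "2m_temperature"],
    area=[44.0, -5.0, 42.0, -3.0],  # hypothetical basin extent
)
# Uncomment to download (hypothetical target path):
# download_era5_land(request, "data/climate/era5_land_monthly.nc")
```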
GloFAS v4.0 (Copernicus EWDS):
- Variable: daily simulated streamflow (LISFLOOD model)
- Period: 1979 to present
- Access: via EWDS Portal
The modeling workflow follows a structured pipeline to estimate river discharge using machine learning techniques driven by global climate reanalysis data:
1. Climate and Discharge Data Preprocessing
- Monthly aggregation of ERA5-Land variables (e.g., precipitation, temperature, radiation, wind)
- Extraction and temporal alignment of streamflow series from GloFAS v4.0
- Temporal normalization: shifting monthly dates to the first day and removing anomalies
- Spatial operations: clipping to basin geometry or selecting nearest pixel to pour point
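The preprocessing operations listed above can be sketched with pandas and NumPy on synthetic data. This is not the code in SWIM.py: the column names, the choice of sums for precipitation and means for temperature, and the rule used to "remove anomalies" (dropping missing or negative discharge) are all assumptions made for illustration. `nearest_pixel` is a hypothetical helper for a regular latitude/longitude grid.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic daily climate series standing in for an ERA5-Land extraction.
days = pd.date_range("2000-01-01", "2000-12-31", freq="D")
climate_daily = pd.DataFrame(
    {
        "tp": rng.gamma(2.0, 1.5, len(days)),      # precipitation
        "t2m": rng.normal(285.0, 5.0, len(days)),  # air temperature
    },
    index=days,
)

# Monthly aggregation: totals for precipitation, means for temperature.
# "MS" labels each month by its first day.
climate_monthly = climate_daily.resample("MS").agg({"tp": "sum", "t2m": "mean"})

# Synthetic monthly discharge stamped at month end, standing in for GloFAS.
month_ends = pd.date_range("2000-01-01", periods=12, freq="MS") + pd.offsets.MonthEnd(0)
discharge = pd.Series(rng.uniform(5.0, 50.0, 12), index=month_ends, name="discharge")
discharge.iloc[3] = np.nan  # one gap to demonstrate cleaning

# Temporal normalization: shift every timestamp to the first day of its month.
discharge.index = discharge.index.to_period("M").to_timestamp()

# Assumed anomaly rule: drop missing or negative discharge values.
discharge = discharge[discharge.notna() & (discharge >= 0)]

# Temporal alignment: keep only months present in both series.
aligned = climate_monthly.join(discharge, how="inner")


def nearest_pixel(lats, lons, lat0, lon0):
    """Indices of the cell on a regular grid closest to the point (lat0, lon0)."""
    i = int(np.abs(np.asarray(lats) - lat0).argmin())
    j = int(np.abs(np.asarray(lons) - lon0).argmin())
    return i, j
```

After this step `aligned` holds one row per month with the climate predictors and the matching discharge value, which is the shape a regression model expects.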
2. Model Training and Evaluation
- Construction of feature matrices from gridded climate data
- Optional dimensionality reduction using PCA (configurable variance threshold)
- Training regression models: e.g., Support Vector Regression (SVR), Random Forest, XGBoost
- Evaluation using hydrological metrics:
  - NSE (Nash–Sutcliffe Efficiency)
  - R² (Coefficient of Determination)
  - PBIAS (Percent Bias)
  - RMSE (Root Mean Square Error)
- Automated selection and saving of the best-performing model
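For reference, the four evaluation metrics named above can be written in a few lines of NumPy. This is a sketch under two stated assumptions, since the README does not give the formulas: R² is taken as the squared Pearson correlation (a common choice in hydrology, which keeps it distinct from NSE), and PBIAS uses the sign convention 100 · Σ(obs − sim) / Σ(obs), so positive values indicate underestimation. The implementation in SWIM.py may differ.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is perfect, 0 matches the observed mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def r2(obs, sim):
    """Coefficient of determination as the squared Pearson correlation (assumed)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.corrcoef(obs, sim)[0, 1] ** 2)


def pbias(obs, sim):
    """Percent bias; positive means the simulation underestimates (assumed sign)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(100.0 * np.sum(obs - sim) / np.sum(obs))


def rmse(obs, sim):
    """Root mean square error, in the units of the discharge series."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))


observed = [1.0, 2.0, 3.0, 4.0, 5.0]
simulated = [1.1, 1.9, 3.2, 3.8, 5.1]
scores = {
    "NSE": nse(observed, simulated),
    "R2": r2(observed, simulated),
    "PBIAS": pbias(observed, simulated),
    "RMSE": rmse(observed, simulated),
}
```

A table of such scores per candidate model is one simple basis for choosing which model to keep, for example the one with the highest NSE.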
3. Discharge Prediction from New Climate Data
- Application of trained models on new or future climate datasets
- Output: simulated monthly streamflow series with performance visualization
- Comparison with observed series using scatter plots, time series, and cumulative flow curves
- Model and results exported to disk for reproducibility
Each of these steps is encapsulated in modular Python functions within the SWIM.py module, ensuring flexibility and reusability.
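To make the training and prediction steps concrete, the sketch below chains standardization, PCA with a variance threshold, and Support Vector Regression using scikit-learn, then saves and reloads the fitted model with joblib before predicting on held-out months. It runs on synthetic data and does not reproduce the functions in SWIM.py, whose names and signatures are not shown here; the 95% variance threshold, the SVR settings, and the file name are illustrative assumptions.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(42)

# Synthetic feature matrix: 120 months x 50 grid-cell predictors,
# generated from 3 latent signals plus a little noise.
latent = rng.normal(size=(120, 3))
X = latent @ rng.normal(size=(3, 50)) + 0.05 * rng.normal(size=(120, 50))
y = 20.0 + 5.0 * latent[:, 0] - 3.0 * latent[:, 1] + rng.normal(scale=0.5, size=120)

# Chronological split: first 96 months for training, last 24 for testing.
X_train, X_test = X[:96], X[96:]
y_train, y_test = y[:96], y[96:]

model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),  # keep enough components to explain 95% of the variance
    SVR(kernel="rbf", C=10.0),
)
model.fit(X_train, y_train)

# Save the fitted pipeline, reload it, and predict on the held-out months.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "best_model.joblib"  # hypothetical file name
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)
    y_pred = restored.predict(X_test)

n_kept = model.named_steps["pca"].n_components_
```

Keeping the scaler and PCA inside the pipeline means the saved file carries every transformation needed at prediction time, so new climate data only has to be arranged into the same feature columns.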
- Joint Research Center, Copernicus Emergency Management Service (2019): River discharge and related historical data from the Global Flood Awareness System. Early Warning Data Store (EWDS). DOI: 10.24381/cds.a4fdd6b9 (Accessed on 01-JUN-2025)
- Muñoz Sabater, J. (2019): ERA5-Land monthly averaged data from 1950 to present. Copernicus Climate Change Service (C3S) Climate Data Store (CDS). DOI: 10.24381/cds.68d2bb30 (Accessed on 01-JUN-2025)
- Navas, S., & del Jesus, M. (2025). SIMPCCe: una herramienta para el análisis de aportaciones a embalses ante escenarios de cambio climático. Ingeniería Del Agua, 29(2), 132–148. https://doi.org/10.4995/ia.2025.23217
To reproduce the environment and run the code, follow the steps below.
First, clone this repository to your local machine:
git clone https://github.com/<your-org-or-username>/SWIM_WaterBalanceModule.git
cd SWIM_WaterBalanceModule
Then create and activate the Conda environment:
conda env create -f environment.yml
conda activate SWIM_WaterBalance
Finally, install pysheds from the fork below:
pip install git+https://github.com/navass11/pysheds
This project is developed by the Hydroclimatology Group at IHCantabria.
This project is licensed under the GNU General Public License v3.0.
You are free to use, modify, and distribute this software under the terms of the GPL license.
See the LICENSE file for full legal terms.
🔗 More about the GPL-3.0 License: https://www.gnu.org/licenses/gpl-3.0.html