This repository contains the code and resources accompanying the research paper:
Learning to solve the Skill Vehicle Routing Problem with Deep Reinforcement Learning
Nayeli Gast Zepeda, André Hottung, Kevin Tierney
The 19th Learning and Intelligent Optimization Conference (LION 2025)
OpenReview: https://openreview.net/forum?id=Xf7fGzezHB
If you use this repository in your research, including the feasible instance generator described in Section 4.2, please cite our paper:
@inproceedings{
zepeda2025learning,
title={Learning to solve the Skill Vehicle Routing Problem with Deep Reinforcement Learning},
author={Nayeli Gast Zepeda and Andr{\'e} Hottung and Kevin Tierney},
booktitle={THE 19TH LEARNING AND INTELLIGENT OPTIMIZATION CONFERENCE},
year={2025},
url={https://openreview.net/forum?id=Xf7fGzezHB}
}

The model checkpoints for which experimental results are reported in the paper are available in a separate repository: lion2025-drl-skillvrp-checkpoints.
To install the package, create a virtual environment and install the dependencies.
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

This will install the project in editable mode using the dependencies specified in pyproject.toml.
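For orientation, a pyproject.toml driving such an editable install might look roughly like the sketch below. The package name, version, and the exact dependency list are illustrative assumptions, not the repository's actual contents; only Hydra, wandb, PyVRP, and the Gurobi baseline extra are mentioned elsewhere in this README.

```toml
# Hypothetical sketch -- the real pyproject.toml defines its own
# name, version, dependencies, and optional extras.
[project]
name = "skill-vrp"            # assumed package name
version = "0.1.0"
dependencies = [
    "torch",                  # deep RL training (assumed)
    "hydra-core",             # configuration management
    "wandb",                  # default experiment logger
    "pyvrp",                  # feasibility checks / baseline solver
]

[project.optional-dependencies]
baselines = ["gurobipy"]      # enables `pip install -e .[baselines]`
```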
To generate instances, you must specify the parameters for instance generation using the --param_file argument, pointing to a YAML file in the params/ directory. For example:
python generate_data.py --param_file=params/n50.yaml

By default, this will generate feasible instances. You can specify the generator type using the --generator argument, e.g., --generator=feasible or --generator=random.
The params/ folder contains the YAML files specifying the parameters used for instance generation. Refer to it for the available parameter files, and adjust them or create new ones as needed for your experiments.
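As an illustration, a parameter file such as those in params/ might look like the following. The key names below are assumptions for illustration only; num_loc and save_path are the attributes referenced elsewhere in this README, and you should check the actual files in params/ for the real schema.

```yaml
# Hypothetical sketch of a params/*.yaml file -- key names other than
# num_loc and save_path are illustrative assumptions.
num_loc: 50                      # number of customer locations
num_instances: 10000             # how many instances to generate
seed: 1234
save_path: data/n_50/train.npz   # where the final instance file is written
```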
Note: params/n50_infeasible.yaml is identical to params/n50.yaml except for the save_path attribute of the final instance file. The instances designed to be barely feasible were generated by running skill_vrp/generator_infeasible.py directly.
Note: This process takes time: after initial generation and attribute shuffling, instances are iteratively evaluated with PyVRP to determine their feasibility.
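The shuffle-and-check idea can be sketched in plain Python: repeatedly permute an attribute assignment and test feasibility, keeping only candidates that pass. Here check_feasible is a stand-in for the repository's actual PyVRP-based evaluation, and the skill list is a toy example.

```python
import random

def make_feasible_instance(skills, check_feasible, rng, max_tries=100):
    """Shuffle the skill assignment until `check_feasible` accepts it.

    `check_feasible` stands in for the PyVRP-based evaluation used in
    the repository; it takes a candidate assignment and returns a bool.
    Returns the accepted assignment, or None if no try succeeded.
    """
    candidate = list(skills)
    for _ in range(max_tries):
        if check_feasible(candidate):
            return candidate
        rng.shuffle(candidate)  # re-shuffle attributes and try again
    return None

# Toy stand-in predicate: "feasible" iff the first customer needs skill 1.
rng = random.Random(0)
result = make_feasible_instance([2, 2, 1, 3], lambda s: s[0] == 1, rng)
print(result)
```

The real pipeline replaces the predicate with a full PyVRP solve per candidate, which is why generation is slow.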
To train a model with the default configuration, run:
python train.py

You can specify additional Hydra configuration options via the command line. For example, to use the polynet experiment configuration in hydra_configs/experiment/polynet.yaml:
python train.py experiment=polynet

Refer to the hydra_configs/ directory for available configuration options.
Note: By default the wandb logger is used, i.e., you need a wandb account and must log in via wandb login to use it. If you do not wish to use wandb, you can pass logger=none, which overrides the callback with a minimal configuration and works without wandb:
python train.py logger=none

Alternatively, you can define additional loggers by adjusting the Hydra configuration accordingly.
To reproduce the results from the paper, use the following command-line options (replace variables as needed):
python train.py \
experiment=$experiment \ # am, mvmoepomo, polynet, pomo, symnco
seed=33609 \
logger.wandb.project=$proj_name \ # individual naming of projects on wandb
model.dataloader_num_workers=2 \
trainer.max_epochs=$max_epochs \ # 100, 500
env.generator_params.num_loc=$num_loc \ # 20, 50
env.lambda_strategy=$lambda_strategy \ # 1, 2, 3
env.alpha=$alpha \ # 0.1, 0.2, 0.5
env.penalty_type=$penalty_type \ # 'num_routes', 'full_routes'
env.penalty=$penalty \ # 10, 100, 1000
env.train_file=$train_file \ # (see below)
experiment.naming_suffix=$suffix # for individual experiment naming

Refer to the paper and configuration files for the specific values used in each experiment. The training files are located in the data/ folder: n_20/train.npz, n_20/train_random.npz, n_50/train.npz, n_50/train_random.npz, and n50_infeas/train_infeas.npz.
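The value lists in the command comments form a grid; a small stdlib helper can enumerate the corresponding train.py invocations. This is a convenience sketch, not a script from the repository, and it covers only the grid-valued options (the seed is the one given above; project name, epochs, train file, and suffix would still need to be filled in per run).

```python
from itertools import product

# Value lists taken from the comments in the command above.
grid = {
    "experiment": ["am", "mvmoepomo", "polynet", "pomo", "symnco"],
    "env.lambda_strategy": [1, 2, 3],
    "env.alpha": [0.1, 0.2, 0.5],
    "env.penalty_type": ["num_routes", "full_routes"],
    "env.penalty": [10, 100, 1000],
}

def commands(grid):
    """Yield one `python train.py ...` command per grid combination."""
    keys = list(grid)
    for values in product(*grid.values()):
        overrides = " ".join(f"{k}={v}" for k, v in zip(keys, values))
        yield f"python train.py seed=33609 {overrides}"

cmds = list(commands(grid))
print(len(cmds))   # 5 * 3 * 3 * 2 * 3 combinations
print(cmds[0])
```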
To evaluate trained models, use the test.py script. By default, it expects checkpoints to be located in the lion2025-drl-skillvrp-checkpoints/ directory, which you can download from the lion2025-drl-skillvrp-checkpoints repository. Place this folder in your project root.
Model metadata required for evaluation is read from evaluation/runs_metadata.json. To evaluate checkpoints stored elsewhere, pass the corresponding path argument to test.py and make sure the metadata file is updated accordingly (e.g., by downloading the run data per project from wandb), saved in the evaluation/ folder, and keyed by the checkpoint names.
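As an illustration of how a checkpoint-keyed metadata file can be consumed, the sketch below parses a small JSON document and looks up one entry. The schema shown is an assumption; the actual evaluation/runs_metadata.json may use different keys and a different layout.

```python
import json

# Hypothetical schema -- the real runs_metadata.json may differ.
example = json.loads("""
{
  "polynet_n50_seed33609": {
    "experiment": "polynet",
    "num_loc": 50,
    "wandb_project": "skillvrp-n50"
  }
}
""")

def lookup(metadata, checkpoint_name):
    """Return the metadata entry for a checkpoint, or None if missing."""
    return metadata.get(checkpoint_name)

meta = lookup(example, "polynet_n50_seed33609")
print(meta["experiment"])
```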
Example usage:
python test.py

or, stating the arguments explicitly:
python test.py --checkpoint_dir=lion2025-drl-skillvrp-checkpoints --metadata_file=runs_metadata.json

Adjust the pipeline as needed if your checkpoints or metadata are not in the default locations. Evaluation results are written to results_best.txt and results_last.txt in the evaluation/ folder.
To run the baseline, additional packages are needed, which can be installed via:
pip install -e .[baselines]

Note: For the Gurobi baseline to work, you must have Gurobi installed and properly licensed on your system. Installation instructions and troubleshooting can be found in the Gurobi documentation. If you encounter issues, ensure that the Gurobi Python package is installed and that your license is correctly configured.
To run one of the baselines (Gurobi or PyVRP), use the run_baseline.py script. The parameters --baseline and --filename are mandatory: --baseline selects the baseline to run (gurobi or pyvrp) and --filename specifies the location of the test file to evaluate.
For example:
python run_baseline.py --baseline=pyvrp --filename=data/n_20/test.npz

Refer to the script itself for further parameters and usage details.
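The two mandatory arguments can be illustrated with a small argparse sketch. This mirrors only the interface described above; the actual run_baseline.py may define further options, and the parser here is an illustrative stand-in, not the script's real code.

```python
import argparse

def build_parser():
    """Parser sketch mirroring run_baseline.py's two mandatory arguments."""
    parser = argparse.ArgumentParser(description="Run a SkillVRP baseline.")
    parser.add_argument("--baseline", required=True,
                        choices=["gurobi", "pyvrp"],
                        help="which baseline solver to run")
    parser.add_argument("--filename", required=True,
                        help="test file to evaluate, e.g. data/n_20/test.npz")
    return parser

args = build_parser().parse_args(
    ["--baseline=pyvrp", "--filename=data/n_20/test.npz"]
)
print(args.baseline, args.filename)
```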