This is the official implementation of our paper AP2O: Correcting LLM-Generated Code Errors Type by Type Like Humans via Adaptive Progressive Preference Optimization.
Adaptive Progressive Preference Optimization
Dependencies:
- deepspeed 0.17.2
- python 3.11.11
- torch 2.7.0
- trl 0.14.0
- transformers 4.51.3
- vllm 0.9.2
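Assuming a CUDA-capable machine with Python 3.11 and pip (conda or uv work just as well), the pinned versions can be installed roughly as follows; adjust the torch and vllm builds to match your CUDA driver:

```bash
# Minimal install sketch, assuming pip; pick torch/vllm wheels that match your CUDA setup.
pip install deepspeed==0.17.2 torch==2.7.0 trl==0.14.0 transformers==4.51.3 vllm==0.9.2
```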
To launch the preference data self-generation and preference optimization pipeline, run:

```bash
sh pipe-qwen2.5-coder.sh
```
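Preference data self-generation and preference optimization can both be long-running, so it may help to launch the script in the background and capture its output to a log; the log file name below is just a placeholder:

```bash
# Optional: run the pipeline in the background and keep a log of the run.
nohup sh pipe-qwen2.5-coder.sh > ap2o_run.log 2>&1 &
tail -f ap2o_run.log   # follow progress
```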