Add GRPO/Online DPO support for quantized models when using vLLM as the inference backend #3133
maoulee:main was force-pushed and no longer has any new commits.
Pushing new commits will allow the pull request to be re-opened.