[algo] Add SVD-LoRA GRPO #3637
Code Review
This pull request introduces an experimental implementation of SVD-LoRA-GRPO. The core logic is in the new verl/utils/experimental/svd_lora.py file, with integrations in FSDP utilities and worker configurations. My review has identified a few critical issues. There's a TypeError in SVDLinear.create_from_weight that will cause a runtime crash. More importantly, the implementation in apply_svd_lora seems to misinterpret the paper by performing SVD on LoRA's A and B matrices separately, instead of on their product W=BA. I've also pointed out a high-severity issue in fsdp_utils.py where reconstructed weights are not moved to the CPU, and a high-severity maintainability issue regarding the use of a deprecated PyTorch function. Please address these points to ensure the correctness and robustness of this new feature.
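To illustrate the point about the product W=BA, here is a minimal NumPy sketch of taking the SVD of the LoRA product rather than of A and B separately. It is not the code from svd_lora.py: the function name `svd_of_lora_product`, the matrix shapes, and the use of NumPy instead of PyTorch are all assumptions made for illustration.

```python
import numpy as np


def svd_of_lora_product(A, B):
    """Hypothetical helper: SVD of the LoRA update W = B @ A, truncated to the LoRA rank.

    A has shape (r, in_features) and B has shape (out_features, r),
    so W = B @ A has rank at most r.
    """
    r = A.shape[0]
    W = B @ A  # decompose the product, not A and B individually
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :r], S[:r], Vt[:r, :]


# Illustrative shapes only; not taken from the PR.
rng = np.random.default_rng(0)
out_features, in_features, r = 16, 12, 4
A = rng.standard_normal((r, in_features))
B = rng.standard_normal((out_features, r))

U, S, Vt = svd_of_lora_product(A, B)

# Because W has rank <= r, the rank-r truncation reconstructs it.
reconstructed = (U * S) @ Vt

# Singular values of A alone are a different set of numbers from those of B @ A.
S_A = np.linalg.svd(A, compute_uv=False)
```

The singular values of A (or of B) on their own generally differ from the singular values of the product, which is why decomposing the two factors separately is not equivalent to decomposing W=BA.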
What does this PR do?
This PR implements the experimental SVD-LoRA-GRPO method, derived from the paper ESSA: Evolutionary Strategies for Scalable Alignment.
Checklist Before Starting
[{modules}] {type}: {description} (This will be checked by the CI)
{modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
Multiple modules are separated with , like [megatron, fsdp, doc]
{type} is in feat, fix, refactor, chore, test
For a breaking change, add [BREAKING] to the beginning of the title, e.g. [BREAKING][fsdp, megatron] feat: dynamic batching
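The title convention above can be checked mechanically. The following is a rough sketch of such a check using only the Python standard library; it is an assumption about how one might validate a title, not the project's actual CI script, and the names `TITLE_RE` and `is_valid_title` are hypothetical.

```python
import re

# Module and type names copied from the checklist above.
MODULES = {
    "fsdp", "megatron", "sglang", "vllm", "rollout", "trainer", "ci",
    "training_utils", "recipe", "hardware", "deployment", "ray", "worker",
    "single_controller", "misc", "perf", "model", "algo", "env", "tool",
    "ckpt", "doc", "data",
}
TYPES = {"feat", "fix", "refactor", "chore", "test"}

# Optional [BREAKING] prefix, then [modules] type: description
TITLE_RE = re.compile(
    r"^(\[BREAKING\])?\[(?P<modules>[^\]]+)\]\s+(?P<type>\w+):\s+(?P<description>.+)$"
)


def is_valid_title(title):
    """Return True if the title follows [{modules}] {type}: {description}."""
    match = TITLE_RE.match(title)
    if match is None:
        return False
    modules = [name.strip() for name in match.group("modules").split(",")]
    return all(name in MODULES for name in modules) and match.group("type") in TYPES


ok_breaking = is_valid_title("[BREAKING][fsdp, megatron] feat: dynamic batching")
ok_multi = is_valid_title("[megatron, fsdp, doc] fix: update sharding notes")
missing_type = is_valid_title("[algo] Add SVD-LoRA GRPO")
```

Under this sketch, a title with no `{type}:` segment, such as `[algo] Add SVD-LoRA GRPO`, is rejected, while the `[BREAKING][fsdp, megatron] feat: dynamic batching` example is accepted.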
Test
API and Usage Example
# Add code snippet or script demonstrating how to use this
Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)