[trainer] feat: Upstream Dynamic Sampling #2988
Open
Hecate0821 wants to merge 78 commits into volcengine:main from PrinsYin:ds_nokl
Changes from 69 commits
Commits
cf00afa Added dynamic filter (PrinsYin)
d800627 Addedrunner (PrinsYin)
9bb815f dynamic filter (PrinsYin)
f5ffc02 dynamic filter no kl (PrinsYin)
20a822c extra info logic update (PrinsYin)
78b44f7 logging and run script (PrinsYin)
ecf1c21 added metric train reward (PrinsYin)
72eb9a9 1 (PrinsYin)
82fb1ae 1 (PrinsYin)
1ce98dd 1 (PrinsYin)
df7f92d Merge mainline (Hecate0821)
c33d341 Add script for Qwen3-4b dapo (Hecate0821)
e57fb25 Clean Up (Hecate0821)
9644809 Fix Pre-commit (Hecate0821)
8a4ba8f Fix: Comments (Hecate0821)
a2bfbbf Fix:naming (Hecate0821)
e539fde merge main (Hecate0821)
2ac9fcb Merge branch 'main' into ds_nokl (zhaochenyang20)
8442353 Add filter for all negative and positive (PrinsYin)
a527b77 Fix pre-commit (Hecate0821)
d26af35 Fix script (Hecate0821)
d4eefac Fix script (Hecate0821)
55d87ad Add config in megatron yaml (Hecate0821)
4aa7087 Fix config (Hecate0821)
ddfdd8f Fix pre-commit (PrinsYin)
3f478c9 Fix CI (PrinsYin)
235c12d 1 (PrinsYin)
09d60cb 1 (PrinsYin)
4f46906 1 (PrinsYin)
7df117b refactor (PrinsYin)
a8692b2 refactor (PrinsYin)
1ae4dbe refactor (PrinsYin)
cd14168 refactor (PrinsYin)
19c4fd3 Fix script (Hecate0821)
6c96fe6 script (PrinsYin)
7ff23e9 script (PrinsYin)
a9ae9e6 Make filter a class (Hecate0821)
703f783 Fix config (Hecate0821)
d4aeb94 Merge branch 'main' into ds_nokl (zhaochenyang20)
09a58a5 moved filter (PrinsYin)
d45d5b6 Fix traienr (Hecate0821)
fc935ff fixed improt (PrinsYin)
1adbf67 Fix add back init (Hecate0821)
c582114 Fix naming (Hecate0821)
ef493a5 Fix precommit (Hecate0821)
5977e2a Update verl/trainer/ppo/ray_trainer.py (Hecate0821)
f65470b Update verl/trainer/ppo/ray_trainer.py (Hecate0821)
0ffa3bc Fix: Centrelize settings (Hecate0821)
1e9f49f Merge branch 'ds_nokl' of https://github.com/PrinsYin/verl into ds_nokl (Hecate0821)
23bdc68 Fix: concentrate the reward logic (Hecate0821)
64ae27f Fix: make _extract_reward_extra_infos utility function (Hecate0821)
8a72b98 Fix: extract utility function (Hecate0821)
f1e1de2 Fix: Clean (Hecate0821)
9af93ef Fix: Clean (Hecate0821)
40cd018 Fix: Add credit (Hecate0821)
fbb5020 Fix: Clean (Hecate0821)
d371c97 Fix: Pre-commit (Hecate0821)
ed1520d Fix: response mask (Hecate0821)
cdc4eb0 Fix: Licence (Hecate0821)
aca70ba 1 (PrinsYin)
42ac13a 1 (PrinsYin)
7409e30 fix (PrinsYin)
36fc97b Update verl/trainer/ppo/ray_trainer.py (PrinsYin)
da4ab8b resolve comment (PrinsYin)
cc34f16 fix extra info (PrinsYin)
08f4c04 fix extra info (PrinsYin)
3ab4752 Fix pre-commit (Hecate0821)
31887c4 Fix pre-commit (Hecate0821)
2115f70 Merge branch 'main' into ds_nokl (zhaochenyang20)
8c74805 Fix naming (Hecate0821)
cd5093b Fix comments (Hecate0821)
c5778c1 Fix repeat traj bug and zip issue (Hecate0821)
809d37a Fix redundant if (Hecate0821)
1f928e4 1 (PrinsYin)
e603155 1 (PrinsYin)
c1ca9c1 1 (PrinsYin)
8346efb Merge branch 'volcengine:main' into ds_nokl (zhaochenyang20)
9617d09 Merge branch 'main' into ds_nokl (zhaochenyang20)
@@ -0,0 +1,26 @@
# Format checks enforced on CI:
# 1. Comments must appear above each field.
# 2. There must be a blank line between each field.
# 3. Inline comments (after a field on the same line) are not allowed.
# 4. Indentation level is respected for nested fields.

# Dynamic filter for DAPO: filters out homogeneous groups, keeps diverse responses

# Required when using verl.utils.omega_conf_to_dataclass to instantiate dataclass configs
_target_: verl.trainer.config.FilterGroupsConfig

# Whether to enable dynamic filter
enable: False

# Metric to use for dynamic filter: currently only "seq_reward" is supported
metric: seq_reward

# Maximum number of backfill attempts when collecting diverse responses
# If set to 0 or negative, allows unlimited backfill attempts (use with caution)
max_num_gen_batches: 10

# Default filter function for mixed reward filtering
filter_function: verl.utils.filtering.dynamic_filtering.keep_mixed_reward

# Additional arguments for the filter function
filter_kwargs: {}
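To make the intent of these settings concrete, here is a minimal sketch of group-level filtering with bounded backfill. It assumes each response carries the id of its prompt group and a scalar sequence reward. The names `keep_mixed_groups`, `collect_with_backfill`, and `generate_batch` are hypothetical and are not the PR's `DynamicFilter` or `keep_mixed_reward` API. Returning whatever was collected once the attempt limit is reached is also an assumption, not confirmed behavior.

```python
from collections import defaultdict
from typing import Callable


def keep_mixed_groups(uids: list[str], rewards: list[float]) -> list[int]:
    """Return indices of samples whose prompt group has non-identical rewards."""
    groups: dict[str, list[int]] = defaultdict(list)
    for idx, uid in enumerate(uids):
        groups[uid].append(idx)

    kept: list[int] = []
    for indices in groups.values():
        # A group with a single distinct reward (all correct or all wrong)
        # is homogeneous and is dropped.
        if len({rewards[i] for i in indices}) > 1:
            kept.extend(indices)
    return sorted(kept)


def collect_with_backfill(
    generate_batch: Callable[[], tuple[list[str], list[float]]],
    target_groups: int,
    max_num_gen_batches: int = 10,
) -> tuple[list[str], list[float], int]:
    """Generate batches until enough mixed-reward groups are collected.

    A limit of 0 or less means unlimited attempts, mirroring the config comment.
    Assumes group ids are unique across generated batches.
    """
    kept_uids: list[str] = []
    kept_rewards: list[float] = []
    num_batches = 0
    while True:
        num_batches += 1
        uids, rewards = generate_batch()
        for i in keep_mixed_groups(uids, rewards):
            kept_uids.append(uids[i])
            kept_rewards.append(rewards[i])
        if len(set(kept_uids)) >= target_groups:
            break
        if max_num_gen_batches > 0 and num_batches >= max_num_gen_batches:
            # Assumption: stop and return the partial result at the limit.
            break
    return kept_uids, kept_rewards, num_batches
```

In this sketch a caller would compare the number of distinct ids returned against `target_groups` to detect that the limit was hit before enough diverse groups were found.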
@@ -16,6 +16,8 @@ defaults:
- critic@critic: megatron_critic
# Reward model config.
- reward_model@reward_model: megatron_reward_model
# Algorithm filter groups config.
- algorithm/[email protected]_groups
- _self_

actor_rollout_ref:
@@ -27,6 +27,9 @@ defaults:
# Reward model config.
- reward_model@reward_model: dp_reward_model

# Algorithm filter groups config.
- algorithm/[email protected]_groups

# load the reference default config, then apply the fields in the current yaml
# self config override anything above
- _self_
@@ -0,0 +1,23 @@
# Copyright 2024 Bytedance Ltd. and/or its affiliates
# Copyright 2023-2024 SGLang Team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Reference:
# - DAPO: An Open-Source LLM Reinforcement Learning System at Scale
# Paper: https://arxiv.org/abs/2503.14476
# - This implementation references the ReTool implementation: recipe/retool/ in VERL codebase

from .dynamic_filtering import DynamicFilter, keep_mixed_reward

__all__ = ["DynamicFilter", "keep_mixed_reward"]
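The `filter_function` setting in the filter-groups config names `keep_mixed_reward` by dotted import path, with `filter_kwargs` carrying extra arguments. One common way to turn such a string into a callable is `importlib`. The sketch below assumes that approach and is not taken from the PR: `resolve_dotted_path` and `build_filter` are hypothetical helpers, and binding `filter_kwargs` with `functools.partial` is an assumption about how the extra arguments might be applied.

```python
import functools
import importlib
from typing import Any, Callable


def resolve_dotted_path(path: str) -> Any:
    """Import 'package.module.attr' and return the attribute it names."""
    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attr', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


def build_filter(path: str, **filter_kwargs: Any) -> Callable[..., Any]:
    """Resolve a filter function and pre-bind any configured keyword arguments."""
    fn = resolve_dotted_path(path)
    return functools.partial(fn, **filter_kwargs) if filter_kwargs else fn


# Demonstrated with standard-library targets so the sketch runs anywhere.
sqrt = resolve_dotted_path("math.sqrt")
parse_binary = build_filter("builtins.int", base=2)
```

With a custom filter, the same pattern would let a user point `filter_function` at their own module without changing trainer code, provided the function accepts whatever arguments the trainer passes.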