Add remote reward #2
base: run_sensecore_add_req_sched
Conversation
# MODEL_NAME = "Qwen2.5-32B-Instruct"
MODEL_NAME = "Qwen3-30B-A3B"
Let's add BASE_URL, API_KEY, and MODEL_NAME to the training parameter config (with default values); the rest can stay hard-coded for now.
verl/utils/reward_score/__init__.py
Outdated
from . import remote_reward
# Call the judge model
# r"\\boxed\s*{([^}]*)}" matches the pred in the response; see /afs/chatrl/users/hwq/code/verl-req-sched/verl/utils/reward_score/remote_reward/__init__.py
# The dataset must provide the question in extra_info
Why can't we take the prompt directly from `messages` here?
Output:
'''.strip()

SAVE_PATH = "/afs/chatrl/users/hwq/code/verl-remote-reward/examples_sensecore/grpo_remote_reward/output.jsonl"  # path of the JSONL file to save
Let's move SAVE_PATH into the config as well, and read it from there.
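A hedged sketch of reading the save path from a config dict and appending JSONL records. The `save_path` key, the default, and the `save_record` helper are assumptions, not the PR's actual code:

```python
import json

def save_record(record: dict, cfg: dict) -> None:
    """Append one record to the JSONL file named in the config."""
    save_path = cfg.get("save_path", "output.jsonl")  # assumed key and default
    with open(save_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```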
completion = self.client.chat.completions.create(
    model=model_name,
    messages=messages,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},  # no thinking
Is this field compatible with models outside the Qwen family?
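One defensive option is to pass `extra_body` only when the model is known to support `enable_thinking`. A minimal sketch; the model-name check and helper name below are assumptions, not verl's actual logic:

```python
def build_request_kwargs(model_name: str, messages: list) -> dict:
    """Build chat-completion kwargs, adding enable_thinking only for Qwen3 models."""
    kwargs = {"model": model_name, "messages": messages}
    # Assumption: only Qwen3-family chat templates accept enable_thinking;
    # other servers may reject unknown chat_template_kwargs.
    if "qwen3" in model_name.lower():
        kwargs["extra_body"] = {"chat_template_kwargs": {"enable_thinking": False}}
    return kwargs
```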
import vllm
from transformers import AutoTokenizer

from harpy.tools.base import Tool, SortTool
The unused harpy Tool imports can all be removed; the vLLM-served RM likely isn't needed for now either, so drop that too and keep only the remote RM part.
)


def build_rm(model_name, **kwargs):
This part can be removed as well.
from verl.utils.reward_score.remote_reward.tools.base import Tool, GenerateTool


class VllmTool(GenerateTool):
If we go with the remote RM, the vLLM part can be removed too.
Co-authored-by: Bihan Rana <[email protected]> Co-authored-by: peterschmidt85 <[email protected]>
…engine#2365)

### What does this PR do?

Fix a regression from volcengine#1911: that PR did not update the sglang async branch. CI did not catch the error because it only runs 1 step, while the error occurs on the second step, so the test cases are updated to run 2 steps.

To reproduce the bug, run:

```
TOTAL_TRAIN_STEPS=2 ENGINE=sglang ROLLOUT_MODE=async bash tests/special_e2e/ppo_trainer/run_function_reward.sh
```

It fails with:

```
(WorkerDict pid=1257286) Total steps: 2, num_warmup_steps: 0
(WorkerDict pid=1257286) Actor use_remove_padding=True
(WorkerDict pid=1257286) Actor use_fused_kernels=False
(AsyncSglangServer pid=1260392) FastAPI listen on 192.168.111.48:40451
(WorkerDict pid=1257286) terminate called after throwing an instance of 'c10::Error'
(WorkerDict pid=1257286)   what():  CUDA error: an illegal memory access was encountered
(WorkerDict pid=1257286) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(WorkerDict pid=1257286) For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(WorkerDict pid=1257286) Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(WorkerDict pid=1257286)
(WorkerDict pid=1257286) Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
(WorkerDict pid=1257286) frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fbf6036c1b6 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
(WorkerDict pid=1257286) frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fbf60315a76 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
(WorkerDict pid=1257286) frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fbf6080d918 in
```

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20an%20illegal%20memory%20access%20was%20encountered
- [x] Format the PR title as `[{modules}] {type}: {description}` (this will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,`, like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

```
(TaskRunner pid=1647269) step:2 - global_seqlen/min:13075 - global_seqlen/max:14837 - global_seqlen/minmax_diff:1762 - global_seqlen/balanced_min:14231 - global_seqlen/balanced_max:14232 - global_seqlen/mean:14231.5 - actor/entropy:2.0606913566589355 - critic/vf_loss:8.7157882153
```

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s) if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).