Commit 5c04ddc

[doc] fix: formatting issue for kl_ctrl and fused_kernel_options configs (volcengine#3917)
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

<img width="1295" height="152" alt="image" src="https://github.com/user-attachments/assets/3c9de645-da2a-43ea-93fc-2a41480091bd" />

<img width="1301" height="100" alt="image" src="https://github.com/user-attachments/assets/a71259ce-6e26-45ee-8387-6c2a3d0ab412" />

Add line breaks before and after the second level list.

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s) if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Signed-off-by: Hollow Man <[email protected]>
1 parent ecdaa8d commit 5c04ddc

1 file changed: +8 -4 lines changed

docs/examples/config.rst

Lines changed: 8 additions & 4 deletions
@@ -237,11 +237,13 @@ Actor/Rollout/Reference Policy
  - ``actor_rollout_ref.model.use_fused_kernels``: Whether to use fused
  kernels in the model. If set to True, the following parameters will be
  used.
+
  - ``actor_rollout_ref.model.fused_kernel_options.impl_backend``: The
- implementation backend for fused kernels. Options: "triton" or
- "torch". Default is "torch".
- While in megatron, we only support "triton" as the
- implementation backend, so there is no need for this option.
+ implementation backend for fused kernels. Options: "triton" or
+ "torch". Default is "torch".
+ While in megatron, we only support "triton" as the
+ implementation backend, so there is no need for this option.
+
  - ``actor_rollout_ref.model.use_remove_padding``: Whether to use remove
  padding in the model. If set to True, the model will remove padding
  tokens in the input_ids and response_ids. This helps a lot in improving model running efficiency.
@@ -529,9 +531,11 @@ Algorithm
  calculate the kl divergence between actor and reference policy. For
  specific options, refer to `kl_penalty()` in `core_algos.py <https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/core_algos.py>`_ .
  - ``kl_ctrl``: Config for in-reward kl_penalty controller
+
  - ``kl_coef``: The (initial) coefficient of in-reward kl_penalty. Default is 0.001.
  - ``type``: 'fixed' for FixedKLController and 'adaptive' for AdaptiveKLController.
  - ``horizon`` and ``target_kl``: See source code of AdaptiveKLController for details.
+
  - ``rollout_is``: Whether to enable rollout importance sampling correction. Default is False.
  - ``rollout_is_threshold``: Upper threshold for IS weights. Set to ``null`` to disable IS completely.
  - ``rollout_is_threshold_lower``: Lower threshold for IS weights. If ``null``, defaults to reciprocal of upper (1/upper).
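
For readers unfamiliar with the ``kl_ctrl`` options documented in the hunk above, the following is a minimal sketch of how a fixed and an adaptive KL controller could differ. It is not taken from the verl source: the class bodies, the `make_kl_controller` helper, and the `CLIP` bound of 0.2 are assumptions that follow the commonly used proportional adaptive-KL scheme, and the `horizon` and `target_kl` values in the test are arbitrary. Only the option names (`kl_coef`, `type`, `horizon`, `target_kl`) and the default `kl_coef` of 0.001 come from the documentation being edited. Consult the `AdaptiveKLController` source for the authoritative behavior.

```python
from typing import Any, Mapping

# Assumed clipping bound for the proportional error; hypothetical, not from the docs.
CLIP = 0.2


class FixedKLController:
    """Keeps the in-reward KL coefficient constant for the whole run."""

    def __init__(self, kl_coef: float) -> None:
        self.value = kl_coef

    def update(self, current_kl: float, n_steps: int) -> None:
        # A fixed controller ignores the observed KL entirely.
        return None


class AdaptiveKLController:
    """Nudges the coefficient so the observed KL drifts toward target_kl."""

    def __init__(self, init_kl_coef: float, target_kl: float, horizon: int) -> None:
        self.value = init_kl_coef
        self.target = target_kl
        self.horizon = horizon

    def update(self, current_kl: float, n_steps: int) -> None:
        # Relative error of the observed KL, clipped to limit each adjustment.
        error = max(-CLIP, min(CLIP, current_kl / self.target - 1.0))
        # A larger horizon means slower adaptation of the coefficient.
        self.value *= 1.0 + error * n_steps / self.horizon


def make_kl_controller(kl_ctrl: Mapping[str, Any]):
    """Build a controller from a kl_ctrl-style mapping (hypothetical helper)."""
    if kl_ctrl["type"] == "fixed":
        return FixedKLController(kl_ctrl["kl_coef"])
    if kl_ctrl["type"] == "adaptive":
        return AdaptiveKLController(
            kl_ctrl["kl_coef"], kl_ctrl["target_kl"], kl_ctrl["horizon"]
        )
    raise ValueError(f"unknown kl_ctrl type: {kl_ctrl['type']!r}")
```

In this sketch, an observed KL above `target_kl` raises the coefficient and one below it lowers the coefficient, while `type: fixed` leaves `kl_coef` untouched.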
