@euronymous-aithal euronymous-aithal commented Sep 7, 2025

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.
updating readme

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • Documentation
    • Overhauled README with a new Overview clarifying capabilities, scalability, and multimodal focus.
    • Added collapsible "Previous News" block and redesigned navigation with Quick Start and Support Matrix.
    • Reworked Features list to reflect new models, algorithms, and core capabilities.
    • Added explicit Training and Generation Backends guidance and auto-selection notes.
    • Added an expanded Quick Start with side-by-side backend commands, detailed prerequisites, environment/build guidance, and container/runtime notes.

Signed-off-by: Ashwath Aithal <[email protected]>
terrykong previously approved these changes Sep 7, 2025
@euronymous-aithal euronymous-aithal marked this pull request as ready for review September 7, 2025 21:07
Signed-off-by: Ashwath Aithal <[email protected]>
added table for Algorithms 

Signed-off-by: Ashwath Aithal <[email protected]>
added extra features for 0.4

Signed-off-by: Ashwath Aithal <[email protected]>
Signed-off-by: Ashwath Aithal <[email protected]>
Signed-off-by: Ashwath Aithal <[email protected]>
Signed-off-by: Ashwath Aithal <[email protected]>
Signed-off-by: Ashwath Aithal <[email protected]>
@terrykong
Contributor

@euronymous-aithal did you want to fold any of this PR into yours? https://github.com/NVIDIA-NeMo/RL/pull/965/files

euronymous-aithal and others added 2 commits September 8, 2025 16:56
Co-authored-by: Parth Chadha <[email protected]>
Signed-off-by: Ashwath Aithal <[email protected]>
Co-authored-by: Parth Chadha <[email protected]>
Signed-off-by: Ashwath Aithal <[email protected]>
@euronymous-aithal
Author

@terrykong I folded the changes from https://github.com/NVIDIA-NeMo/RL/pull/965/files into the current PR. Please review and let me know.

@terrykong terrykong changed the title Update README.md doc: Update README.md Sep 9, 2025
@terrykong
Contributor

lgtm. thanks @euronymous-aithal

@terrykong terrykong changed the title doc: Update README.md docs: Update README.md Sep 9, 2025
Signed-off-by: Ashwath Aithal <[email protected]>
coderabbitai bot commented Sep 9, 2025

Walkthrough

README.md was extensively rewritten: reorganized sections, expanded feature list, added backend overviews, detailed prerequisites and Quick Start (comparing Native PyTorch vs Megatron Core), updated installation/run guidance, and reworded model/inference notes. No code or API changes.

Changes

| Cohort / File(s) | Summary of Changes |
|---|---|
| README restructure (README.md) | Major documentation overhaul: new Overview; replaced TOC with collapsible Previous News; expanded Features; explicit Training and Generation Backends sections; comprehensive Quick Start with backend comparison and concrete commands; detailed Prerequisites and setup (submodules, CUDA/flash-attn, uv environment); revised installation/runtime guidance and wording across public-facing docs. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

Pre-merge checks (3 passed)

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The title succinctly and accurately reflects the primary change in this pull request, which is updating the README.md documentation, without introducing extraneous details or misleading information. |
| Description Check | ✅ Passed | The description clearly indicates that the PR’s goal is to update the README, matching the actual changes, and is therefore on-topic even if it remains brief. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

Poem

I nibble docs beneath the moonlit screen,
Two pathways gleam where tangled bits had been.
Quick-start carrots, backend burrows bright,
Submodules snug, environments set right.
A hop, a note — the README's clean and keen. 🐇✨

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Up to 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

fixed a typo

Signed-off-by: Ashwath Aithal <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
README.md (3)

537-538: Likely filename typo in Tips.

Script name elsewhere is run_grpo_math.py.

-  NRL_FORCE_REBUILD_VENVS=true uv run examples/run_grpo.py ...
+  NRL_FORCE_REBUILD_VENVS=true uv run examples/run_grpo_math.py ...
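
A mismatch like this can be caught mechanically. The following is a minimal sketch, not part of NeMo RL or this PR: it assumes README commands take the form `uv run <script>.py` or `uv run python <script>.py`, and the helper name `find_missing_scripts` is hypothetical.

```python
import re
from pathlib import Path

# Matches commands such as "uv run examples/run_grpo_math.py" or
# "uv run python examples/run_grpo_math.py" and captures the script path.
SCRIPT_REF = re.compile(r"uv run (?:python )?(\S+\.py)")


def find_missing_scripts(readme_text: str, repo_root: Path) -> list[str]:
    """Return referenced script paths that do not exist under repo_root."""
    referenced = sorted(set(SCRIPT_REF.findall(readme_text)))
    return [path for path in referenced if not (repo_root / path).is_file()]
```

Calling `find_missing_scripts(Path("README.md").read_text(), Path("."))` from the repository root would list any script names the README mentions that are absent on disk.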

542-556: Fix PyTorch CUDA docs URL and keep consistent terminology.

  • Use official domain path.
  • “FlashAttention 2” branding.
-  To do so, specify [`max_split_size_mb`](https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf)
+  To do so, specify [`max_split_size_mb`](https://pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf)
@@
-    # ...
+    # ...
     dtensor_cfg:
       env_vars:
         PYTORCH_CUDA_ALLOC_CONF: "max_split_size_mb:64"
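
For readers who launch a process directly rather than through the YAML config, the same allocator setting can be applied from Python. This is a hedged sketch: the helper names `build_alloc_conf` and `apply_alloc_conf` are hypothetical, and the value 64 simply mirrors the snippet above. The variable should be set before the process first uses CUDA.

```python
import os


def build_alloc_conf(max_split_size_mb: int) -> str:
    """Format a PYTORCH_CUDA_ALLOC_CONF value that caps the allocator's split size."""
    if max_split_size_mb <= 0:
        raise ValueError("max_split_size_mb must be a positive integer")
    return f"max_split_size_mb:{max_split_size_mb}"


def apply_alloc_conf(max_split_size_mb: int = 64) -> str:
    """Set the variable for this process; call before CUDA is initialized."""
    value = build_alloc_conf(max_split_size_mb)
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value
```

Calling `apply_alloc_conf()` at the top of a launch script, before `torch` touches the GPU, has the same intent as the `env_vars` entry shown in the diff.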

517-520: Align link text with target doc title

Change the link text from “Cluster Start” to “Set Up Clusters” in README.md so it matches the H1 in docs/cluster.md.

🧹 Nitpick comments (14)
README.md (14)

1-1: Use consistent brand casing: “NeMo RL” in title.

Replace “Nemo RL” with “NeMo RL”.

-# Nemo RL: A Scalable and Efficient Post-Training Library
+# NeMo RL: A Scalable and Efficient Post-Training Library

19-26: Grammar/brand fixes in Overview bullets.

  • Add space before parentheses, use “NeMo RL”, “PyTorch”, and tighten wording.
-**Nemo RL** is an open-source post-training library developed by NVIDIA, designed to streamline and scale reinforcement learning methods for Multimodal models(LLMs, VLMs etc.). Designed for flexibility, reproducibility, and scale, NeMo RL enables both small-scale experiments and massive multi-GPU, multi-node deployments for fast experimentation in research and production environments.
+**NeMo RL** is an open-source post-training library developed by NVIDIA, designed to streamline and scale reinforcement learning methods for multimodal models (LLMs, VLMs, etc.). Designed for flexibility, reproducibility, and scale, NeMo RL enables both small-scale experiments and massive multi-GPU, multi-node deployments for fast experimentation in research and production environments.
@@
-- **Hackable** with native Pytorch only paths for quick research prototypes.
+- **Hackable** with PyTorch-only paths for quick research prototypes.
-- **High-performance with Megatron Core**, supporting various parallelism techniques for large models and large context lengths.
+- **High-performance with Megatron Core**, supporting various parallelism techniques for large models and long context lengths.

34-35: PyTorch casing and clarity.

Use “PyTorch” and consider expanding acronyms at first use.

-- **DTensor** - PyTorch's next-generation distributed training with improved memory efficiency (Pytorch native TP, SP, PP, CP, FSDP2)
+- **DTensor** - PyTorch's next-generation distributed training with improved memory efficiency (native TP, SP, PP, CP, FSDP2)

51-56: Typos/branding in “Coming in v0.4” items.

  • “megatron” → “Megatron”
  • “GPRO” → “GRPO”
  • Prefer “Megatron Core” (no hyphen).
-- 🔜 **Megatron Inference** - Megatron Inference for fast day-0 support for new megatron models (avoid weight conversion).
+- 🔜 **Megatron Inference** - Megatron Inference for fast day-0 support for new Megatron models (avoid weight conversion).
-# ...
-- 🔜 **Async RL** - Support for asynchronous rollouts and replay buffers for off-policy training, and enable a fully asynchronous GPRO.
+- 🔜 **Async RL** - Support for asynchronous rollouts and replay buffers for off-policy training, and enable a fully asynchronous GRPO.
-- 🔜 **End-to-end FP8 Low Precision training** - Support for Megatron-core FP8 training and FP8 VLLM generation.
+- 🔜 **End-to-end FP8 low-precision training** - Support for Megatron Core FP8 training and FP8 vLLM generation.

66-71: Punctuation and spacing fixes.

Tighten commas and spacing; keep branding consistent.

-- ✅ **Learning Algorithms** - GRPO/GSPO , SFT , and DPO.
+- ✅ **Learning Algorithms** - GRPO/GSPO, SFT, and DPO.
-- ✅ **(even) Larger Model Support with Long(er) Sequences** - Performant parallelisms with Megatron Core (TP/PP/CP/SP/EP/FSDP).
+- ✅ **(Even) Larger Model Support with Long(er) Sequences** - Performant parallelisms with Megatron Core (TP/PP/CP/SP/EP/FSDP).
-- ✅ **MoE Models** - Support for DeepseekV3 and Qwen-3 MoE models (Megatron)
+- ✅ **MoE Models** - Support for DeepSeek-V3 and Qwen-3 MoE models (Megatron).

75-99: Fix Table of Contents list style/indentation to satisfy markdownlint.

Current ToC uses indented “-” bullets and mixed HTML, triggering MD004/MD007. Convert to top-level “*” bullets with proper nesting.

-## Table of Contents
-  - [Prerequisites](#prerequisites)
-  - [Quick Start](#quick-start)
-  - Support Matrix
-
-    <p></p>
-    
-    |Algorithms|Single Node|Multi-node|
-    |-|-|-|
-    |[GRPO](#grpo)|[GRPO Single Node](#grpo-single-node)|[GRPO Multi-node](#grpo-multi-node): [GRPO Qwen2.5-32B](#grpo-qwen25-32b), [GRPO Multi-Turn](#grpo-multi-turn)|
-    |[Supervised Fine-Tuning (SFT)](#supervised-fine-tuning-sft)|[SFT Single Node](#sft-single-node)|[SFT Multi-node](#sft-multi-node)|
-    |[DPO](#dpo)|[DPO Single Node](#dpo-single-node)|[DPO Multi-node](#dpo-multi-node)|
-    |[RM](#rm)|[RM Single Node](#rm-single-node)|[RM Multi-node](#rm-multi-node)|
-
-    <p></p>
-
-  - [Evaluation](#evaluation)
-    - [Convert Model Format (Optional)](#convert-model-format-optional)
-    - [Run Evaluation](#run-evaluation)
-  - [Set Up Clusters](#set-up-clusters)
-  - [Tips and Tricks](#tips-and-tricks)
-  - [Citation](#citation)
-  - [Contributing](#contributing)
-  - [Licenses](#licenses)
+## Table of Contents
+* [Prerequisites](#prerequisites)
+* [Quick Start](#quick-start)
+* Support Matrix
+
+  | Algorithms | Single Node | Multi-node |
+  |---|---|---|
+  | [GRPO](#grpo) | [GRPO Single Node](#grpo-single-node) | [GRPO Multi-node](#grpo-multi-node): [GRPO Qwen2.5-32B](#grpo-qwen25-32b), [GRPO Multi-Turn](#grpo-multi-turn) |
+  | [Supervised Fine-Tuning (SFT)](#supervised-fine-tuning-sft) | [SFT Single Node](#sft-single-node) | [SFT Multi-node](#sft-multi-node) |
+  | [DPO](#dpo) | [DPO Single Node](#dpo-single-node) | [DPO Multi-node](#dpo-multi-node) |
+  | [RM](#rm) | [RM Single Node](#rm-single-node) | [RM Multi-node](#rm-multi-node) |
+
+* [Evaluation](#evaluation)
+  * [Convert Model Format (Optional)](#convert-model-format-optional)
+  * [Run Evaluation](#run-evaluation)
+* [Set Up Clusters](#set-up-clusters)
+* [Tips and Tricks](#tips-and-tricks)
+* [Citation](#citation)
+* [Contributing](#contributing)
+* [Licenses](#licenses)

If markdownlint is configured to enforce “*” (MD004), this change will silence the warnings. Otherwise, consider relaxing the rule in your lint config.
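
If the project prefers to rewrite the bullets rather than relax the rule, the marker change can be scripted. The sketch below is an illustration only (the function name `dashes_to_asterisks` is hypothetical): it swaps `-` list markers for `*`, leaves fenced code blocks untouched, and does not adjust indentation, so MD007 findings would still need a separate pass.

```python
import re

# A list item: optional indentation, a dash, then at least one whitespace character.
BULLET = re.compile(r"^(\s*)-(\s+)")


def dashes_to_asterisks(markdown: str) -> str:
    """Replace '-' list markers with '*' outside fenced code blocks."""
    out = []
    in_fence = False
    for line in markdown.splitlines(keepends=True):
        if line.lstrip().startswith("```"):
            # Toggle fence state; the fence line itself is kept as-is.
            in_fence = not in_fence
        elif not in_fence:
            line = BULLET.sub(r"\1*\2", line, count=1)
        out.append(line)
    return "".join(out)
```

Horizontal rules such as `---` are not affected because the pattern requires whitespace directly after a single dash.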


102-107: PyTorch casing in Quick Start description.

Use “PyTorch”.

-Use this quick start to get going with either the Native PyTorch DTensor or Megatron-Core training backends. 
+Use this quick start to get going with either the native PyTorch DTensor or Megatron Core training backends.

154-156: PyTorch casing.

-# by running (This is not necessary if you are using the pure Pytorch/DTensor path):
+# by running (This is not necessary if you are using the pure PyTorch/DTensor path):

202-202: Fix typo.

“sucessful” → “successful”.

-If sucessful, you should see `✅ flash-attn successfully added to uv cache`.
+If successful, you should see `✅ flash-attn successfully added to uv cache`.

240-245: Minor consistency in model naming.

Optional: keep Meta model IDs consistent with HF slugs.

-  logger.wandb.name="grpo-llama1b_math" \
+  logger.wandb.name="grpo-llama-1b_math" \

528-533: Fix typo: “intialize” → “initialize”.

-  and then force a rebuild of the virtual environments by setting `NRL_FORCE_REBUILD_VENVS=true` next time you launch a run:
+  and then force a rebuild of the virtual environments by setting `NRL_FORCE_REBUILD_VENVS=true` next time you launch a run:

Also earlier in the paragraph:

-  If you see this error, there is likely an issue with your virtual environments. To fix this, first intialize the submodules:
+  If you see this error, there is likely an issue with your virtual environments. To fix this, first initialize the submodules:

313-336: SFT section brand casing and clarity.

  • “Llama3.2-1B” → “Llama-3.2-1B”.
  • “using a 1 GPU” → “using 1 GPU”.
-This fine-tunes the `Llama3.2-1B` model on the SQuAD dataset using a 1 GPU.
+This fine-tunes the `Llama-3.2-1B` model on the SQuAD dataset using 1 GPU.

460-516: Evaluation section: PyTorch casing and small edits.

Use “PyTorch”.

-If you have trained a model and saved the checkpoint in the Pytorch DCP format, you first need to convert it to the Hugging Face format before running evaluation:
+If you have trained a model and saved the checkpoint in the PyTorch DCP format, you first need to convert it to the Hugging Face format before running evaluation:

75-88: markdownlint warnings (MD004/MD007).

CI may fail with current list style/indentation. Either adopt “*” bullets with proper indent (see ToC diff) or relax the rule in your markdownlint config.

Also applies to: 91-99

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 266e718 and 6e91de4.

📒 Files selected for processing (1)
  • README.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
README.md

13-13: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


22-22: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


23-23: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


24-24: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


25-25: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


26-26: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


27-27: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


34-34: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


35-35: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


42-42: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


43-43: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


51-51: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


52-52: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


53-53: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


54-54: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


55-55: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


56-56: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


57-57: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


58-58: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


59-59: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


60-60: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


61-61: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


62-62: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


63-63: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


64-64: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


65-65: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


66-66: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


67-67: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


68-68: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


69-69: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


70-70: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


71-71: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


72-72: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


73-73: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


76-76: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


76-76: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


77-77: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


77-77: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


78-78: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


78-78: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


91-91: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


91-91: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


92-92: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


92-92: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


93-93: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


93-93: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


94-94: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


94-94: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


95-95: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


95-95: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


96-96: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


96-96: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


97-97: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


97-97: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


98-98: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


98-98: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-container / main
🔇 Additional comments (4)
README.md (4)

42-45: No change requested; link text matches intended wording.

The Megatron inference bullet is clear and correctly linked.


198-201: External docs link formatting.

The docs URL should be plain Markdown. Current text is fine; no change required if the hosted page is correct.

Please confirm https://docs.nvidia.com/nemo/rl/latest/docker.html is valid for v0.3.0+ containers.


257-258: Cross-reference is helpful.

Good pointer back to Training Backends docs.


3-16: Nice addition: collapsible “Previous News”.

Improves top-of-page scannability without losing history.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Ashwath Aithal <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Ashwath Aithal <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (6)
README.md (6)

19-19: Branding and terminology consistency (NeMo, PyTorch, Megatron Core).

Unify product names and hyphenation.

-**Nemo RL** is an open-source post-training library developed by NVIDIA, designed to streamline and scale reinforcement learning methods for Multimodal models(LLMs, VLMs etc.). Designed for flexibility, reproducibility, and scale, NeMo RL enables both small-scale experiments and massive multi-GPU, multi-node deployments for fast experimentation in research and production environments.
+**NeMo RL** is an open-source post-training library developed by NVIDIA, designed to streamline and scale reinforcement learning methods for multimodal models (LLMs, VLMs, etc.). Designed for flexibility, reproducibility, and scale, NeMo RL enables both small-scale experiments and massive multi-GPU, multi-node deployments for fast experimentation in research and production environments.
-- **Hackable** with native Pytorch only paths for quick research prototypes.
+- **Hackable** with native PyTorch-only paths for quick research prototypes.
-- **DTensor** - PyTorch's next-generation distributed training with improved memory efficiency (Pytorch native TP, SP, PP, CP, FSDP2)
+- **DTensor** - PyTorch's next-generation distributed training with improved memory efficiency (PyTorch-native TP, SP, PP, CP, FSDP2)
-- 🔜 **Improved Large MoE Performance** - Improve Megatron-core training performance and generation performance.
+- 🔜 **Improved Large MoE Performance** - Improve Megatron Core training performance and generation performance.
-- 🔜 **Megatron-Bridge Integration** - Integrate Megatron-Bridge to enable training features from Megatron-Core.
+- 🔜 **Megatron-Bridge Integration** - Integrate Megatron-Bridge to enable training features from Megatron Core.
-If you have trained a model and saved the checkpoint in the Pytorch DCP format, you first need to convert it to the Hugging Face format before running evaluation:
+If you have trained a model and saved the checkpoint in the PyTorch DCP format, you first need to convert it to the Hugging Face format before running evaluation:

Also applies to: 24-24, 33-33, 55-55, 57-57, 464-464
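
Because the same casing issues recur across many lines, a small check can list them all at once. This is a minimal sketch under stated assumptions: the preferred spellings come from the review comments above, the patterns are case-sensitive, and the name `find_brand_issues` is hypothetical. It may flag legitimate uses such as package names or URLs, so its output is a list to review, not a set of automatic fixes.

```python
import re

# Non-preferred spelling (regex) -> preferred spelling, per the review comments.
PREFERRED = {
    r"\bNemo RL\b": "NeMo RL",
    r"\bPytorch\b": "PyTorch",
    r"\bMegatron-[Cc]ore\b": "Megatron Core",
}


def find_brand_issues(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, found spelling, preferred spelling) for each hit."""
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, preferred in PREFERRED.items():
            for match in re.finditer(pattern, line):
                issues.append((lineno, match.group(0), preferred))
    return issues
```

Running it over the README text would report each occurrence with its 1-based line number.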


541-541: Fix broken PyTorch docs link.

Current URL uses the “docs.” subdomain. Use the canonical docs path on pytorch.org.

-  To do so, specify [`max_split_size_mb`](https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf)
+  To do so, specify [`max_split_size_mb`](https://pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf)

137-140: Standardize uv invocation to include python (avoid PATH/shebang pitfalls).

Keeps examples consistent with earlier blocks and avoids execution issues if scripts aren’t executable.

-        <pre style="white-space:pre-wrap; word-break:break-word; overflow-wrap:anywhere;"><code class="language-sh">uv run examples/run_grpo_math.py &#92;
---config examples/configs/grpo_math_1B_megatron.yaml</code></pre>
+        <pre style="white-space:pre-wrap; word-break:break-word; overflow-wrap:anywhere;"><code class="language-sh">uv run python examples/run_grpo_math.py &#92;
+--config examples/configs/grpo_math_1B_megatron.yaml</code></pre>

5-6: Fix nested list indentation under News.

Align with common Markdown style and markdownlint MD007 (2-space indent).

-    * 📝 [v0.3.0 Blog Post](https://nvidia-nemo.github.io/blog/2025/07/21/nemo-rl-v0.3/)
-    * 📊 View the release run metrics on [Google Colab](https://colab.research.google.com/drive/15kpesCV1m_C5UQFStssTEjaN2RsBMeZ0?usp=sharing) to get a head start on your experimentation.
+  * 📝 [v0.3.0 Blog Post](https://nvidia-nemo.github.io/blog/2025/07/21/nemo-rl-v0.3/)
+  * 📊 View the release run metrics on [Google Colab](https://colab.research.google.com/drive/15kpesCV1m_C5UQFStssTEjaN2RsBMeZ0?usp=sharing) to get a head start on your experimentation.

213-213: Correct dataset name in link text.

Match the official dataset name to the linked URL.

-We have a reference GRPO experiment config set up trained for math benchmarks using the [OpenInstructMath2](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2) dataset.
+We have a reference GRPO experiment config set up trained for math benchmarks using the [OpenMathInstruct-2](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2) dataset.

48-61: markdownlint MD004/MD007 noise — pick a style or configure the linter.

Either switch list markers to “*” and 2-space indents or relax MD004/MD007 via repo config to accept “-”. Given the length of this README, a config might be preferable.

Example .markdownlint.json:

{
  "MD004": { "style": "dash" },
  "MD007": { "indent": 2 }
}

Also applies to: 62-72

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6e91de4 and 391f71d.

📒 Files selected for processing (1)
  • README.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
README.md

13-13: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


22-22: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


23-23: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


24-24: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


25-25: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


26-26: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


27-27: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


33-33: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


34-34: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


41-41: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


42-42: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


50-50: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


51-51: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


52-52: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


53-53: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


54-54: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


55-55: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


56-56: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


57-57: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


58-58: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


59-59: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


60-60: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


61-61: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


62-62: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


63-63: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


64-64: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


65-65: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


66-66: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


67-67: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


68-68: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


69-69: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


70-70: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


71-71: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


74-74: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


74-74: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


75-75: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


75-75: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


76-76: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


76-76: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


89-89: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


89-89: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


90-90: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


90-90: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


91-91: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


91-91: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


92-92: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


92-92: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


93-93: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


93-93: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


94-94: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


94-94: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


95-95: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


95-95: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


96-96: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


96-96: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)
