docs: Update README.md #1091

euronymous-aithal · 2025-09-07T21:04:30Z

What does this PR do?

Updates the README.

Summary by CodeRabbit

Documentation
- Overhauled README with a new Overview clarifying capabilities, scalability, and multimodal focus.
- Added collapsible "Previous News" block and redesigned navigation with Quick Start and Support Matrix.
- Reworked Features list to reflect new models, algorithms, and core capabilities.
- Added explicit Training and Generation Backends guidance and auto-selection notes.
- Added an expanded Quick Start with side-by-side backend commands, detailed prerequisites, environment/build guidance, and container/runtime notes.

Signed-off-by: Ashwath Aithal <[email protected]>

added table for Algorithms Signed-off-by: Ashwath Aithal <[email protected]>

added extra features for 0.4 Signed-off-by: Ashwath Aithal <[email protected]>

Signed-off-by: Ashwath Aithal <[email protected]>

README.md

terrykong · 2025-09-08T18:54:29Z

@euronymous-aithal did you want to fold any of this PR into yours? https://github.com/NVIDIA-NeMo/RL/pull/965/files

Co-authored-by: Parth Chadha <[email protected]> Signed-off-by: Ashwath Aithal <[email protected]>

added chnages from https://github.com/NVIDIA-NeMo/RL/pull/965/files Signed-off-by: Ashwath Aithal <[email protected]>

euronymous-aithal · 2025-09-09T17:29:15Z

@terrykong I folded the changes from https://github.com/NVIDIA-NeMo/RL/pull/965/files into the current PR. Please review and let me know.

terrykong · 2025-09-09T19:00:38Z

lgtm. thanks @euronymous-aithal

Signed-off-by: Ashwath Aithal <[email protected]>

coderabbitai · 2025-09-09T19:49:43Z

Walkthrough

README.md was extensively rewritten: reorganized sections, expanded feature list, added backend overviews, detailed prerequisites and Quick Start (comparing Native PyTorch vs Megatron Core), updated installation/run guidance, and reworded model/inference notes. No code or API changes.

Changes

| Cohort / File(s) | Summary of Changes |
|---|---|
| README restructure `README.md` | Major documentation overhaul: new Overview; replaced TOC with collapsible Previous News; expanded Features; explicit Training and Generation Backends sections; comprehensive Quick Start with backend comparison and concrete commands; detailed Prerequisites and setup (submodules, CUDA/flash-attn, uv environment); revised installation/runtime guidance and wording across public-facing docs. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

Pre-merge checks (3 passed)

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The title succinctly and accurately reflects the primary change in this pull request, which is updating the README.md documentation, without introducing extraneous details or misleading information. |
| Description Check | ✅ Passed | The description clearly indicates that the PR’s goal is to update the README, matching the actual changes, and is therefore on-topic even if it remains brief. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

Poem

I nibble docs beneath the moonlit screen,
Two pathways gleam where tangled bits had been.
Quick-start carrots, backend burrows bright,
Submodules snug, environments set right.
A hop, a note — the README's clean and keen. 🐇✨

fixed a typo Signed-off-by: Ashwath Aithal <[email protected]>

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

README.md (3)
537-538: Likely filename typo in Tips.

Script name elsewhere is run_grpo_math.py.
-  NRL_FORCE_REBUILD_VENVS=true uv run examples/run_grpo.py ...
+  NRL_FORCE_REBUILD_VENVS=true uv run examples/run_grpo_math.py ...
542-556: Fix PyTorch CUDA docs URL and keep consistent terminology.

Use official domain path.

“FlashAttention 2” branding.
-  To do so, specify [`max_split_size_mb`](https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf)
+  To do so, specify [`max_split_size_mb`](https://pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf)
@@
-    # ...
+    # ...
     dtensor_cfg:
       env_vars:
         PYTORCH_CUDA_ALLOC_CONF: "max_split_size_mb:64"
517-520: Align link text with target doc title

Change the link text from “Cluster Start” to “Set Up Clusters” in README.md so it matches the H1 in docs/cluster.md.
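
The `PYTORCH_CUDA_ALLOC_CONF` value in the 542-556 comment above follows PyTorch's comma-separated `option:value` format. As a minimal sketch of the same setting made from Python rather than from the YAML `env_vars` block: `build_alloc_conf` is a hypothetical helper written for this illustration, not part of NeMo RL or PyTorch, and the variable has to be in place before the process initializes CUDA.

```python
import os


def build_alloc_conf(**options: object) -> str:
    """Join allocator options into the `option:value,option:value` form."""
    return ",".join(f"{name}:{value}" for name, value in options.items())


# Same value as the README's dtensor_cfg.env_vars example.
alloc_conf = build_alloc_conf(max_split_size_mb=64)

# setdefault keeps any value already exported by the shell or launcher.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", alloc_conf)
print(alloc_conf)
```

Using `setdefault` rather than plain assignment is a design choice for the sketch: a value supplied by the cluster launcher wins over the in-script default.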

🧹 Nitpick comments (14)

README.md (14)

1-1: Use consistent brand casing: “NeMo RL” in title.

Replace “Nemo RL” with “NeMo RL”.

-# Nemo RL: A Scalable and Efficient Post-Training Library
+# NeMo RL: A Scalable and Efficient Post-Training Library

19-26: Grammar/brand fixes in Overview bullets.

Add space before parentheses, use “NeMo RL”, “PyTorch”, and tighten wording.

-**Nemo RL** is an open-source post-training library developed by NVIDIA, designed to streamline and scale reinforcement learning methods for Multimodal models(LLMs, VLMs etc.). Designed for flexibility, reproducibility, and scale, NeMo RL enables both small-scale experiments and massive multi-GPU, multi-node deployments for fast experimentation in research and production environments.
+**NeMo RL** is an open-source post-training library developed by NVIDIA, designed to streamline and scale reinforcement learning methods for multimodal models (LLMs, VLMs, etc.). Designed for flexibility, reproducibility, and scale, NeMo RL enables both small-scale experiments and massive multi-GPU, multi-node deployments for fast experimentation in research and production environments.
@@
-- **Hackable** with native Pytorch only paths for quick research prototypes.
+- **Hackable** with PyTorch-only paths for quick research prototypes.
-- **High-performance with Megatron Core**, supporting various parallelism techniques for large models and large context lengths.
+- **High-performance with Megatron Core**, supporting various parallelism techniques for large models and long context lengths.

34-35: PyTorch casing and clarity.

Use “PyTorch” and consider expanding acronyms at first use.

-- **DTensor** - PyTorch's next-generation distributed training with improved memory efficiency (Pytorch native TP, SP, PP, CP, FSDP2)
+- **DTensor** - PyTorch's next-generation distributed training with improved memory efficiency (native TP, SP, PP, CP, FSDP2)

51-56: Typos/branding in “Coming in v0.4” items.

“megatron” → “Megatron”
“GPRO” → “GRPO”
Prefer “Megatron Core” (no hyphen).

-- 🔜 **Megatron Inference** - Megatron Inference for fast day-0 support for new megatron models (avoid weight conversion).
+- 🔜 **Megatron Inference** - Megatron Inference for fast day-0 support for new Megatron models (avoid weight conversion).
-# ...
-- 🔜 **Async RL** - Support for asynchronous rollouts and replay buffers for off-policy training, and enable a fully asynchronous GPRO.
+- 🔜 **Async RL** - Support for asynchronous rollouts and replay buffers for off-policy training, and enable a fully asynchronous GRPO.
-- 🔜 **End-to-end FP8 Low Precision training** - Support for Megatron-core FP8 training and FP8 VLLM generation.
+- 🔜 **End-to-end FP8 low-precision training** - Support for Megatron Core FP8 training and FP8 vLLM generation.

66-71: Punctuation and spacing fixes.

Tighten commas and spacing; keep branding consistent.

-- ✅ **Learning Algorithms** - GRPO/GSPO , SFT , and DPO.
+- ✅ **Learning Algorithms** - GRPO/GSPO, SFT, and DPO.
-- ✅ **(even) Larger Model Support with Long(er) Sequences** - Performant parallelisms with Megatron Core (TP/PP/CP/SP/EP/FSDP).
+- ✅ **(Even) Larger Model Support with Long(er) Sequences** - Performant parallelisms with Megatron Core (TP/PP/CP/SP/EP/FSDP).
-- ✅ **MoE Models** - Support for DeepseekV3 and Qwen-3 MoE models (Megatron)
+- ✅ **MoE Models** - Support for DeepSeek-V3 and Qwen-3 MoE models (Megatron).

75-99: Fix Table of Contents list style/indentation to satisfy markdownlint.

Current ToC uses indented “-” bullets and mixed HTML, triggering MD004/MD007. Convert to top-level “*” bullets with proper nesting.

-## Table of Contents
-  - [Prerequisites](#prerequisites)
-  - [Quick Start](#quick-start)
-  - Support Matrix
-
-    <p></p>
-    
-    |Algorithms|Single Node|Multi-node|
-    |-|-|-|
-    |[GRPO](#grpo)|[GRPO Single Node](#grpo-single-node)|[GRPO Multi-node](#grpo-multi-node): [GRPO Qwen2.5-32B](#grpo-qwen25-32b), [GRPO Multi-Turn](#grpo-multi-turn)|
-    |[Supervised Fine-Tuning (SFT)](#supervised-fine-tuning-sft)|[SFT Single Node](#sft-single-node)|[SFT Multi-node](#sft-multi-node)|
-    |[DPO](#dpo)|[DPO Single Node](#dpo-single-node)|[DPO Multi-node](#dpo-multi-node)|
-    |[RM](#rm)|[RM Single Node](#rm-single-node)|[RM Multi-node](#rm-multi-node)|
-
-    <p></p>
-
-  - [Evaluation](#evaluation)
-    - [Convert Model Format (Optional)](#convert-model-format-optional)
-    - [Run Evaluation](#run-evaluation)
-  - [Set Up Clusters](#set-up-clusters)
-  - [Tips and Tricks](#tips-and-tricks)
-  - [Citation](#citation)
-  - [Contributing](#contributing)
-  - [Licenses](#licenses)
+## Table of Contents
+* [Prerequisites](#prerequisites)
+* [Quick Start](#quick-start)
+* Support Matrix
+
+  | Algorithms | Single Node | Multi-node |
+  |---|---|---|
+  | [GRPO](#grpo) | [GRPO Single Node](#grpo-single-node) | [GRPO Multi-node](#grpo-multi-node): [GRPO Qwen2.5-32B](#grpo-qwen25-32b), [GRPO Multi-Turn](#grpo-multi-turn) |
+  | [Supervised Fine-Tuning (SFT)](#supervised-fine-tuning-sft) | [SFT Single Node](#sft-single-node) | [SFT Multi-node](#sft-multi-node) |
+  | [DPO](#dpo) | [DPO Single Node](#dpo-single-node) | [DPO Multi-node](#dpo-multi-node) |
+  | [RM](#rm) | [RM Single Node](#rm-single-node) | [RM Multi-node](#rm-multi-node) |
+
+* [Evaluation](#evaluation)
+  * [Convert Model Format (Optional)](#convert-model-format-optional)
+  * [Run Evaluation](#run-evaluation)
+* [Set Up Clusters](#set-up-clusters)
+* [Tips and Tricks](#tips-and-tricks)
+* [Citation](#citation)
+* [Contributing](#contributing)
+* [Licenses](#licenses)

If markdownlint is configured to enforce “*” (MD004), this change will silence the warnings. Otherwise, consider relaxing the rule in your lint config.
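
For a README of this length, the marker change could be scripted instead of edited by hand. The following is a rough sketch, not a tool used in this PR: `dash_to_asterisk` is a hypothetical helper that only swaps the list marker, leaves indentation untouched (so MD007 still needs a separate pass), and skips fenced code blocks.

```python
import re

# A list item: optional indentation, a dash, then at least one space.
BULLET = re.compile(r"^(\s*)-(\s+)")
FENCE = "`" * 3


def dash_to_asterisk(markdown: str) -> str:
    """Rewrite '-' list markers as '*', leaving fenced code untouched."""
    out = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            # Toggle on every fence line so shell/YAML snippets are preserved.
            in_fence = not in_fence
            out.append(line)
            continue
        out.append(line if in_fence else BULLET.sub(r"\1*\2", line, count=1))
    return "\n".join(out)


sample = "- [Prerequisites](#prerequisites)\n  - [Run Evaluation](#run-evaluation)"
print(dash_to_asterisk(sample))
```

Because the pattern requires whitespace after the dash, horizontal rules such as `---` and table separator rows are left alone.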

102-107: Casing and branding in Quick Start description.

Use lowercase “native” and “Megatron Core” (no hyphen), and drop the trailing space.

-Use this quick start to get going with either the Native PyTorch DTensor or Megatron-Core training backends. 
+Use this quick start to get going with either the native PyTorch DTensor or Megatron Core training backends.

154-156: PyTorch casing.

-# by running (This is not necessary if you are using the pure Pytorch/DTensor path):
+# by running (This is not necessary if you are using the pure PyTorch/DTensor path):

202-202: Fix typo.

“sucessful” → “successful”.

-If sucessful, you should see `✅ flash-attn successfully added to uv cache`.
+If successful, you should see `✅ flash-attn successfully added to uv cache`.

240-245: Minor consistency in model naming.

Optional: keep Meta model IDs consistent with HF slugs.

-  logger.wandb.name="grpo-llama1b_math" \
+  logger.wandb.name="grpo-llama-1b_math" \

528-533: Fix typo: “intialize” → “initialize”.

-  If you see this error, there is likely an issue with your virtual environments. To fix this, first intialize the submodules:
+  If you see this error, there is likely an issue with your virtual environments. To fix this, first initialize the submodules:

313-336: SFT section brand casing and clarity.

“Llama3.2-1B” → “Llama-3.2-1B”.
“using a 1 GPU” → “using 1 GPU”.

-This fine-tunes the `Llama3.2-1B` model on the SQuAD dataset using a 1 GPU.
+This fine-tunes the `Llama-3.2-1B` model on the SQuAD dataset using 1 GPU.

460-516: Evaluation section: PyTorch casing and small edits.

Use “PyTorch”.

-If you have trained a model and saved the checkpoint in the Pytorch DCP format, you first need to convert it to the Hugging Face format before running evaluation:
+If you have trained a model and saved the checkpoint in the PyTorch DCP format, you first need to convert it to the Hugging Face format before running evaluation:

75-88: markdownlint warnings (MD004/MD007).

CI may fail with current list style/indentation. Either adopt “*” bullets with proper indent (see ToC diff) or relax the rule in your markdownlint config.

Also applies to: 91-99

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 266e718 and 6e91de4.

📒 Files selected for processing (1)

README.md (1 hunks)

🧰 Additional context used

🪛 markdownlint-cli2 (0.17.2)

README.md

13, 92-93: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

76-78, 91, 94-98: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)

22-27, 34-35, 42-43, 51-73, 76-78, 91-98: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build-container / main

🔇 Additional comments (4)

README.md (4)

42-45: No change requested; link text matches intended wording.

The Megatron inference bullet is clear and correctly linked.

198-201: External docs link formatting.

The docs URL should be plain Markdown. Current text is fine; no change required if the hosted page is correct.

Please confirm https://docs.nvidia.com/nemo/rl/latest/docker.html is valid for v0.3.0+ containers.

257-258: Cross-reference is helpful.

Good pointer back to Training Backends docs.

3-16: Nice addition: collapsible “Previous News”.

Improves top-of-page scannability without losing history.

README.md

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Ashwath Aithal <[email protected]>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (6)

README.md (6)

19-19: Branding and terminology consistency (NeMo, PyTorch, Megatron Core).

Unify product names and hyphenation.

-**Nemo RL** is an open-source post-training library developed by NVIDIA, designed to streamline and scale reinforcement learning methods for Multimodal models(LLMs, VLMs etc.). Designed for flexibility, reproducibility, and scale, NeMo RL enables both small-scale experiments and massive multi-GPU, multi-node deployments for fast experimentation in research and production environments.
+**NeMo RL** is an open-source post-training library developed by NVIDIA, designed to streamline and scale reinforcement learning methods for multimodal models (LLMs, VLMs, etc.). Designed for flexibility, reproducibility, and scale, NeMo RL enables both small-scale experiments and massive multi-GPU, multi-node deployments for fast experimentation in research and production environments.

-- **Hackable** with native Pytorch only paths for quick research prototypes.
+- **Hackable** with native PyTorch-only paths for quick research prototypes.

-- **DTensor** - PyTorch's next-generation distributed training with improved memory efficiency (Pytorch native TP, SP, PP, CP, FSDP2)
+- **DTensor** - PyTorch's next-generation distributed training with improved memory efficiency (PyTorch-native TP, SP, PP, CP, FSDP2)

-- 🔜 **Improved Large MoE Performance** - Improve Megatron-core training performance and generation performance.
+- 🔜 **Improved Large MoE Performance** - Improve Megatron Core training performance and generation performance.

-- 🔜 **Megatron-Bridge Integration** - Integrate Megatron-Bridge to enable training features from Megatron-Core.
+- 🔜 **Megatron-Bridge Integration** - Integrate Megatron-Bridge to enable training features from Megatron Core.

-If you have trained a model and saved the checkpoint in the Pytorch DCP format, you first need to convert it to the Hugging Face format before running evaluation:
+If you have trained a model and saved the checkpoint in the PyTorch DCP format, you first need to convert it to the Hugging Face format before running evaluation:

Also applies to: 24-24, 33-33, 55-55, 57-57, 464-464

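541-541: Optional: use one host for the PyTorch docs link.

The current URL uses the docs.pytorch.org host, which also serves the PyTorch documentation, so the link is not necessarily broken. Switch to the pytorch.org/docs path only if the README should use a single host consistently.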

-  To do so, specify [`max_split_size_mb`](https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf)
+  To do so, specify [`max_split_size_mb`](https://pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf)

137-140: Standardize uv invocation to include python (avoid PATH/shebang pitfalls).

Keeps examples consistent with earlier blocks and avoids execution issues if scripts aren’t executable.

-        <pre style="white-space:pre-wrap; word-break:break-word; overflow-wrap:anywhere;"><code class="language-sh">uv run examples/run_grpo_math.py &#92;
---config examples/configs/grpo_math_1B_megatron.yaml</code></pre>
+        <pre style="white-space:pre-wrap; word-break:break-word; overflow-wrap:anywhere;"><code class="language-sh">uv run python examples/run_grpo_math.py &#92;
+--config examples/configs/grpo_math_1B_megatron.yaml</code></pre>

5-6: Fix nested list indentation under News.

Align with common Markdown style and markdownlint MD007 (2-space indent).

-    * 📝 [v0.3.0 Blog Post](https://nvidia-nemo.github.io/blog/2025/07/21/nemo-rl-v0.3/)
-    * 📊 View the release run metrics on [Google Colab](https://colab.research.google.com/drive/15kpesCV1m_C5UQFStssTEjaN2RsBMeZ0?usp=sharing) to get a head start on your experimentation.
+  * 📝 [v0.3.0 Blog Post](https://nvidia-nemo.github.io/blog/2025/07/21/nemo-rl-v0.3/)
+  * 📊 View the release run metrics on [Google Colab](https://colab.research.google.com/drive/15kpesCV1m_C5UQFStssTEjaN2RsBMeZ0?usp=sharing) to get a head start on your experimentation.

213-213: Correct dataset name in link text.

Match the official dataset name to the linked URL.

-We have a reference GRPO experiment config set up trained for math benchmarks using the [OpenInstructMath2](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2) dataset.
+We have a reference GRPO experiment config set up trained for math benchmarks using the [OpenMathInstruct-2](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2) dataset.

48-61: markdownlint MD004/MD007 noise — pick a style or configure the linter.

Either switch list markers to “*” and 2-space indents or relax MD004/MD007 via repo config to accept “-”. Given the length of this README, a config might be preferable.

Example .markdownlint.json:

{
  "MD004": { "style": "dash" },
  "MD007": { "indent": 2 }
}

Also applies to: 62-72
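
To see what the `MD004` style setting changes in practice, here is a simplified approximation of the check. It is an illustrative sketch only: `ul_style_violations` is a hypothetical function, not markdownlint's implementation, it handles just the `dash`, `asterisk`, and `plus` styles, and it can misreport edge cases such as spaced thematic breaks.

```python
import re

MARKERS = {"dash": "-", "asterisk": "*", "plus": "+"}
# A list item: optional indentation, a marker, whitespace, then content.
BULLET = re.compile(r"^\s*([-*+])\s+\S")
FENCE = "`" * 3


def ul_style_violations(markdown: str, style: str = "dash") -> list[int]:
    """Return 1-based line numbers whose list marker differs from `style`."""
    expected = MARKERS[style]
    in_fence = False
    bad = []
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = BULLET.match(line)
        if match and match.group(1) != expected:
            bad.append(number)
    return bad


readme = "- Prerequisites\n* Quick Start\n  - Run Evaluation\n"
print(ul_style_violations(readme, style="dash"))
print(ul_style_violations(readme, style="asterisk"))
```

With `style="dash"` only the asterisk item is flagged, which is why relaxing the config, as suggested above, would silence most of the README's current warnings without touching the file.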

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6e91de4 and 391f71d.

📒 Files selected for processing (1)

README.md (1 hunks)

🧰 Additional context used

🪛 markdownlint-cli2 (0.17.2)

README.md

13, 90-91: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

74-76, 89, 92-96: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)

22-27, 33-34, 41-42, 50-71, 74-76, 89-96: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

README.md

Update README.md

2c9b0c9

Signed-off-by: Ashwath Aithal <[email protected]>

euronymous-aithal requested review from terrykong, parthchadha and snowmanwwg September 7, 2025 21:04

euronymous-aithal temporarily deployed to nemo-ci September 7, 2025 21:04 — with GitHub Actions Inactive

terrykong previously approved these changes Sep 7, 2025

euronymous-aithal marked this pull request as ready for review September 7, 2025 21:07

euronymous-aithal temporarily deployed to nemo-ci September 7, 2025 21:09 — with GitHub Actions Inactive

Update README.md

cadcf66

Signed-off-by: Ashwath Aithal <[email protected]>

euronymous-aithal dismissed terrykong’s stale review via cadcf66 September 8, 2025 03:35

euronymous-aithal temporarily deployed to nemo-ci September 8, 2025 03:35 — with GitHub Actions Inactive

euronymous-aithal temporarily deployed to nemo-ci September 8, 2025 03:40 — with GitHub Actions Inactive

Update README.md

0961ed0

added table for Algorithms Signed-off-by: Ashwath Aithal <[email protected]>

euronymous-aithal had a problem deploying to nemo-ci September 8, 2025 04:12 — with GitHub Actions Error

Update README.md

c2b7c1a

added extra features for 0.4 Signed-off-by: Ashwath Aithal <[email protected]>

euronymous-aithal temporarily deployed to nemo-ci September 8, 2025 04:19 — with GitHub Actions Inactive

euronymous-aithal temporarily deployed to nemo-ci September 8, 2025 04:26 — with GitHub Actions Inactive

Update README.md

6fcaef9

Signed-off-by: Ashwath Aithal <[email protected]>

euronymous-aithal temporarily deployed to nemo-ci September 8, 2025 05:46 — with GitHub Actions Inactive

euronymous-aithal had a problem deploying to nemo-ci September 8, 2025 05:50 — with GitHub Actions Error

Update README.md

e38fb36

Signed-off-by: Ashwath Aithal <[email protected]>

euronymous-aithal temporarily deployed to nemo-ci September 8, 2025 05:54 — with GitHub Actions Inactive

euronymous-aithal temporarily deployed to nemo-ci September 8, 2025 05:58 — with GitHub Actions Inactive

Update README.md

30c3a0b

Signed-off-by: Ashwath Aithal <[email protected]>

euronymous-aithal had a problem deploying to nemo-ci September 8, 2025 06:06 — with GitHub Actions Error

Update README.md

de9ec07

Signed-off-by: Ashwath Aithal <[email protected]>

euronymous-aithal temporarily deployed to nemo-ci September 8, 2025 06:08 — with GitHub Actions Inactive

euronymous-aithal temporarily deployed to nemo-ci September 8, 2025 06:16 — with GitHub Actions Inactive

parthchadha requested changes Sep 8, 2025

euronymous-aithal and others added 2 commits September 8, 2025 16:56

Update README.md

e4e22fd

Co-authored-by: Parth Chadha <[email protected]> Signed-off-by: Ashwath Aithal <[email protected]>

Update README.md

4f22841

Co-authored-by: Parth Chadha <[email protected]> Signed-off-by: Ashwath Aithal <[email protected]>

euronymous-aithal temporarily deployed to nemo-ci September 8, 2025 23:57 — with GitHub Actions Inactive

euronymous-aithal temporarily deployed to nemo-ci September 9, 2025 00:01 — with GitHub Actions Inactive

Update README.md

3661f56

added chnages from https://github.com/NVIDIA-NeMo/RL/pull/965/files Signed-off-by: Ashwath Aithal <[email protected]>

euronymous-aithal temporarily deployed to nemo-ci September 9, 2025 03:31 — with GitHub Actions Inactive

euronymous-aithal temporarily deployed to nemo-ci September 9, 2025 04:02 — with GitHub Actions Inactive

terrykong changed the title ~~Update README.md~~ doc: Update README.md Sep 9, 2025

terrykong changed the title ~~doc: Update README.md~~ docs: Update README.md Sep 9, 2025

Update README.md

50777dc

Signed-off-by: Ashwath Aithal <[email protected]>

euronymous-aithal had a problem deploying to nemo-ci September 9, 2025 19:49 — with GitHub Actions Error

Update README.md

6e91de4

fixed a typo Signed-off-by: Ashwath Aithal <[email protected]>

euronymous-aithal temporarily deployed to nemo-ci September 9, 2025 19:51 — with GitHub Actions Inactive

euronymous-aithal temporarily deployed to nemo-ci September 9, 2025 19:56 — with GitHub Actions Inactive

coderabbitai bot reviewed Sep 9, 2025

Update README.md

374decb

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Ashwath Aithal <[email protected]>

euronymous-aithal had a problem deploying to nemo-ci September 9, 2025 20:30 — with GitHub Actions Error

Update README.md

391f71d

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Ashwath Aithal <[email protected]>

euronymous-aithal temporarily deployed to nemo-ci September 9, 2025 20:32 — with GitHub Actions Inactive

coderabbitai bot reviewed Sep 9, 2025

euronymous-aithal temporarily deployed to nemo-ci September 9, 2025 20:55 — with GitHub Actions Inactive
