Amirhossein Kazemnejad*, Milad Aghajohari*, Alessandro Sordoni, Aaron Courville, Siva Reddy
Implementation of DeepSeek R1-zero style training with:
- Single 80G GPU (and also multi-GPU)
- No RL Library
- 3B Base Model (and also 7B models with multi-GPU)
- Full Parameter Tuning
- Efficient (Competitive performance to verl but much simpler)
- Up to 32K context size for 3B model with multi-GPU (or 16K context size for 7B model)
- June 2025: Added multi-GPU support for faster training and 7B models
- June 2025: Added VinePPO episode generation (experimental)
Inspired by TinyZero and Mini-R1, but designed to be much simpler, cleaner, and faster, with every line of code visible and understandable.
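To make "R1-Zero-style training" concrete: the policy gets a reward for producing well-formatted reasoning and a correct final answer, with no SFT stage. Below is an illustrative sketch of such a reward for the CountDown task. The function names, tag format, and reward weights are assumptions for illustration, not the repository's exact code:

```python
# Illustrative R1-Zero-style reward for CountDown (hypothetical, not the repo's API).
import re

def check_format(completion: str) -> bool:
    """Small reward for following the <think>...</think><answer>...</answer> format."""
    pattern = r"(?s)\s*<think>.*?</think>\s*<answer>.*?</answer>\s*"
    return re.fullmatch(pattern, completion) is not None

def countdown_reward(completion: str, numbers: list[int], target: int) -> float:
    """Combine a small format reward with a correctness reward."""
    reward = 0.1 if check_format(completion) else 0.0
    match = re.search(r"(?s)<answer>(.*?)</answer>", completion)
    if match is None:
        return reward
    equation = match.group(1).strip()
    # The equation must use exactly the given numbers ...
    used = [int(n) for n in re.findall(r"\d+", equation)]
    if sorted(used) != sorted(numbers):
        return reward
    # ... and must evaluate to the target.
    try:
        if abs(eval(equation, {"__builtins__": {}}) - target) < 1e-6:
            reward += 1.0
    except Exception:
        pass
    return reward
```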
- nanoAhaMoment: RL for LLM from Scratch with 1 GPU - Part 1
- nanoAhaMoment: RL for LLM from Scratch with 1 GPU - Part 2
- `nano_r1.ipynb` is the interactive, single-file Jupyter notebook with the tutorial.
- `nano_r1_script.py` is the same content as `nano_r1.ipynb`, packaged as a plain Python script for convenience of running with `python` and for multi-GPU support.
- `notebooks/checkpoint_playground.ipynb` is a notebook for comparing different model checkpoints (including our trained model) and playing with them.
- 🤗 McGill-NLP/nano-aha-moment-3b: the HF model trained using the above script (~60% accuracy on the CountDown task)
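If you just want to try the released checkpoint, here is a minimal usage sketch with the standard Hugging Face `transformers` API (the prompt below is a made-up CountDown-style example, not from the repo's dataset):

```python
# Minimal sketch: load the released 3B checkpoint with Hugging Face transformers.
# Assumes transformers + accelerate are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "McGill-NLP/nano-aha-moment-3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Made-up CountDown-style prompt, for illustration only.
prompt = "Using the numbers [19, 36, 55, 7], create an equation that equals 65."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```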
- Clone the repository

  ```bash
  git clone https://github.com/McGill-NLP/nano-aha-moment.git
  ```
- Install dependencies

  First, make sure CUDA 12.4 is installed. Then install PyTorch:

  ```bash
  pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu124
  ```

  Install the rest of the dependencies:

  ```bash
  pip install -r requirements.txt
  ```

  Alternative installation with uv (optional):

  ```bash
  uv sync
  uv sync --extra compile  # Install flash-attention
  ```
- Run the training script

  Open `nano_r1.ipynb` or `nano_r1_script.py` and start training.

  If using uv, you can run the script with either `uv run nano_r1_script.py`, or activate the env with `source .venv/bin/activate` and run with `python nano_r1_script.py`.

  Here is the command to run the training script with 4 GPUs:

  ```bash
  python nano_r1_script.py --nproc 4  # Use 4 GPUs
  ```
| Context Length | 3B Model (per_device_batch_size) | 7B Model (per_device_batch_size) |
|---|---|---|
| 1K | 32 | 16 |
| 2K | 16 | 8 |
| 4K | 8 | 4 |
| 8K | 4 | 2 |
| 16K | 2 | 1 |
| 32K | 1 | N/A |
Note: These batch sizes are optimized for 4xA100 80GB GPUs. For other GPU types, you may need to adjust the batch sizes accordingly.
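The table follows a simple pattern: the per-device batch size roughly halves each time the context length doubles. A small sketch of that rule of thumb (the function name and base values are derived from the table above, not part of the codebase):

```python
# Rough rule of thumb from the table above: per-device batch size halves as
# context length doubles (assumes 4xA100 80GB). Hypothetical helper, not the repo's API.
# Note: the table marks 7B at 32K as N/A (out of memory), which this simple rule
# cannot capture.
def suggested_per_device_batch_size(context_length: int, model_size: str = "3b") -> int:
    base = {"3b": 32, "7b": 16}[model_size]  # batch size at 1K (1024-token) context
    return max(base * 1024 // context_length, 1)

assert suggested_per_device_batch_size(8192, "7b") == 2   # matches the table
assert suggested_per_device_batch_size(32768, "3b") == 1  # matches the table
```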
- [ ] Full evaluation suite
- [x] Multi-GPU support (added June 2025)
We gratefully acknowledge the support of Lambda for providing compute resources through their research compute grant.
If you use this codebase in your research, please cite us using:
@misc{Kazemnejad2025:NanoAhaMoment,
author = {Amirhossein Kazemnejad and Milad Aghajohari and Alessandro Sordoni and Aaron Courville and Siva Reddy},
title = {Nano Aha! Moment: Single File "RL for LLM" Library},
year = {2025},
howpublished = {\url{https://github.com/McGill-NLP/nano-aha-moment}},
note = {GitHub repository}
}