
@Rijul-Tandon Rijul-Tandon commented Oct 9, 2025

Description

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the tests accordingly (if applicable).
  • I have updated the documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers.

If you need to run benchmark experiments for performance-impacting changes:

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
  • I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture_video.
  • I have performed RLops with python -m openrlbenchmark.rlops.
    • For new feature or bug fix:
      • I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
    • For new algorithm:
      • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
    • I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.

- Add c51_expected_sarsa.py: ES-C51 implementation for classic control environments
- Add c51_atari_expected_sarsa.py: ES-C51 implementation for Atari environments
- Modify c51.py: Update to use softmax action selection for fair comparison (QL-C51)
- Modify c51_atari.py: Update to use softmax action selection for fair comparison (QL-C51)
- Add ES_C51_README.md: Comprehensive documentation for ES-C51 algorithm

ES-C51 uses Expected SARSA updates, which take a softmax-weighted expectation over
all possible next actions, instead of greedy Q-learning updates. This is intended
to give better exploration and more stable learning in distributional RL settings.

Based on paper: 'ES-C51: Expected Sarsa Based C51 Distributional Reinforcement Learning Algorithm'
Authors: Rijul Tandon, Peter Vamplew, Cameron Foale (Neurocomputing, 2024)

- Auto-format all ES-C51 files to meet CleanRL code standards
- Ensure pre-commit checks pass for consistent code style

vercel bot commented Oct 9, 2025

@Rijul-Tandon is attempting to deploy a commit to the Costa Huang's projects Team on Vercel.

A member of the Team first needs to authorize it.

@Rijul-Tandon (Author) commented

Summary

This PR adds the ES-C51 (Expected SARSA based C51) distributional reinforcement learning algorithm to CleanRL.

Changes Made

  • Add c51_expected_sarsa.py: ES-C51 implementation for classic control environments
  • Add c51_atari_expected_sarsa.py: ES-C51 implementation for Atari environments
  • Modify c51.py: Updated to use softmax action selection for fair comparison (QL-C51)
  • Modify c51_atari.py: Updated to use softmax action selection for fair comparison (QL-C51)
  • Add ES_C51_README.md: Comprehensive documentation for ES-C51 algorithm
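
For readers unfamiliar with softmax (Boltzmann) action selection, here is a minimal sketch of the idea in plain Python. It is not the code from this PR: the function names `softmax` and `select_action`, and the temperature argument `tau`, are assumptions made for illustration.

```python
import math
import random


def softmax(values, tau):
    """Return Boltzmann probabilities for `values` at temperature `tau`."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(values)
    exps = [math.exp((v - m) / tau) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, tau, rng=random):
    """Sample an action index with probability proportional to exp(Q / tau).

    Hypothetical helper: `rng` is anything exposing `choices`, such as the
    `random` module or a seeded `random.Random` instance.
    """
    probs = softmax(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

A high `tau` makes the choice close to uniform, while a `tau` near zero makes it close to greedy.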

Key Innovation

ES-C51 uses Expected SARSA updates, which take a softmax-weighted expectation over all possible next actions, instead of greedy Q-learning updates. This is intended to give better exploration and more stable learning in distributional RL settings.
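
The difference between the two targets can be sketched as follows. This is one plausible reading of the description above, not the code from this PR: it assumes the next-state distribution is a softmax-weighted mixture of the per-action categorical distributions, and every name here (`expected_values`, `greedy_next_distribution`, `expected_sarsa_next_distribution`, `tau`) is hypothetical.

```python
import math


def softmax(values, tau):
    """Return Boltzmann probabilities for `values` at temperature `tau`."""
    m = max(values)
    exps = [math.exp((v - m) / tau) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def expected_values(pmfs, atoms):
    """Q(s', a) = sum_i z_i * p_i(s', a), one value per action."""
    return [sum(z * p for z, p in zip(atoms, pmf)) for pmf in pmfs]


def greedy_next_distribution(pmfs, atoms):
    """Q-learning style: keep only the distribution of the best next action."""
    q = expected_values(pmfs, atoms)
    best = max(range(len(q)), key=q.__getitem__)
    return list(pmfs[best])


def expected_sarsa_next_distribution(pmfs, atoms, tau):
    """Expected SARSA style (assumed): softmax-weighted mixture over all actions."""
    q = expected_values(pmfs, atoms)
    weights = softmax(q, tau)
    return [
        sum(w * pmf[i] for w, pmf in zip(weights, pmfs))
        for i in range(len(atoms))
    ]
```

In either case the resulting next-state distribution would still go through the usual C51 step of shifting the atoms by the reward and discount and projecting back onto the fixed support; that step is omitted here.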

Research Paper

Based on: "ES-C51: Expected Sarsa Based C51 Distributional Reinforcement Learning Algorithm"
Authors: Rijul Tandon, Peter Vamplew, Cameron Foale (Neurocomputing, 2024)

Testing Status

  • ✅ All pre-commit checks pass (black, isort formatting applied)
  • ✅ Updated tau decay schedule (0.75 multiplier for improved exploration)
  • ✅ No merge conflicts
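
The comment does not say when the 0.75 multiplier is applied. As a rough sketch only, a multiplicative temperature schedule could look like the following, assuming the multiplier is applied once per fixed interval of steps and clipped at a floor. The function `decayed_tau` and the parameters `tau_start`, `decay_every`, and `tau_min` are hypothetical and do not come from the PR.

```python
def decayed_tau(step, tau_start=1.0, decay=0.75, decay_every=10_000, tau_min=0.01):
    """Return the softmax temperature at a given training step.

    Assumed schedule: multiply tau by `decay` once every `decay_every`
    steps, never going below `tau_min`. All defaults except the 0.75
    multiplier are illustrative placeholders.
    """
    n_decays = step // decay_every
    return max(tau_min, tau_start * decay ** n_decays)
```

With these placeholder defaults, the temperature stays at 1.0 for the first interval, drops to 0.75 after the first decay, and eventually settles at the floor.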

Updated README to reflect ES-C51 algorithm details and removed references to CleanRL.