Hi,
Congratulations on your excellent work!
I have been carefully reading your paper and studying the code, and I greatly appreciate your contributions.
While reviewing the implementation of REINFORCE_rej, I was surprised to find that the rewards for negative and positive samples are set to 0 and 1, respectively, instead of -1 and 1 as stated in the paper. Because a reward of 0 contributes nothing to the policy gradient, this appears to make REINFORCE_rej ignore all negative samples during training. I find this confusing and would greatly appreciate your clarification on this point.
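To make the concern concrete, here is a minimal sketch, assuming a plain REINFORCE loss of the form -reward * log pi(action) over a softmax policy. It is not taken from the repository; the names `softmax` and `reinforce_grad` and the example logits are hypothetical and only illustrate how the reward value scales the gradient.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_grad(logits, action, reward):
    # Gradient of the loss -reward * log pi(action) with respect to the logits.
    # For a softmax policy, d log pi(action) / d logit_i = 1[i == action] - pi_i.
    probs = softmax(logits)
    return [-reward * ((1.0 if i == action else 0.0) - p)
            for i, p in enumerate(probs)]

# Hypothetical logits for a three-action policy; action 1 was sampled.
logits = [0.2, -0.1, 0.5]

grad_zero = reinforce_grad(logits, action=1, reward=0.0)   # negative sample, reward 0
grad_neg = reinforce_grad(logits, action=1, reward=-1.0)   # negative sample, reward -1
grad_pos = reinforce_grad(logits, action=1, reward=1.0)    # positive sample, reward 1

print("reward  0:", grad_zero)
print("reward -1:", grad_neg)
print("reward +1:", grad_pos)
```

Under these assumptions, a reward of 0 yields an all-zero gradient, so the sample has no effect on the update, whereas a reward of -1 yields a gradient that pushes down the probability of the sampled action.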
Best regards,
Siheng