On the Implementation of REINFORCE_rej #6

Hi,

Congratulations on your excellent work!
I have been carefully reading your paper and studying the code, and I greatly appreciate your contributions.

While reviewing the implementation of REINFORCE_rej, I was surprised to find that the rewards for positive and negative samples are set to 1 and 0, respectively, rather than 1 and -1 as stated in the paper. If no baseline is subtracted, a sample with zero reward contributes nothing to the policy-gradient update, so REINFORCE_rej would effectively ignore all negative samples during training. I find this quite confusing and would greatly appreciate your clarification on this point.
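To make the concern concrete, here is a minimal sketch assuming the plain REINFORCE loss `L = -(1/N) * sum_i r_i * log pi(y_i | x)` with no baseline subtraction. I have not confirmed that this is exactly the loss used in the repository, and the function name `reinforce_grad_coeffs` and the toy batch are hypothetical. The sketch computes the gradient coefficient on each sample's log-probability under the two reward schemes:

```python
from typing import List


def reinforce_grad_coeffs(rewards: List[float]) -> List[float]:
    """Gradient of the plain REINFORCE loss (assumed, no baseline)
    L = -(1/N) * sum_i r_i * log pi(y_i | x)
    with respect to each sample's log-probability: dL/dlogp_i = -r_i / N.

    Negative value: gradient descent raises that sample's log-prob.
    Positive value: gradient descent lowers it.
    Zero: the sample has no effect on the update.
    """
    n = len(rewards)
    # 0.0 - x avoids printing -0.0 for zero-reward samples.
    return [0.0 - r / n for r in rewards]


# Hypothetical batch: two correct (positive) and two incorrect (negative) samples.
paper_rewards = [1.0, 1.0, -1.0, -1.0]  # +1 / -1, as described in the paper
code_rewards = [1.0, 1.0, 0.0, 0.0]     # 1 / 0, as observed in the code

print(reinforce_grad_coeffs(paper_rewards))  # [-0.25, -0.25, 0.25, 0.25]
print(reinforce_grad_coeffs(code_rewards))   # [-0.25, -0.25, 0.0, 0.0]
```

With 1/0 rewards the negative samples get a coefficient of exactly zero, so they are never pushed down. With +1/-1 rewards they are actively penalized.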

Best regards,
Siheng
