1 parent 96fa6d9, commit a06fe9f
examples/summarize_rlhf/README.md
@@ -40,7 +40,7 @@ For an in-depth description of the example, please refer to our [blog post](http
 
 
 ### Results
 
-On 1,000 samples from CNN/DailyMail test dataset:
+The following tables display ROUGE and reward scores on the test set of the TL;DR dataset between SFT and PPO models.
 
 1. SFT vs PPO