Commit efa0114
🎬 Clip higher (huggingface#3118)
* epsilon range added
* epsilon doc str updated
* test removed
* pre-commit run
* Update trl/trainer/grpo_config.py
Co-authored-by: Quentin Gallouédec <[email protected]>
* upper epsilon updated
* precommit updates added
* minor format and dtype fixes
* moving upper bound computation in init
* hf.co for paper link
---------
Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: Quentin Gallouédec <[email protected]>
1 parent d0da1a7 commit efa0114
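The commit message mentions an epsilon range, an upper epsilon, and moving the upper bound computation into init. As a rough illustration of what an asymmetric ("clip higher") range could look like, here is a minimal sketch in plain Python. It is not taken from the diff, whose line contents are not shown on this page. The function names `resolve_clip_range` and `clipped_surrogate`, the parameter names, and the fallback of the upper bound to the lower one are all assumptions made for illustration.

```python
from typing import Optional, Tuple


def resolve_clip_range(epsilon: float, epsilon_high: Optional[float] = None) -> Tuple[float, float]:
    """Hypothetical helper: resolve the (lower, upper) clip offsets once, e.g. at init time.

    Assumption: when no separate upper offset is given, the range is symmetric.
    """
    upper = epsilon if epsilon_high is None else epsilon_high
    return epsilon, upper


def clipped_surrogate(ratio: float, advantage: float, eps_low: float, eps_high: float) -> float:
    """PPO-style clipped objective for one token, with separate lower and upper bounds."""
    # Clamp the probability ratio to [1 - eps_low, 1 + eps_high].
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    eps_low, eps_high = resolve_clip_range(0.2, 0.28)
    # With a positive advantage, a larger upper offset lets the ratio grow further before clipping.
    print(clipped_surrogate(1.5, 1.0, eps_low, eps_high))
    print(clipped_surrogate(1.5, 1.0, *resolve_clip_range(0.2)))
```

Resolving the upper bound once, rather than on every loss computation, keeps the per-step code free of `None` checks. Whether the actual change does this in the same way cannot be confirmed from this page.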
2 files changed: +13 -2 lines changed

First file (+10 -0; line contents not captured):
* 3 lines added after original line 117 (new lines 118-120)
* 7 lines added after original line 302 (new lines 306-312)

Second file (+3 -2; line contents not captured):
* original line 391 removed, 2 lines added (new lines 391-392)
* original line 978 removed, 1 line added (new line 979)