You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
🎲 Add support for additional generation kwargs in GRPO Trainer (#2989)
* Add support for additional generation kwargs in GRPO Trainer
- Extend GRPOConfig to support additional generation kwargs
- Update GRPOTrainer to incorporate additional generation parameters
- Add tests for training with additional generation kwargs for both standard and vLLM modes
* Add missing vllm_gpu_memory_utilization=0.5
* 🔧 Refactor GRPO generation parameters and configuration
- Restructure GRPOConfig to separate generation parameters
- Add support for top_p, top_k, min_p, repetition_penalty, and length_penalty
- Remove additional_generation_kwargs in favor of explicit parameters
- Update GRPOTrainer to use new generation parameter configuration
* Update tests
* Remove length_penalty and fix tests
* Update defaults and docs
- Change temperature type from Optional[float] to float
- Set default top_p to 1.0 instead of None
- Simplify parameter descriptions by removing redundant "if set to None" text
- Maintain consistent type hints and default values for generation parameters
* GRPO remove optional type hint for temperature parameter
* Remove length_penalty from sampling_kwargs dict in GRPOTrainer
* some refactoring
* top k None support
* change value of in test to amke them work
---------
Co-authored-by: Robert Veres <[email protected]>
Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: Quentin Gallouédec <[email protected]>
0 commit comments