What's the meaning of clip_ratio in GRPO Trainer? #3144

I am trying to understand the rationale behind the calculation of `clip_ratio` in the following code snippet:
```python
coef_1 = torch.exp(per_token_logps - old_per_token_logps)
coef_2 = torch.clamp(coef_1, 1 - self.epsilon_low, 1 + self.epsilon_high)
per_token_loss1 = coef_1 * advantages.unsqueeze(1)
per_token_loss2 = coef_2 * advantages.unsqueeze(1)
per_token_loss = -torch.min(per_token_loss1, per_token_loss2)

is_clipped = (per_token_loss1 < per_token_loss2).float()
clip_ratio = (is_clipped * completion_mask).sum() / completion_mask.sum()
self._metrics[mode]["clip_ratio"].append(self.accelerator.gather_for_metrics(clip_ratio).mean().item())
```
If `clip_ratio` is intended to measure how often the policy update is clipped to prevent large changes, shouldn't `is_clipped` be:
```python
is_clipped = (per_token_loss1 > per_token_loss2).float()
```
since we take `torch.min(per_token_loss1, per_token_loss2)`, so the clipped term is selected only when `per_token_loss2` is the smaller one?
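
To make the question concrete, here is a minimal scalar sketch of the same per-token computation in plain Python (no `torch`). The helper name `clip_case` and the default `eps_low = eps_high = 0.2` are assumptions chosen for illustration, not values taken from the trainer. It reports whether `min` actually selects the clamped term, alongside both candidate comparisons.

```python
def clip_case(ratio, advantage, eps_low=0.2, eps_high=0.2):
    """Scalar analogue of the per-token loss terms in the snippet above.

    eps_low / eps_high defaults are illustrative assumptions.
    """
    # Scalar equivalent of torch.clamp(coef_1, 1 - eps_low, 1 + eps_high)
    clamped = min(max(ratio, 1 - eps_low), 1 + eps_high)
    loss1 = ratio * advantage    # unclipped term (per_token_loss1)
    loss2 = clamped * advantage  # clipped term (per_token_loss2)
    selected = min(loss1, loss2)
    return {
        # Clipping is active when min() picks the clamped term
        # and the clamp actually changed the ratio.
        "clip_active": selected == loss2 and clamped != ratio,
        "loss1_lt_loss2": loss1 < loss2,  # comparison used in the snippet
        "loss1_gt_loss2": loss1 > loss2,  # comparison proposed here
    }


cases = [
    (1.5, 1.0),   # ratio above the range, positive advantage
    (0.5, 1.0),   # ratio below the range, positive advantage
    (1.5, -1.0),  # ratio above the range, negative advantage
    (0.5, -1.0),  # ratio below the range, negative advantage
    (1.0, 1.0),   # ratio inside the range
]
for ratio, advantage in cases:
    print(ratio, advantage, clip_case(ratio, advantage))
```

In these five cases `clip_active` agrees with `loss1 > loss2` every time. `loss1 < loss2` is true only where the clamp changed the ratio but `min` still selected the unclipped term, which is what prompts the question.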

I would appreciate any insights or clarification on this matter. Thank you!
