@AndSonder
Contributor

PR Category

Auto Parallel

PR Types

Bug fixes

Description

Pcard-76459

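Under PIR, the amp_master_grad_cast op has been moved into the opt stage. As a result, the current grad merge logic has become:

grad(bf16) => add(grad_merge(bf16), grad(bf16)) => cast(grad_merge(bf16)) => opt(fp32)

This is unreasonable: gradient accumulation needs to be performed in high precision.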
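To illustrate why accumulating in bf16 is a problem, here is a minimal NumPy sketch. It is not Paddle code and does not reflect the actual change in this PR. NumPy has no native bfloat16, so `to_bf16` is a hypothetical helper that rounds float32 values to the nearest bfloat16 value, and `accumulate` is an illustrative stand-in for the grad merge add.

```python
import numpy as np


def to_bf16(x):
    """Round float32 values to the nearest bfloat16 value (result kept as float32).

    Hypothetical helper for illustration only; assumes finite inputs.
    """
    bits = np.atleast_1d(np.asarray(x, dtype=np.float32)).view(np.uint32)
    # Round to nearest even on the 16 low bits that bfloat16 drops.
    bias = ((bits >> np.uint32(16)) & np.uint32(1)) + np.uint32(0x7FFF)
    rounded = (bits + bias) & np.uint32(0xFFFF0000)
    return rounded.view(np.float32)


def accumulate(grads, high_precision):
    """Sum micro-batch gradients, storing the running sum in fp32 or bf16."""
    acc = np.zeros(1, dtype=np.float32)
    for g in grads:
        g = to_bf16(g)          # each micro-batch gradient arrives in bf16
        acc = acc + g           # the add itself is done in float32 here
        if not high_precision:
            acc = to_bf16(acc)  # merged gradient is stored back in bf16
    return acc


# One large gradient followed by many small ones.
grads = [np.array([1.0])] + [np.array([0.001])] * 100

low = accumulate(grads, high_precision=False)   # bf16 running sum
high = accumulate(grads, high_precision=True)   # fp32 running sum
print("bf16 accumulation:", low[0])
print("fp32 accumulation:", high[0])
```

With the bf16 running sum, each small gradient is below half of the bf16 spacing near 1.0 (2^-7), so it is rounded away on every step and the sum never moves. The fp32 running sum keeps these contributions.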

Contributor

@JZ-LIANG JZ-LIANG left a comment

LGTM

@JZ-LIANG JZ-LIANG merged commit 3a9f584 into PaddlePaddle:develop Sep 27, 2024
26 of 27 checks passed
@AndSonder AndSonder deleted the fix_grad_merge_2 branch September 30, 2024 02:39