Skip to content

⚔️ Fix bf16 fp16 config conflict issue #3598

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jun 20, 2025

Conversation

yao-matrix
Copy link
Contributor

@yao-matrix yao-matrix commented Jun 16, 2025

w/ this PR #3515, it by default set bf16 set to True regardless what fp16 setting is. In case where user set fp16 explicitly to True(as in ut case here https://github.com/huggingface/trl/blob/main/tests/slow/test_dpo_slow.py#L132), there will be error message "At most one of fp16 and bf16 can be True, but not both".

Fixing this issue by setting bf16 to True only when fp16 is False and bf16 is not set.

@qgallouedec , pls help review, thx.

Signed-off-by: YAO Matrix <[email protected]>
Signed-off-by: YAO Matrix <[email protected]>
@qgallouedec
Copy link
Member

Yes thank you, this PR makes sense. By the way, that's why slow tests fail on main.

The only thing that bothers me is that None by default isn't very informative, maybe even misleading. For the moment, I don't really see an alternative.

@yao-matrix
Copy link
Contributor Author

Yes thank you, this PR makes sense. By the way, that's why slow tests fail on main.

The only thing that bothers me is that None by default isn't very informative, maybe even misleading. For the moment, I don't really see an alternative.

@qgallouedec Yes, since it's a post_init_, and we need to know whether the bf16 is set explicitly by users or just a automatic default, so the best way i can figure to set it as optional w/ default None. Yes, it's not informative, maybe we can enhance docstring to improve that?

Like "Whether to use bf16 (mixed) precision instead of 32-bit. Requires Ampere or higher NVIDIA "
"architecture or Intel XPU or using CPU (use_cpu) or Ascend NPU. If user doesn't set it explicitly and fp16 not set to True, we default set bf16 to True. This is an experimental API and it may change."

WDYT?

@qgallouedec
Copy link
Member

Yes, it's not informative, maybe we can enhance docstring to improve that?

Yes, I think that's the best way to go at this stage. I'm fine with what you suggested, just delete this

This is an experimental API and it may change.

@qgallouedec
Copy link
Member

qgallouedec commented Jun 19, 2025

I just realized that I asked you to remove something that was already in the doc. But still, can you remove it? I think bf16 is stable enough now

@qgallouedec qgallouedec changed the title fix bf16 fp16 config conflict issue ⚔️ Fix bf16 fp16 config conflict issue Jun 20, 2025
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@shirinyamani shirinyamani merged commit 4c92de0 into huggingface:main Jun 20, 2025
10 checks passed
@yao-matrix yao-matrix deleted the bf16-fp16-config branch June 23, 2025 23:58
marcandrelarochelle pushed a commit to marcandrelarochelle/trl that referenced this pull request Jul 29, 2025
Signed-off-by: YAO Matrix <[email protected]>
Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: Quentin Gallouédec <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants