
Conversation

Avelina9X
Contributor

This PR adds a new mixed_precision_dtype: Optional[Union[str, torch.dtype]] parameter to HFLM.

When specified (either as a string or a torch.dtype), all internal calls to self.model occur inside torch.autocast regions for the model's device type and the given dtype.

  • Scope: impacts self._model_call and self._model_generate, which affects loglikelihood, loglikelihood_rolling, generate_until and automatic batch size detection.
  • Default: None -> original behaviour (the torch.autocast context manager becomes a no-op).

The addition of this feature results in slightly different behaviour from explicitly loading the model with a specified dtype, as it relies on torch's internal op-selection and autocasting behaviour. This is primarily useful when working with multi-dtype models (e.g. VLMs, PEFT models) in the CLI, or when invoking the harness from the Python API during training, where a model may be in full precision.
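
For reference, the wrapping behaviour can be approximated with a small helper (a minimal sketch, not the merged implementation; the helper name get_autocast_context and the commented call site are illustrative assumptions):

import contextlib
from typing import Optional, Union

import torch

def get_autocast_context(
    device_type: str,
    mixed_precision_dtype: Optional[Union[str, torch.dtype]],
):
    """Return an autocast context for the given dtype, or a no-op when None."""
    if mixed_precision_dtype is None:
        # Default path: identical to the original behaviour.
        return contextlib.nullcontext()
    if isinstance(mixed_precision_dtype, str):
        # Resolve "float16" / "bfloat16" to the corresponding torch.dtype.
        mixed_precision_dtype = getattr(torch, mixed_precision_dtype)
    return torch.autocast(device_type=device_type, dtype=mixed_precision_dtype)

# Hypothetical call site inside _model_call:
# with get_autocast_context(self.device.type, self._mixed_precision_dtype):
#     return self.model(inps).logits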

Usage

Python API

lm = HFLM(model_name, mixed_precision_dtype="float16")   # or torch.float16
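
For example, to evaluate an in-memory full-precision model during training without casting its weights (a sketch; simple_evaluate is the harness's public entry point, while my_model, my_tokenizer and the task name are placeholders):

import torch
import lm_eval
from lm_eval.models.huggingface import HFLM

# Weights stay in float32; autocast selects bfloat16 kernels per op.
lm = HFLM(pretrained=my_model, tokenizer=my_tokenizer, mixed_precision_dtype=torch.bfloat16)
results = lm_eval.simple_evaluate(model=lm, tasks=["hellaswag"])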

CLI

lm_eval --model hf \
    --model_args pretrained=model_name,dtype="float32",mixed_precision_dtype="float16"

Although any torch.dtype is accepted, in practice "float16" and "bfloat16" are the only sensible choices when utilising this feature.

Docs
No documentation has been added, as the HFLM class does not have specific documentation and the argument name and typing are self-explanatory.

@baberabb
Contributor

Great work! Merging!

@baberabb baberabb merged commit 31895e5 into EleutherAI:main Jul 14, 2025
6 checks passed
HelloJocelynLu added a commit to deepprinciple/lm-evaluation-harness that referenced this pull request Jul 17, 2025
* warning for "chat" pretrained; disable buggy evalita configs (EleutherAI#3127)

* check for chat for warning

* add test

* remove yaml extension from some evalita configs

* move unitxt to own test script

* fix CI test

* fix: remove warning (EleutherAI#3128)

* Adding EgyMMLU and EgyHellaSwag (EleutherAI#3063)

* add egy mmlu hellaswag

* add egymmlu egyhellaswag to tasks readme

* fix egymmlu config generation

* fix _generate_configs formatting

* Added mixed_precision_dtype arg (EleutherAI#3138)

* Fix for hang due to mp.Pool in bootstrap_stderr (EleutherAI#3135)

* fix: vllm lora (EleutherAI#3132)

* truncate thinking tags in generations (EleutherAI#3145)

* feat: add postprocessing for generated text to strip stop sequences and thinking tokens

* nit

* fix: trim leading whitespace after stripping thinking tokens from generation

* feat: add think_end_token to model_args

* nit

* nit

* nit

* add to readme

* nit

* `bbh_cot_fewshot`: Removed repeated "Let's think step by step." text from bbh cot prompts (EleutherAI#3140)

* Removed the "Let's think step by step." text from the start of the target entry in each of the samples, to prevent this phrase from being repeated twice in the few-shot prompts and to match the behaviour of the original bbh repository. Note that this applied to only 26 of the 27 subtasks; the only one it did not apply to is boolean_expressions.yaml. In my opinion, boolean_expressions.yaml has an error of its own: it does not include the 'Remember that (i) ...' text after the final 'A: Let's think step by step.' in the prompt. Models like EleutherAI/gpt-neo-125m seem to always begin answers with this string anyway (copying what was done in the few-shot prompts), but I think it really should have been part of the prompt, much like 'A: Let's think step by step.' is included in the prompt for all of the cot tasks. However, the original bbh repo has the same issue, so I think it is fine to keep it this way for consistency; I just thought I'd point it out.

* feat: remove extra space from answers; add changelog

---------

Co-authored-by: Baber <[email protected]>

---------

Co-authored-by: Baber Abbasi <[email protected]>
Co-authored-by: Atou Houdaifa <[email protected]>
Co-authored-by: Avelina Asada Hadji-Kyriacou <[email protected]>
Co-authored-by: Ankit Gola <[email protected]>
Co-authored-by: MaYongQing <[email protected]>
Co-authored-by: philipdoldo <[email protected]>
Co-authored-by: Baber <[email protected]>
HelloJocelynLu added a commit to deepprinciple/lm-evaluation-harness that referenced this pull request Jul 17, 2025
* warning for "chat" pretrained; disable buggy evalita configs (EleutherAI#3127)

* check for chat for warning

* add test

* remove yaml extension from some evalita configs

* move unitxt to own test script

* fix CI test

* fix: remove warning (EleutherAI#3128)

* Adding EgyMMLU and EgyHellaSwag (EleutherAI#3063)

* add egy mmlu hellaswag

* add egymmlu egyhellaswag to tasks readme

* fix egymmlu config generation

* fix _generate_configs formatting

* Added mixed_precision_dtype arg (EleutherAI#3138)

* Fix for hang due to mp.Pool in bootstrap_stderr (EleutherAI#3135)

* Add molecular reasoning evaluation configs and data

* [Example] Add Chemistry Task from Global-MMLU Dataset (#2)

* update multiple choice example mmlu

* increase max_tokens limit

* [Example] Add chemistry reasoning task with customized eval func (#5)

* update multiple choice example mmlu

* increase max_tokens limit

* update example for nmr with customized eval func

* rename nmr yaml file

* Integrate Humanity's Last Exam (HLE) Benchmark (#7)

* update multiple choice example mmlu

* increase max_tokens limit

* update example for nmr with customized eval func

* rename nmr yaml file

* update hle dataset-chemistry

* Close #8: Update Research Dataset for Optimizing Functional Transition Metal Complexes (#9)

* update multiple choice example mmlu

* increase max_tokens limit

* update example for nmr with customized eval func

* rename nmr yaml file

* add task tmc optimization "tmc_gap" and "tmc_polar"

* comment out some debugging message

* Edit function name

* Adding groups for TMC opt tasks, and adding a new evaluation metric

* Variable Name correction

---------

Co-authored-by: Jieyu Lu <[email protected]>
Co-authored-by: Jocelyn <[email protected]>
Co-authored-by: ZhangdeSong1 <"zhangdesong"@deepprinciple.com>

* Update issue templates

* Update issue templates

* Create pull_request_template.md

* Update pull_request_template.md

* Update new-task-addition.md

---------

Co-authored-by: Baber Abbasi <[email protected]>
Co-authored-by: Atou Houdaifa <[email protected]>
Co-authored-by: Avelina Asada Hadji-Kyriacou <[email protected]>
Co-authored-by: Ankit Gola <[email protected]>
Co-authored-by: Kehan Guo <[email protected]>
Co-authored-by: Jocelyn <[email protected]>
Co-authored-by: ZhangdeSong1 <[email protected]>
Co-authored-by: Jieyu Lu <[email protected]>
Co-authored-by: ZhangdeSong1 <"zhangdesong"@deepprinciple.com>