Conversation

baberabb
Contributor

No description provided.

@baberabb baberabb requested a review from StellaAthena as a code owner July 10, 2025 14:46
@baberabb baberabb merged commit fcddf19 into main Jul 10, 2025
6 checks passed
@baberabb baberabb deleted the ll branch July 10, 2025 14:53
HelloJocelynLu added a commit to deepprinciple/lm-evaluation-harness that referenced this pull request Jul 17, 2025
* warning for "chat" pretrained; disable buggy evalita configs (EleutherAI#3127)

* check for chat for warning

* add test

* remove yaml extension from some evalita configs

* move unitxt to own test script

* fix CI test

* fix: remove warning (EleutherAI#3128)

* Adding EgyMMLU and EgyHellaSwag (EleutherAI#3063)

* add egy mmlu hellaswag

* add egymmlu egyhellaswag to tasks readme

* fix egymmlu config generation

* fix _generate_configs formatting

* Added mixed_precision_dtype arg (EleutherAI#3138)

* Fix for hang due to mp.Pool in bootstrap_stderr (EleutherAI#3135)

* fix: vllm lora (EleutherAI#3132)

* truncate thinking tags in generations (EleutherAI#3145)

* feat: add postprocessing for generated text to strip stop sequences and thinking tokens

* nit

* fix: trim leading whitespace after stripping thinking tokens from generation

* feat: add think_end_token to model_args

* nit

* nit

* nit

* add to readme

* nit
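
The postprocessing described in the commits above (strip stop sequences, drop everything up to the end-of-thinking marker, trim leading whitespace) could look roughly like the sketch below. The function name `postprocess_generation` and its signature are hypothetical and chosen for illustration; only the `think_end_token` name comes from the commit messages, and the harness's actual implementation may differ.

```python
from typing import Optional, Sequence


def postprocess_generation(
    text: str,
    stop_sequences: Sequence[str] = (),
    think_end_token: Optional[str] = None,
) -> str:
    """Hypothetical helper, not the harness's actual function."""
    # Cut the text at the earliest occurrence of any stop sequence.
    for stop in stop_sequences:
        if stop:  # ignore empty stop strings
            text = text.split(stop)[0]
    # Keep only what follows the end-of-thinking marker, then trim the
    # leading whitespace left behind after the marker.
    if think_end_token and think_end_token in text:
        text = text.split(think_end_token, 1)[1].lstrip()
    return text


raw = "<think>plan</think>\n 42\n\nQ: next"
print(postprocess_generation(raw, ["\n\n"], "</think>"))
```

In this sketch the stop sequences are applied first, so a stop sequence that occurs inside the thinking span would also cut off the answer; whether that ordering is desirable is a design choice the commit messages do not settle.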

* `bbh_cot_fewshot`: Removed repeated "Let's think step by step." text from bbh cot prompts (EleutherAI#3140)

* Removed the 'Let's think step by step.' text from the start of the target entry in each sample, so that the phrase is no longer repeated twice in the few-shot prompts and the behavior matches the original bbh repository. This applies to 26 of the 27 subtasks; the only exception is boolean_expressions.yaml. In boolean_expressions.yaml, in my opinion there is an error: the prompt does not include the 'Remember that (i) ...' text after the final 'A: Let's think step by step.'. Models like EleutherAI/gpt-neo-125m seem to always begin their answers with this string anyway (copying the few-shot examples), but I think it really should have been part of the prompt, just as 'A: Let's think step by step.' is included in the prompt for all of the cot tasks. The original bbh repo has the same issue, so I think it is fine to keep it this way for consistency; I just thought I'd point it out.

* feat: remove extra space from answers; add changelog
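
The duplication fixed by the bbh change above can be illustrated with a small sketch. It assumes each few-shot entry is rendered as `Q: ...` followed by `A: Let's think step by step.` and the target; the helper names `strip_cot_prefix` and `build_fewshot_block` are hypothetical and are not taken from the harness or the bbh repository.

```python
COT_PREFIX = "Let's think step by step."


def strip_cot_prefix(target: str) -> str:
    """Drop a leading chain-of-thought phrase (and the space after it)."""
    if target.startswith(COT_PREFIX):
        return target[len(COT_PREFIX):].lstrip()
    return target


def build_fewshot_block(question: str, target: str) -> str:
    """Hypothetical renderer: the prompt template already supplies the phrase once."""
    return f"Q: {question}\nA: {COT_PREFIX} {strip_cot_prefix(target)}"


old_target = "Let's think step by step. 2 + 2 = 4. So the answer is 4."
block = build_fewshot_block("What is 2 + 2?", old_target)
print(block)
```

Without the stripping step, the phrase would appear once from the template and once from the target, which is the repetition the commit describes.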

---------

Co-authored-by: Baber <[email protected]>

---------

Co-authored-by: Baber Abbasi <[email protected]>
Co-authored-by: Atou Houdaifa <[email protected]>
Co-authored-by: Avelina Asada Hadji-Kyriacou <[email protected]>
Co-authored-by: Ankit Gola <[email protected]>
Co-authored-by: MaYongQing <[email protected]>
Co-authored-by: philipdoldo <[email protected]>
Co-authored-by: Baber <[email protected]>
HelloJocelynLu added a commit to deepprinciple/lm-evaluation-harness that referenced this pull request Jul 17, 2025
* warning for "chat" pretrained; disable buggy evalita configs (EleutherAI#3127)

* check for chat for warning

* add test

* remove yaml extension from some evalita configs

* move unitxt to own test script

* fix CI test

* fix: remove warning (EleutherAI#3128)

* Adding EgyMMLU and EgyHellaSwag (EleutherAI#3063)

* add egy mmlu hellaswag

* add egymmlu egyhellaswag to tasks readme

* fix egymmlu config generation

* fix _generate_configs formatting

* Added mixed_precision_dtype arg (EleutherAI#3138)

* Fix for hang due to mp.Pool in bootstrap_stderr (EleutherAI#3135)

* Add molecular reasoning evaluation configs and data

* [Example] Add Chemistry Task from Global-MMLU Dataset (#2)

* update multiple choice example mmlu

* increase max_tokens limit

* [Example] Add chemistry reasoning task with customized eval func (#5)

* update multiple choice example mmlu

* increase max_tokens limit

* update example for nmr with customized eval func

* rename nmr yaml file

* Integrate Humanity's Last Exam (HLE) Benchmark (#7)

* update multiple choice example mmlu

* increase max_tokens limit

* update example for nmr with customized eval func

* rename nmr yaml file

* update hle dataset-chemistry

* Close #8: Update Research Dataset for Optimizing Functional Transition Metal Complexes (#9)

* update multiple choice example mmlu

* increase max_tokens limit

* update example for nmr with customized eval func

* rename nmr yaml file

* add task tmc optimization "tmc_gap" and "tmc_polar"

* comment out some debugging message

* Edit function name

* Adding Groups for TMC opt tasks, and adding new evaluation metrics

* Variable Name correction

---------

Co-authored-by: Jieyu Lu <[email protected]>
Co-authored-by: Jocelyn <[email protected]>
Co-authored-by: ZhangdeSong1 <“zhangdesong”@deepprinciple.com>

* Update issue templates

* Update issue templates

* Create pull_request_template.md

* Update pull_request_template.md

* Update new-task-addition.md

---------

Co-authored-by: Baber Abbasi <[email protected]>
Co-authored-by: Atou Houdaifa <[email protected]>
Co-authored-by: Avelina Asada Hadji-Kyriacou <[email protected]>
Co-authored-by: Ankit Gola <[email protected]>
Co-authored-by: Kehan Guo <[email protected]>
Co-authored-by: Jocelyn <[email protected]>
Co-authored-by: ZhangdeSong1 <[email protected]>
Co-authored-by: Jieyu Lu <[email protected]>
Co-authored-by: ZhangdeSong1 <“zhangdesong”@deepprinciple.com>