[Feature] Support eval for C-Eval test split #2218

HYZ17 · 2025-07-24T08:47:45Z

Motivation

As the author of the C-Eval benchmark, I recently released the test split answers, which were previously hidden. The current OpenCompass implementation supports evaluation only on the validation split because the test set ground truth was not available. This PR enables C-Eval test split evaluation so that large language models can be assessed more comprehensively.

The C-Eval dataset is available at https://huggingface.co/datasets/ceval/ceval-exam

Modification

This PR makes the following modifications to enable C-Eval test split evaluation:

Updated dataset configurations: Modified three C-Eval configuration files to include both 'val' and 'test' splits:
- opencompass/configs/datasets/ceval/ceval_gen_5f30c7.py
- opencompass/configs/datasets/ceval/ceval_ppl_578f8d.py
- opencompass/configs/datasets/ceval/ceval_zero_shot_gen_bd40ef.py
Changed split iteration: Updated the loop from for _split in ['val']: to for _split in ['val', 'test']: in all three configuration files.
Preserved existing functionality: The existing validation split evaluation remains unchanged, while adding new test split evaluation capabilities.
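As a rough illustration of the loop change, here is a minimal, self-contained sketch. It is not the actual content of the configuration files: the `build_ceval_datasets` helper, the two-subject list, and the `abbr` naming scheme are all hypothetical, and the real configs carry additional fields (reader, inference, and evaluation settings) that are omitted here.

```python
# Hypothetical, simplified stand-in for the split loop in the C-Eval configs.
# The subject list and the abbr naming scheme are assumptions for illustration.
subjects = ['computer_network', 'high_school_physics']  # illustrative subset


def build_ceval_datasets(splits):
    """Build one dataset entry per (split, subject) pair."""
    datasets = []
    for _split in splits:
        for _name in subjects:
            datasets.append(dict(
                abbr=f'ceval-{_split}-{_name}',  # assumed naming scheme
                name=_name,
                split=_split,
            ))
    return datasets


before = build_ceval_datasets(['val'])          # old behaviour: val only
after = build_ceval_datasets(['val', 'test'])   # new behaviour: val + test
```

In this sketch the `val` entries come first and are identical in both cases, and the `test` entries are appended after them, which matches the intent of leaving the existing validation evaluation untouched.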

BC-breaking

This modification does not introduce breaking changes. The existing validation split evaluation functionality remains identical. The PR only adds new test split evaluation capabilities, so downstream projects can continue using C-Eval validation split without any code changes.

Checklist

Before PR:

Pre-commit or other linting tools are used to fix the potential lint issues.
Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
CLA has been signed and all committers have signed the CLA in this PR.

HYZ17 · 2025-07-24T08:48:45Z

To support this functionality, the maintainers might also need to update the dataset hosted on OpenCompass. Thanks a lot!

support eval for ceval test split

fb45d30

mm-assistant bot assigned acylam Jul 24, 2025
