HYZ17 commented Jul 24, 2025

Motivation

As the author of the C-Eval benchmark, I have recently released the test split answers, which were previously hidden. The current OpenCompass implementation supports evaluation only on the validation split because the test set ground truth was not available. This PR enables C-Eval test split evaluation so that large language models can be assessed more comprehensively.

The C-Eval dataset is available at https://huggingface.co/datasets/ceval/ceval-exam

Modification

This PR makes the following modifications to enable C-Eval test split evaluation:

  1. Updated dataset configurations: Modified three C-Eval configuration files to include both 'val' and 'test' splits:
    - opencompass/configs/datasets/ceval/ceval_gen_5f30c7.py
    - opencompass/configs/datasets/ceval/ceval_ppl_578f8d.py
    - opencompass/configs/datasets/ceval/ceval_zero_shot_gen_bd40ef.py
  2. Changed split iteration: Updated the loop from for _split in ['val']: to for _split in ['val', 'test']: in all three configuration files.
  3. Preserved existing functionality: The existing validation split evaluation remains unchanged, while adding new test split evaluation capabilities.
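
The split iteration described above can be sketched as follows. This is a minimal illustration rather than the actual contents of the three configuration files: the subject list, the helper name build_ceval_datasets, the dictionary fields, and the abbreviation scheme are all assumptions made for the example.

```python
# Hypothetical two-subject subset; the real configs iterate over every C-Eval subject.
CEVAL_SUBJECTS = ["computer_network", "operating_system"]


def build_ceval_datasets(splits):
    """Return one illustrative dataset entry per (split, subject) pair."""
    datasets = []
    for _split in splits:
        for _name in CEVAL_SUBJECTS:
            # Assumed naming scheme: 'val' entries keep a plain prefix,
            # any other split gets its name added to the abbreviation.
            prefix = "ceval" if _split == "val" else f"ceval-{_split}"
            datasets.append(
                {"abbr": f"{prefix}-{_name}", "name": _name, "split": _split}
            )
    return datasets


before = build_ceval_datasets(["val"])          # loop as it was: ['val']
after = build_ceval_datasets(["val", "test"])   # loop after this PR: ['val', 'test']

# The validation entries are unchanged; the test entries are appended after them.
assert after[: len(before)] == before
print([d["abbr"] for d in after])
```

Because 'val' stays first in the list, the entries generated for the validation split are the same as before, and the test entries are simply appended after them.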

BC-breaking

This modification does not introduce breaking changes. The existing validation split evaluation functionality remains identical. The PR only adds new test split evaluation capabilities, so downstream projects can continue using C-Eval validation split without any code changes.

Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • The documentation has been modified accordingly, such as docstrings or example tutorials.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
  • CLA has been signed and all committers have signed the CLA in this PR.

HYZ17 commented Jul 24, 2025

To support this functionality, the maintainers might also need to update the dataset hosted on OpenCompass. Thanks a lot!
