
Conversation

bio-mlhui
Contributor

@bio-mlhui bio-mlhui commented May 2, 2025

Add CARDBiomedBench benchmark evaluation (1 subset + LLM judge)

Modification

This PR adds 2 files:

  1. datasets/CARDBiomedBench.py
  2. configs/datasets/CARDBiomedBench/CARDBiomedBench_llmjudge_gen.py
    The dataset ships three CSV files (train/test/All); only All is used for now:
        data_files = {'test': 'data/CARDBiomedBench.csv'}
        dataset = load_dataset(path, data_files=data_files, split='test')

Debug Result:

Tested on 200 sampled examples, with Qwen2.5-1.5B as the model under test and Qwen2.5-72b as the LLM judge:

image
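
As a rough illustration of the 200-sample debug run described above, a reproducible subsample could be drawn as follows. This is only a minimal sketch, not the code used in this PR: the helper name `sample_rows`, the seed, and the row fields `question` and `answer` are assumptions made for the example.

```python
import random


def sample_rows(rows, k=200, seed=0):
    """Return up to k rows drawn without replacement, reproducibly.

    Hypothetical helper for illustration; not part of OpenCompass.
    """
    rows = list(rows)
    if len(rows) <= k:
        # Fewer rows than requested: return everything unchanged.
        return rows
    # A dedicated Random instance leaves the global RNG state untouched.
    return random.Random(seed).sample(rows, k)


# Toy stand-in for the benchmark rows (field names are assumed).
rows = [{'question': f'q{i}', 'answer': f'a{i}'} for i in range(1000)]
subset = sample_rows(rows, k=200, seed=0)
```

Fixing the seed means a rerun of the debug evaluation sees the same 200 examples, which makes scores comparable across runs.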

Checklist

Before PR:

  • [✔] Pre-commit or other linting tools are used to fix the potential lint issues.
  • [✔] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
  • [✔] The modification is covered by complete unit tests. If not, please add more unit test to ensure the correctness.
  • The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

@bio-mlhui
Contributor Author

The commit id has been added.

@@ -0,0 +1,101 @@
from opencompass.datasets import CARDBiomedBenchDataset, CARDBiomedBench_llmjudge_postprocess
Contributor

When the judge prompt outputs A|B, you can use from opencompass.datasets import generic_llmjudge_postprocess directly; see https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/llm_judge.html#genericllmevaluator
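
To illustrate what post-processing an A|B judge verdict involves, here is a simplified, self-contained sketch. It is not the implementation of generic_llmjudge_postprocess: the function names `parse_verdict` and `judge_accuracy`, the strict matching rule, and the convention that `A` means "correct" are all assumptions made for this example.

```python
import re


def parse_verdict(judge_output):
    """Return 'A' or 'B' when the judge output is exactly that letter.

    Surrounding whitespace and one trailing period are tolerated;
    anything else is treated as unparseable and yields None.
    """
    match = re.fullmatch(r'([AB])\.?', judge_output.strip())
    return match.group(1) if match else None


def judge_accuracy(judge_outputs):
    """Aggregate verdicts into a percentage, assuming 'A' means correct."""
    verdicts = [parse_verdict(output) for output in judge_outputs]
    correct = sum(v == 'A' for v in verdicts)
    return {
        'accuracy': 100 * correct / len(verdicts) if verdicts else 0.0,
        # Unparseable outputs count against accuracy but are reported.
        'unparsed': sum(v is None for v in verdicts),
    }


result = judge_accuracy(['A', ' B ', 'A.', 'not sure'])
```

Reporting the unparsed count separately makes it easier to tell a weak model from a judge prompt that fails to produce a clean A or B.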

Contributor Author

Fixed.

@bio-mlhui
Contributor Author

  1. Updated dataset-index.yml
  2. Replaced the custom postprocessor with generic_llmjudge_postprocess

Contributor

@MaiziXiao MaiziXiao left a comment

LGTM

@MaiziXiao
Contributor

Fix lint plz

@bio-mlhui
Contributor Author

Fixed the lint errors reported by pre-commit.

@MaiziXiao MaiziXiao merged commit a7f3ac2 into open-compass:main May 8, 2025
6 of 8 checks passed
stephen-nju pushed a commit to stephen-nju/opencompass that referenced this pull request May 14, 2025
* CARDBiomedBench

* fix hash

* fix dataset-index

* use official llmjudge postprocess

* use official llmjudge_postprocess

* fix lint

* fix init
3 participants