Add dataset CARDBiomedBench #2071

bio-mlhui · 2025-05-02T13:01:30Z

Add evaluation for the CARDBiomedBench benchmark (1 subset + llmjudge)

Modification

Two files are included:

datasets/CARDBiomedBench.py
configs/datasets/CARDBiomedBench/CARDBiomedBench_llmjudge_gen.py
The dataset has three CSV files (train, test, and All); only All is used for now:

        data_files = {'test': 'data/CARDBiomedBench.csv'}
        dataset = load_dataset(path, data_files=data_files, split='test')
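To illustrate the mapping above, here is a minimal standard-library sketch of registering a single CSV file under the split name 'test'. It is not the code from this PR: `load_splits`, `SAMPLE_CSV`, and the `question`/`answer` column names are hypothetical stand-ins, and an in-memory string replaces the real file so the snippet runs on its own.

```python
import csv
import io

# Hypothetical stand-in for data/CARDBiomedBench.csv; the column names are assumptions.
SAMPLE_CSV = """question,answer
What gene is associated with X?,GENE1
Which drug targets Y?,DRUG2
"""


def load_splits(data_files, open_file):
    """Read each CSV named in data_files into a list of row dicts, keyed by split name."""
    splits = {}
    for split, filename in data_files.items():
        with open_file(filename) as handle:
            splits[split] = list(csv.DictReader(handle))
    return splits


# Only the combined file is registered, and it is exposed under the name 'test'.
data_files = {'test': 'data/CARDBiomedBench.csv'}
# The opener ignores the filename and serves the in-memory sample instead.
splits = load_splits(data_files, lambda _name: io.StringIO(SAMPLE_CSV))
test_rows = splits['test']
print(len(test_rows), sorted(splits))
```

Because only one entry is present in `data_files`, the train and test CSVs are never read; adding them later would only require extra keys in that mapping.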

Debug Result:

Qwen2.5-1.5B was used as the test model and Qwen2.5-72b as the LLM judge, evaluated on 200 sampled examples:

Checklist

Before PR:

[✔] Pre-commit or other linting tools are used to fix the potential lint issues.
[✔] Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
[✔] The modification is covered by complete unit tests. If not, please add more unit test to ensure the correctness.
The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

bio-mlhui · 2025-05-08T02:49:58Z

The commit id has been added.

MaiziXiao · 2025-05-08T03:49:20Z

opencompass/configs/datasets/CARDBiomedBench/CARDBiomedBench_llmjudge_gen_99a231.py

@@ -0,0 +1,101 @@
+from opencompass.datasets import CARDBiomedBenchDataset, CARDBiomedBench_llmjudge_postprocess


When the judge prompt outputs A|B, you can use `from opencompass.datasets import generic_llmjudge_postprocess` directly; see https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/llm_judge.html#genericllmevaluator for reference.
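For readers unfamiliar with this kind of postprocessing, the sketch below shows roughly what turning A|B judge replies into a score involves. It is not the implementation of `generic_llmjudge_postprocess`: the function names are hypothetical, and it assumes that "A" marks a correct answer and "B" an incorrect one.

```python
import re


def parse_judge_grade(judge_output):
    """Return 'A' or 'B' if the reply contains a standalone grade letter, else None."""
    match = re.search(r'\b([AB])\b', judge_output.strip())
    return match.group(1) if match else None


def summarize(judge_outputs):
    """Aggregate judge replies into an accuracy over the replies that could be parsed."""
    grades = [parse_judge_grade(text) for text in judge_outputs]
    valid = [grade for grade in grades if grade is not None]
    # Assumption: 'A' means the judge considered the prediction correct.
    correct = sum(1 for grade in valid if grade == 'A')
    accuracy = 100.0 * correct / len(valid) if valid else 0.0
    return {'accuracy': accuracy, 'parsed': len(valid), 'total': len(grades)}


result = summarize(['A', 'B', 'Answer: A', 'unclear'])
print(result)
```

Unparseable replies are excluded from the denominator here; counting them as incorrect instead is an equally defensible choice, and it lowers the reported accuracy.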

bio-mlhui · 2025-05-08T04:24:34Z

Updated dataset-index.yml
Switched to generic_llmjudge_postprocess

MaiziXiao

LGTM

MaiziXiao · 2025-05-08T05:54:08Z

Fix lint plz

bio-mlhui · 2025-05-08T11:01:30Z

Fixed the pre-commit lint errors.

* CARDBiomedBench
* fix hash
* fix dataset-index
* use official llmjudge postprocess
* use official llmjudge_postprocess
* fix lint
* fix init

CARDBiomedBench

9db1fea

mm-assistant bot assigned tonysy May 2, 2025

fix hash

c66423f

fix dataset-index

b3aa62b

MaiziXiao reviewed May 8, 2025

bio-mlhui added 2 commits May 8, 2025 04:20

use official llmjudge postprocess

6ff36c1

use official llmjudge_postprocess

85ecf3c

MaiziXiao approved these changes May 8, 2025

bio-mlhui temporarily deployed to prod May 8, 2025 05:51 — with GitHub Actions Inactive

fix lint

3f2ce77

bio-mlhui temporarily deployed to prod May 8, 2025 11:04 — with GitHub Actions Inactive

fix init

1b5e467

bio-mlhui temporarily deployed to prod May 8, 2025 11:20 — with GitHub Actions Inactive

MaiziXiao merged commit a7f3ac2 into open-compass:main May 8, 2025
6 of 8 checks passed
