Add dataset CARDBiomedBench #2071
The commit id has been added.
@@ -0,0 +1,101 @@
from opencompass.datasets import CARDBiomedBenchDataset, CARDBiomedBench_llmjudge_postprocess
When the judge prompt outputs A|B, you can directly use from opencompass.datasets import generic_llmjudge_postprocess instead of a dataset-specific postprocessor. For reference, see https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/llm_judge.html#genericllmevaluator
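To illustrate what an A|B judge postprocessor has to do, here is a minimal standalone sketch. It is not the implementation of generic_llmjudge_postprocess; the names parse_verdict and score are hypothetical, and it assumes that A means "correct" and B means "incorrect", and that the judge replies with the bare letter.

```python
from typing import Dict, Iterable, Optional


def parse_verdict(judge_output: str) -> Optional[str]:
    """Return 'A' or 'B' when the judge reply is exactly that letter, else None."""
    text = judge_output.strip()
    return text if text in ("A", "B") else None


def score(judge_outputs: Iterable[str]) -> Dict[str, float]:
    """Aggregate judge replies into an accuracy percentage.

    Assumption (hypothetical convention): 'A' = correct, 'B' = incorrect.
    Replies that cannot be parsed are counted separately and excluded
    from the accuracy denominator.
    """
    verdicts = [parse_verdict(out) for out in judge_outputs]
    valid = [v for v in verdicts if v is not None]
    correct = sum(1 for v in valid if v == "A")
    accuracy = 100.0 * correct / len(valid) if valid else 0.0
    return {"accuracy": accuracy, "unparsed": len(verdicts) - len(valid)}
```

Because every A|B judge prompt needs the same parsing and aggregation, a shared postprocessor avoids duplicating this logic in each dataset module, which is the point of the review comment.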
Fixed.
LGTM
Fix lint plz
The pre-commit lint errors have been fixed.
* CARDBiomedBench
* fix hash
* fix dataset-index
* use official llmjudge postprocess
* use official llmjudge_postprocess
* fix lint
* fix init
Add evaluation for the CARDBiomedBench benchmark (1 subset + llmjudge).
Modification
Includes 2 files:
The dataset has 3 CSV files (train/test/All); only All is considered for now.
Debug Result:
Tested with Qwen2.5-1.5B as the test model and Qwen2.5-72b as the LLM judge, on a sample of 200 examples:
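The PR does not say how the 200 examples were selected. One reproducible way to draw such a debug subset is sketched below; the function name sample_rows and the fixed seed are assumptions for illustration, not part of the PR.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def sample_rows(rows: Iterable[T], k: int = 200, seed: int = 0) -> List[T]:
    """Draw up to k rows without replacement, using a fixed seed.

    Hypothetical helper: the same seed yields the same subset, so a debug
    run can be repeated on identical examples.
    """
    rows = list(rows)
    if len(rows) <= k:
        # Fewer rows than requested: return all of them unchanged.
        return rows
    return random.Random(seed).sample(rows, k)
```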
Checklist
Before PR:
After PR: