
@srishti-git1110
Contributor

@srishti-git1110 srishti-git1110 commented Oct 2, 2025

  • adds a new environment for MedXpertQA
  • closes MedXpertQA #7
  • We only use the Text subset for now
  • Since there are only 5 rows in the dev split of the Text subset, the test set (2.45k rows) is used as both the train and test set when creating the env.
  • Testing: tested with the command below and got correct outputs, saved locally.
vf-eval medxpertqa -m gpt-4.1-nano-2025-04-14 -n 2 -s

@CLAassistant

CLAassistant commented Oct 2, 2025

CLA assistant check
All committers have signed the CLA.

@warner-benjamin
Collaborator

Thanks for the PR.

We can't use the test set as the training set, so this should be a test-only dataset.

The MedXpertQA authors have their prompts on GitHub; let's use theirs instead of a custom one so our implementation of MedXpertQA is more comparable to the paper. OpenCompass also has their prompts.

Can you also add an option to use thinking tags with their "Put your final answer within \boxed{{}}" prompt for reasoning models, following THINK_BOXED_SYSTEM_PROMPT from verifiers.utils.data_utils, and to use verifiers.ThinkParser to extract the answers from a reasoning model? There's also extract_boxed_answer.
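To illustrate what boxed-answer extraction involves, here is a minimal, self-contained sketch. It does not use or reproduce verifiers' extract_boxed_answer; `last_boxed_content` is a hypothetical helper that takes the last \boxed{...} in a completion and counts brace depth so that nested LaTeX groups are returned whole.

```python
from typing import Optional


def last_boxed_content(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None.

    Hypothetical helper for illustration. Braces are matched by depth
    counting, so nested groups such as \\boxed{\\frac{1}{2}} survive intact.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)  # the last boxed answer wins
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1
    for j in range(i, len(text)):
        ch = text[j]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces: no complete boxed answer
```

Taking the last occurrence is a design choice: models sometimes box intermediate values before the final answer.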

@srishti-git1110
Contributor Author

Thanks for the review, @warner-benjamin. Changes made:

  • changed the prompt
  • used extract_boxed_answer to retrieve the final answer.

However, I'm not sure whether parsing with verifiers.ThinkParser.parse() is still required when we already use extract_boxed_answer for answer extraction. Please let me know and I'll revise.

@warner-benjamin
Collaborator

@srishti-git1110 The thinking tags should only be used if use_think is passed to the environment; otherwise, we want the "Let's think step by step" prompt. The ThinkParser should be included only in the use_think case. You can see an example of this in #19.
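The split described above can be sketched as a small switch on use_think. This is a self-contained illustration, not the merged code. `configure`, `strip_think`, `extract_letter` and both prompt strings are hypothetical. The parsing only approximates the roles of verifiers.ThinkParser and extract_boxed_answer, and it assumes answers are single capital option letters.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical prompt strings for illustration only; the environment itself
# uses the MedXpertQA authors' prompt and verifiers' THINK_BOXED_SYSTEM_PROMPT.
STEP_BY_STEP_PROMPT = (
    "Let's think step by step. Put your final answer within \\boxed{}."
)
THINK_PROMPT = (
    "Think inside <think>...</think> tags, then put your final answer "
    "within \\boxed{}."
)

# Matches a literal \boxed{X} where X is one capital letter (assumed format).
BOXED_LETTER = re.compile(r"\\boxed\{([A-Z])\}")


def strip_think(completion: str) -> Optional[str]:
    """Keep only the text after the last </think> tag (None if absent)."""
    _, sep, after = completion.rpartition("</think>")
    return after if sep else None


def extract_letter(text: Optional[str]) -> Optional[str]:
    """Return the option letter from the last \\boxed{X}, if any."""
    if text is None:
        return None
    matches = BOXED_LETTER.findall(text)
    return matches[-1] if matches else None


def configure(use_think: bool) -> Tuple[str, Callable[[str], Optional[str]]]:
    """Pick the system prompt and answer parser based on use_think."""
    if use_think:
        # Reasoning models: drop the <think> block before extracting.
        return THINK_PROMPT, lambda c: extract_letter(strip_think(c))
    # Default: step-by-step prompt, extract the boxed answer directly.
    return STEP_BY_STEP_PROMPT, extract_letter
```

In think mode this sketch treats a completion that never closes its </think> tag as unanswered, so a boxed letter inside the reasoning is not scored. It makes no claim that ThinkParser behaves identically.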

@srishti-git1110
Contributor Author

Thank you @warner-benjamin. Does it look fine now? Happy to revise more as needed. :)

@warner-benjamin
Collaborator

The environment didn't run; it errored out during answer parsing. I fixed it and made a few other changes.

@warner-benjamin warner-benjamin merged commit 03ab8ad into MedARC-AI:main Oct 17, 2025
1 check passed