Skip to content

Conversation

aw632
Copy link
Contributor

@aw632 aw632 commented Aug 26, 2024

Summary of the changes:

ID Reason for Change
simple_5 Allowing 'all' roots as a valid answer, as it includes real roots
simple_96 Clarification needed in the prompt
simple_238 Expanding acceptable answers to include all years of the Civil War (1861-1865)
simple_122 Mismatch between specification (array of integers) and ground truth (array of array of integers)
simple_309 Adding Tampa Bay Buccaneers to ground truth due to Tom Brady's 2020 NFL season
simple_235 Including Lisbon, Portugal as an acceptable answer
simple_267 Accepting "New York" as per the location parameter in the prompt
simple_308 Addressing grammar issues in the prompt
simple_316 Adding "female" as an acceptable answer for Serena Williams
simple_375 Removing LA from ground truth as it's not mentioned in the prompt
simple_379 Including "Australia/Sydney" as a valid timezone identifier
multiple_88 Accepting PC as a platform for the WOW game
multiple_119 Same issue as simple_96, needs clarification in the prompt
multiple_153 Same issue as simple_235, including Lisbon, Portugal as an acceptable answer

This PR does change the leaderboard values and will be updated in a separate PR.

@aw632 aw632 force-pushed the aw632/fix_test_cases branch from 5ad4947 to ccd28be Compare August 26, 2024 18:46
Copy link
Collaborator

@HuanzhiMao HuanzhiMao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aw632,
Thanks for the PR!

Regarding simple_5: The question did not specify which kind of roots it is looking for, so the default value "real" should be used. Thus, the possible answer should be ["", "real"].
Same thing with multiple_88.

The rest looks good to me.

@aw632
Copy link
Contributor Author

aw632 commented Aug 27, 2024

Hi @aw632, Thanks for the PR!

Regarding simple_5: The question did not specify which kind of roots it is looking for, so the default value "real" should be used. Thus, the possible answer should be ["", "real"]. Same thing with multiple_88.

The rest looks good to me.

Hi, thanks for your comment. W.r.t simple_5, in this case, should we change the prompt so the default is "all" instead? Mathematically, that makes more sense as a default.

@HuanzhiMao HuanzhiMao added the BFCL-Dataset BFCL Dataset-Related Issue label Aug 27, 2024
@HuanzhiMao
Copy link
Collaborator

Hi @aw632, Thanks for the PR!
Regarding simple_5: The question did not specify which kind of roots it is looking for, so the default value "real" should be used. Thus, the possible answer should be ["", "real"]. Same thing with multiple_88.
The rest looks good to me.

Hi, thanks for your comment. W.r.t simple_5, in this case, should we change the prompt so the default is "all" instead? Mathematically, that makes more sense as a default.

Fair point. I will update the prompt to eliminate ambiguity.

Copy link
Collaborator

@HuanzhiMao HuanzhiMao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Total number of entries affected: 21
  • Simple: 16 entry
    • simple_5, simple_13, simple_96, simple_122, simple_156, simple_183, simple_235, simple_238, simple_267, simple_308, simple_309, simple_316, simple_375, simple_379, simple_389, simple_398
  • Multiple: 5 entry
    • multiple_74, multiple_88, multiple_119, multiple_153, multiple_183

Copy link
Owner

@ShishirPatil ShishirPatil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @aw632 and welcome!

@ShishirPatil ShishirPatil merged commit eca516a into ShishirPatil:main Aug 29, 2024
@aw632 aw632 deleted the aw632/fix_test_cases branch August 30, 2024 06:54
ShishirPatil pushed a commit that referenced this pull request Sep 15, 2024
…627, #635, and #638. (#639)

This PR updates the leaderboard to reflect the change in score due to
the following PR merge:

1. #608
2. #600
3. #616 
4. #623
5. #626
6. #627
7. #635 
8. #638

---------

Co-authored-by: Charlie Cheng-Jie Ji <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BFCL-Dataset BFCL Dataset-Related Issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants