HuanzhiMao
Collaborator

@HuanzhiMao HuanzhiMao commented Apr 4, 2024

This PR is for the leaderboard April 3 release:

  1. Fix bugs in the evaluation dataset's possible answers, including those reported in #301 ("Data issues identified in Gorilla leaderboard test dataset during data sanity checks").
  2. Implement string standardization for the AST evaluation pipeline, i.e. removing whitespace and a subset of punctuation (,./-_*^) before comparison, to make the AST evaluation more robust and accurate.
  3. Fix AST evaluation issue for type tuple.
  4. Fix AST evaluation issue for Java and JavaScript.
  5. Add two new models, meetkai/functionary-small-v2.4 (FC) and meetkai/functionary-medium-v2.4 (FC), to the leaderboard.
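
For reference, the string standardization in item 2 could look roughly like the sketch below. This is a minimal illustration rather than the code in this PR: the names `standardize_string` and `string_matches` are hypothetical, and the only behavior taken from the description above is stripping whitespace and the characters `,./-_*^` before comparing.

```python
import re
from typing import Iterable

# Characters dropped before comparison: any whitespace plus , . / - _ * ^
# (the subset listed in item 2; everything else is left untouched).
_STRIP_PATTERN = re.compile(r"[\s,./\-_*^]")


def standardize_string(value: str) -> str:
    """Return `value` with whitespace and the listed punctuation removed."""
    return _STRIP_PATTERN.sub("", value)


def string_matches(model_output: str, possible_answers: Iterable[str]) -> bool:
    """Hypothetical helper: compare a string argument against the accepted
    answers after standardizing both sides."""
    normalized = standardize_string(model_output)
    return any(normalized == standardize_string(ans) for ans in possible_answers)
```

With this sketch, `"san_francisco"` and `"San Francisco"` would still differ (case is not touched here), while `"2024-04-03"` and `"2024/04/03"` would compare equal.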

This PR DOES change the leaderboard score. We will update the leaderboard website shortly, in a different PR.


Co-authored-by: Charlie Cheng-Jie Ji [email protected]
Co-authored-by: Fanjia Yan [email protected]

Collaborator

@CharlieJCJ CharlieJCJ left a comment

LGTM

@ShishirPatil ShishirPatil merged commit 82f8fc5 into ShishirPatil:main Apr 5, 2024
ShishirPatil pushed a commit that referenced this pull request Apr 5, 2024
This PR updates the leaderboard data, as mentioned in #309 

This PR **DOES** change the leaderboard value.
ShishirPatil added a commit that referenced this pull request Apr 6, 2024
This PR updates the evaluation metric in our leaderboard blog to be in
sync with #309, as the AST evaluation pipeline has been updated.

This PR **does not** change the leaderboard value.

---------

Co-authored-by: Shishir Patil <[email protected]>