Skip to content

[BFCL] Standardize TEST_CATEGORY Among eval_runner.py and openfunctions_evaluation.py #506

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jul 19, 2024

Conversation

HuanzhiMao
Copy link
Collaborator

There are inconsistencies between the test_category argument that's used by eval_checker/eval_runner.py and openfunctions_evaluation.py.

This PR partially addresses #501 and #502.

@HuanzhiMao HuanzhiMao marked this pull request as ready for review July 7, 2024 00:07
@HuanzhiMao HuanzhiMao changed the title [BFCL] Standarize TEST_CATEGORY Among eval_runner.py and openfunctions_evaluation.py [BFCL] Standardize TEST_CATEGORY Among eval_runner.py and openfunctions_evaluation.py Jul 7, 2024
Copy link
Collaborator

@CharlieJCJ CharlieJCJ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment: improve warning message in the future, make them more information for user to know immediate action items when aggregating results for data.csv.

@HuanzhiMao
Copy link
Collaborator Author

LGTM

Comment: improve warning message in the future, make them more information for user to know immediate action items when aggregating results for data.csv.

Good point. Will address this in a different PR.

@ShishirPatil
Copy link
Owner

Hey @HuanzhiMao and @CharlieJCJ I don't think this is a great idea for the following reason. So we start by showing how to install the dependencies, and then in the middle we go into a long extempore on the different flags and then we come back with a list of commands to run. This ain't helpful back and forth? Maybe we retain the previous structure where after each command we show the options so users have the context on the command. Thoughts?

@HuanzhiMao
Copy link
Collaborator Author

Hey @HuanzhiMao and @CharlieJCJ I don't think this is a great idea for the following reason. So we start by showing how to install the dependencies, and then in the middle we go into a long extempore on the different flags and then we come back with a list of commands to run. This ain't helpful back and forth? Maybe we retain the previous structure where after each command we show the options so users have the context on the command. Thoughts?

How about we move the flags section to the bottom of the README, just like how all the available model names are displayed at the end?

@devanshamin
Copy link
Contributor

I have solved this issue here.

@ShishirPatil
Copy link
Owner

Hey @HuanzhiMao and @CharlieJCJ I don't think this is a great idea for the following reason. So we start by showing how to install the dependencies, and then in the middle we go into a long extempore on the different flags and then we come back with a list of commands to run. This ain't helpful back and forth? Maybe we retain the previous structure where after each command we show the options so users have the context on the command. Thoughts?

How about we move the flags section to the bottom of the README, just like how all the available model names are displayed at the end?

I don't think this solves it right? We should present the flags when the user cares about the command, not super down. Folks read and execute code from READMEs linearly, and making them jump around isn't a good idea imo.

@HuanzhiMao HuanzhiMao requested a review from ShishirPatil July 18, 2024 22:52
@ShishirPatil ShishirPatil merged commit a9dd435 into ShishirPatil:main Jul 19, 2024
@HuanzhiMao HuanzhiMao deleted the test-category branch July 21, 2024 05:01
ShishirPatil pushed a commit that referenced this pull request Aug 10, 2024
As mentioned in #506, this PR make the warning messages more informative
for user to know action items when aggregating leaderboard results.

---------

Co-authored-by: CharlieJCJ <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants