Add CALM models #900

Merged: 3 commits merged into ShishirPatil:main on Feb 6, 2025

Conversation

jgreer013 (Contributor) commented on Feb 5, 2025:

Add the following new models to the leaderboard:

  • uiuc-convai/CALM-8B
  • uiuc-convai/CALM-70B
  • uiuc-convai/CALM-405B

HuanzhiMao added the BFCL-New Model (Add New Model to BFCL) label on Feb 6, 2025.
HuanzhiMao (Collaborator) left a comment:

Thanks for the PR and welcome @jgreer013 !

I wanted to mention that due to our current compute constraints (we only have 8 H100 GPUs available), we’re unable to benchmark any 405B models at the moment—unless they’re hosted externally and accessible via API calls.

HuanzhiMao merged commit 3d268da into ShishirPatil:main on Feb 6, 2025.
jgreer013 (Contributor, Author) commented:

@HuanzhiMao Assuming we can get it hosted, what would the process be to give you access for submission?

Alternatively, would an I8/FP8 405B work? That should fit within the 640GB of VRAM.
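(Rough sizing, assuming one byte per parameter at FP8: 405B parameters is about 405 GB of weights, comfortably under 8 × 80 GB = 640 GB, leaving headroom for the KV cache and activations.)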

jgreer013 (Contributor, Author) commented:

Alternatively, we could share the generated outputs from the 405B model directly, and you all can validate the score.

HuanzhiMao (Collaborator) commented:

Replied over email. Copied over here for the record.

Hosting the model would indeed be ideal. Functionary has done this before by spinning up an OpenAI-compatible server and sharing the endpoint via DM on Discord. Once I have that endpoint, I can simply update the base_url in our handler to direct inference there. For reference, it would look something like this:

```python
base_url="https://xxxx-8000.proxy.runpod.net/v1"
model="meetkai/functionary-medium-v3.1"
```
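For illustration, here is a minimal sketch of what inference against such an endpoint looks like with the OpenAI Python client; the base_url and model values are the placeholders from the snippet above, not a live server:

```python
from openai import OpenAI

# Point the standard OpenAI client at the externally hosted,
# OpenAI-compatible server instead of api.openai.com.
client = OpenAI(
    base_url="https://xxxx-8000.proxy.runpod.net/v1",
    api_key="EMPTY",  # many self-hosted servers accept any placeholder key
)

# A single chat completion request routed to the shared endpoint.
response = client.chat.completions.create(
    model="meetkai/functionary-medium-v3.1",
    messages=[{"role": "user", "content": "What is the weather in Berkeley?"}],
)
print(response.choices[0].message.content)
```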

Regarding the I8/FP8 405B option, that should work as well—though we would note this on the leaderboard as uiuc-convai/CALM-405B (FP8) to reflect the difference in precision.

As for sharing pre-generated outputs, we unfortunately can’t accept them because we need to ensure fairness and consistency in our evaluations. There’s no way for us to verify that the outputs haven’t been altered.

HuanzhiMao added a commit referencing this pull request on Feb 11, 2025:
This PR updates the leaderboard to reflect the score changes resulting from the following merged PRs:

1. #888 
2. #887 
3. #895 
4. #894 
5. #897 
6. #892 
7. #898 
8. #900 
9. #902 

Models were evaluated using checkpoint commit 910460e.