Add CALM models #900

Merged: 3 commits merged into ShishirPatil:main on Feb 6, 2025

Conversation

jgreer013 (Contributor) commented on Feb 5, 2025:

Add the following new models to the leaderboard:

  • uiuc-convai/CALM-8B
  • uiuc-convai/CALM-70B
  • uiuc-convai/CALM-405B

HuanzhiMao added the BFCL-New Model (Add New Model to BFCL) label on Feb 6, 2025.
HuanzhiMao (Collaborator) left a comment:

Thanks for the PR and welcome @jgreer013 !

I wanted to mention that due to our current compute constraints (we only have 8 H100 GPUs available), we’re unable to benchmark any 405B models at the moment—unless they’re hosted externally and accessible via API calls.

HuanzhiMao merged commit 3d268da into ShishirPatil:main on Feb 6, 2025.
jgreer013 (Contributor, Author) commented:

@HuanzhiMao Assuming we can get it hosted, what would the process be to give you access for submission?

Alternatively, would an I8/FP8 405B work? That should fit within the 640GB of VRAM.
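(Rough sizing, assuming one byte per parameter at FP8: 405B parameters is about 405 GB of weights, comfortably under 8 × 80 GB = 640 GB, leaving headroom for the KV cache and activations.)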

jgreer013 (Contributor, Author) commented:

Alternatively, we could share the generated outputs from the 405B model directly, and you all can validate the score.

HuanzhiMao (Collaborator) commented:

Replied over email. Copied over here for the record.

Hosting the model would indeed be ideal. Functionary has done this before by spinning up an OpenAI-compatible server and sharing the endpoint via DM on Discord. Once I have that endpoint, I can simply update the base_url in our handler to direct inference there. For reference, it would look something like this:

```python
base_url="https://xxxx-8000.proxy.runpod.net/v1"
model="meetkai/functionary-medium-v3.1"
```
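For illustration, here is a minimal sketch of what inference against such an endpoint looks like with the OpenAI Python client; the base_url and model values are the placeholders from the snippet above, not a live server:

```python
from openai import OpenAI

# Point the standard OpenAI client at the externally hosted,
# OpenAI-compatible server instead of api.openai.com.
client = OpenAI(
    base_url="https://xxxx-8000.proxy.runpod.net/v1",
    api_key="EMPTY",  # many self-hosted servers accept any placeholder key
)

# A single chat completion request routed to the shared endpoint.
response = client.chat.completions.create(
    model="meetkai/functionary-medium-v3.1",
    messages=[{"role": "user", "content": "What is the weather in Berkeley?"}],
)
print(response.choices[0].message.content)
```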

Regarding the I8/FP8 405B option, that should work as well—though we would note this on the leaderboard as uiuc-convai/CALM-405B (FP8) to reflect the difference in precision.

As for sharing pre-generated outputs, we unfortunately can’t accept them because we need to ensure fairness and consistency in our evaluations. There’s no way for us to verify that the outputs haven’t been altered.

HuanzhiMao added a commit referencing this pull request on Feb 11, 2025:
This PR updates the leaderboard to reflect the score changes resulting from the following merged PRs:

1. #888 
2. #887 
3. #895 
4. #894 
5. #897 
6. #892 
7. #898 
8. #900 
9. #902 

Models were evaluated using checkpoint commit 910460e.