Commit d7c791d

XuHwang authored and HuanzhiMao committed
[BFCL] Add ToolACE handler for BFCL-v3 (ShishirPatil#653)
This PR adds a handler for the [ToolACE](https://huggingface.co/Team-ACE/ToolACE-8B) model, which fine-tunes LLaMA-3.1-8B-Instruct on the [ToolACE](https://huggingface.co/datasets/Team-ACE/ToolACE) dataset and achieves strong results in function calling. The handler has been adapted to be compatible with BFCL v3. Below are the results we measured on our machine (4×V100-32GB); we also found that the results vary across machines.

| **Rank** | **Overall Acc** | **Non-Live AST Acc** | **Non-Live Simple AST** | **Non-Live Multiple AST** | **Non-Live Parallel AST** | **Non-Live Parallel Multiple AST** | **Non-Live Exec Acc** | **Non-Live Simple Exec** | **Non-Live Multiple Exec** | **Non-Live Parallel Exec** | **Non-Live Parallel Multiple Exec** | **Live Acc** | **Live Simple AST** | **Live Multiple AST** | **Live Parallel AST** | **Live Parallel Multiple AST** | **Multi Turn Acc** | **Multi Turn Base** | **Multi Turn Miss Func** | **Multi Turn Miss Param** | **Multi Turn Long Context** | **Multi Turn Composite** | **Relevance Detection** | **Irrelevance Detection** |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 59.22% | 89.27% | 80.58% | 95.00% | 91.00% | 90.50% | 90.07% | 98.29% | 94.00% | 88.00% | 80.00% | 73.21% | 62.79% | 74.25% | 81.25% | 75.00% | 14.37% | 21.50% | 6.50% | 17.50% | 12.00% | N/A | 85.37% | 83.81% |

Thanks for your efforts in maintaining such a wonderful leaderboard. We need your help (@HuanzhiMao, @CharlieJCJ) in adding our model to the leaderboard. Thanks a lot~

---------

Co-authored-by: Huanzhi (Hans) Mao <[email protected]>
1 parent bd14f1c commit d7c791d

4 files changed: 10 additions, 0 deletions

berkeley-function-call-leaderboard/CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 
 All notable changes to the Berkeley Function Calling Leaderboard will be documented in this file.
 
+- [Oct 4, 2024] [#653](https://github.com/ShishirPatil/gorilla/pull/653): Add new model `Team-ACE/ToolACE-8B` to the leaderboard.
 - [Oct 4, 2024] [#671](https://github.com/ShishirPatil/gorilla/pull/671): Speed up locally-hosted model's inference process by parallelizing the inference requests.
 - [Sept 27, 2024] [#640](https://github.com/ShishirPatil/gorilla/pull/640): Add the following new models to the leaderboard:
 - `microsoft/Phi-3.5-mini-instruct`

berkeley-function-call-leaderboard/README.md

Lines changed: 1 addition & 0 deletions
@@ -186,6 +186,7 @@ Below is _a table of models we support_ to run our leaderboard evaluation agains
 |ibm-granite/granite-20b-functioncalling 💻| Function Calling|
 |yi-large-fc | Function Calling|
 |MadeAgents/Hammer-7b 💻| Function Calling|
+|Team-ACE/ToolACE-8B 💻| Function Calling|
 
 Here {MODEL} 💻 means the model needs to be hosted locally and called by vllm, {MODEL} means the models that are called API calls. For models with a trailing `-FC`, it means that the model supports function-calling feature. You can check out the table summarizing feature supports among different models [here](https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html#prompt).
 
berkeley-function-call-leaderboard/bfcl/eval_checker/model_metadata.py

Lines changed: 7 additions & 0 deletions
@@ -581,6 +581,12 @@
         "Microsoft",
         "MIT",
     ],
+    "Team-ACE/ToolACE-8B": [
+        "ToolACE-8B (FC)",
+        "https://huggingface.co/Team-ACE/ToolACE-8B",
+        "Huawei Noah & USTC",
+        "Apache-2.0",
+    ],
 }
 
 INPUT_PRICE_PER_MILLION_TOKEN = {
@@ -733,4 +739,5 @@
     "Salesforce/xLAM-7b-r",
     "Salesforce/xLAM-8x7b-r",
     "Salesforce/xLAM-8x22b-r",
+    "Team-ACE/ToolACE-8B",
 ]
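
Each metadata entry in the diff above is a four-element list keyed by the model name. The following is a minimal sketch of reading such an entry. It assumes the four positions mean display name, URL, organization, and license, which is inferred from the values shown and not stated in the diff. `EXAMPLE_METADATA`, `ModelInfo`, and `describe` are hypothetical names used only for illustration.

```python
from typing import NamedTuple

# Hypothetical stand-in for the metadata dict edited in this commit;
# only the entry added by the diff is reproduced here.
EXAMPLE_METADATA = {
    "Team-ACE/ToolACE-8B": [
        "ToolACE-8B (FC)",
        "https://huggingface.co/Team-ACE/ToolACE-8B",
        "Huawei Noah & USTC",
        "Apache-2.0",
    ],
}


class ModelInfo(NamedTuple):
    # Field names are assumptions inferred from the values in the diff.
    display_name: str
    url: str
    organization: str
    license: str


def describe(model_name: str) -> ModelInfo:
    """Unpack the four-element list for a model into named fields."""
    return ModelInfo(*EXAMPLE_METADATA[model_name])


info = describe("Team-ACE/ToolACE-8B")
print(f"{info.display_name} by {info.organization} ({info.license})")
```

Naming the fields makes a misordered or short entry fail loudly at unpacking time instead of surfacing later as a wrong column on the leaderboard.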

berkeley-function-call-leaderboard/bfcl/model_handler/handler_map.py

Lines changed: 1 addition & 0 deletions
@@ -108,6 +108,7 @@
     "ibm-granite/granite-20b-functioncalling": GraniteHandler,
     # "MadeAgents/Hammer-7b": HammerHandler,  # TODO: Update handler once they have a multi-turn format
     "THUDM/glm-4-9b-chat": GLMHandler,
+    "Team-ACE/ToolACE-8B": LlamaHandler,
 
     # Deprecated/outdated models, no longer on the leaderboard
     # "gorilla-openfunctions-v0": GorillaHandler,
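
The added line maps the new model name to the existing `LlamaHandler` class, so no new handler class is introduced in this file. A rough sketch of how such a name-to-class map can be used for dispatch follows. The classes here are empty stand-ins, and `EXAMPLE_HANDLER_MAP`, `build_handler`, and the constructor signature are assumptions for illustration, not the repository's actual API.

```python
class LlamaHandler:
    """Empty stand-in for the real handler; constructor arguments are assumed."""

    def __init__(self, model_name: str, temperature: float = 0.0):
        self.model_name = model_name
        self.temperature = temperature


class GLMHandler:
    """Empty stand-in for the real handler; constructor arguments are assumed."""

    def __init__(self, model_name: str, temperature: float = 0.0):
        self.model_name = model_name
        self.temperature = temperature


# Hypothetical map mirroring two entries visible in the diff.
EXAMPLE_HANDLER_MAP = {
    "THUDM/glm-4-9b-chat": GLMHandler,
    "Team-ACE/ToolACE-8B": LlamaHandler,
}


def build_handler(model_name: str, temperature: float = 0.0):
    """Look up the handler class for a model name and instantiate it."""
    try:
        handler_cls = EXAMPLE_HANDLER_MAP[model_name]
    except KeyError:
        raise ValueError(f"No handler registered for {model_name!r}") from None
    return handler_cls(model_name, temperature)


handler = build_handler("Team-ACE/ToolACE-8B")
print(type(handler).__name__, handler.model_name)
```

Under this pattern, supporting a model that shares an existing prompt format only requires one new dictionary entry, which matches the single-line change in this file.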
