[BFCL] Add ToolACE handler for BFCL-v3 (ShishirPatil#653)

XuHwang · HuanzhiMao · VishnuSuresh27 · commit d7c791dbc812 · 2024-11-10T23:12:44.000-08:00
This PR adds the handler for the [ToolACE](https://huggingface.co/Team-ACE/ToolACE-8B) model, which fine-tunes LLaMA-3.1-8B-Instruct on the [ToolACE](https://huggingface.co/datasets/Team-ACE/ToolACE) dataset and achieves strong scores in function calling. We have adapted our handler to be compatible with BFCL v3.

Below are the results evaluated on our machine (4 × V100-32GB). We also found that the results vary across machines.

| **Rank** | **Overall Acc** | **Non-Live AST Acc** | **Non-Live Simple AST** | **Non-Live Multiple AST** | **Non-Live Parallel AST** | **Non-Live Parallel Multiple AST** | **Non-Live Exec Acc** | **Non-Live Simple Exec** | **Non-Live Multiple Exec** | **Non-Live Parallel Exec** | **Non-Live Parallel Multiple Exec** | **Live Acc** | **Live Simple AST** | **Live Multiple AST** | **Live Parallel AST** | **Live Parallel Multiple AST** | **Multi Turn Acc** | **Multi Turn Base** | **Multi Turn Miss Func** | **Multi Turn Miss Param** | **Multi Turn Long Context** | **Multi Turn Composite** | **Relevance Detection** | **Irrelevance Detection** |
|----------|-----------------|----------------------|-------------------------|---------------------------|---------------------------|------------------------------------|-----------------------|--------------------------|----------------------------|----------------------------|-------------------------------------|--------------|---------------------|-----------------------|-----------------------|--------------------------------|--------------------|---------------------|--------------------------|---------------------------|-----------------------------|--------------------------|-------------------------|---------------------------|
| 1 | 59.22% | 89.27% | 80.58% | 95.00% | 91.00% | 90.50% | 90.07% | 98.29% | 94.00% | 88.00% | 80.00% | 73.21% | 62.79% | 74.25% | 81.25% | 75.00% | 14.37% | 21.50% | 6.50% | 17.50% | 12.00% | N/A | 85.37% | 83.81% |

Thanks for your efforts in maintaining such a wonderful leaderboard. We need your help (@HuanzhiMao, @CharlieJCJ) in adding our model to the leaderboard. Thanks a lot!

---------

Co-authored-by: Huanzhi (Hans) Mao <huanzhimao@gmail.com>
diff --git a/berkeley-function-call-leaderboard/CHANGELOG.md b/berkeley-function-call-leaderboard/CHANGELOG.md
@@ -2,6 +2,7 @@
 
 All notable changes to the Berkeley Function Calling Leaderboard will be documented in this file.
 
+- [Oct 4, 2024] [#653](https://github.com/ShishirPatil/gorilla/pull/653): Add new model `Team-ACE/ToolACE-8B` to the leaderboard.
 - [Oct 4, 2024] [#671](https://github.com/ShishirPatil/gorilla/pull/671): Speed up locally-hosted model's inference process by parallelizing the inference requests.
 - [Sept 27, 2024] [#640](https://github.com/ShishirPatil/gorilla/pull/640): Add the following new models to the leaderboard:
   - `microsoft/Phi-3.5-mini-instruct`
diff --git a/berkeley-function-call-leaderboard/README.md b/berkeley-function-call-leaderboard/README.md
@@ -186,6 +186,7 @@ Below is _a table of models we support_ to run our leaderboard evaluation agains
 |ibm-granite/granite-20b-functioncalling 💻| Function Calling|
 |yi-large-fc | Function Calling|
 |MadeAgents/Hammer-7b 💻| Function Calling|
+|Team-ACE/ToolACE-8B 💻| Function Calling|
 
 Here {MODEL} 💻 means the model needs to be hosted locally and called by vllm, {MODEL} means the models that are called API calls. For models with a trailing `-FC`, it means that the model supports function-calling feature. You can check out the table summarizing feature supports among different models [here](https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html#prompt).
 
diff --git a/berkeley-function-call-leaderboard/bfcl/eval_checker/model_metadata.py b/berkeley-function-call-leaderboard/bfcl/eval_checker/model_metadata.py
@@ -581,6 +581,12 @@
         "Microsoft",
         "MIT",
     ],
+    "Team-ACE/ToolACE-8B": [
+        "ToolACE-8B (FC)",
+        "https://huggingface.co/Team-ACE/ToolACE-8B",
+        "Huawei Noah & USTC",
+        "Apache-2.0",
+    ],
 }
 
 INPUT_PRICE_PER_MILLION_TOKEN = {
@@ -733,4 +739,5 @@
     "Salesforce/xLAM-7b-r",
     "Salesforce/xLAM-8x7b-r",
     "Salesforce/xLAM-8x22b-r",
+    "Team-ACE/ToolACE-8B",
 ]
diff --git a/berkeley-function-call-leaderboard/bfcl/model_handler/handler_map.py b/berkeley-function-call-leaderboard/bfcl/model_handler/handler_map.py
@@ -108,6 +108,7 @@
     "ibm-granite/granite-20b-functioncalling": GraniteHandler,
     # "MadeAgents/Hammer-7b": HammerHandler,  # TODO: Update handler once they have a multi-turn format
     "THUDM/glm-4-9b-chat": GLMHandler,
+    "Team-ACE/ToolACE-8B": LlamaHandler,
     
     # Deprecated/outdated models, no longer on the leaderboard
     # "gorilla-openfunctions-v0": GorillaHandler,