[BFCL] Fix Irrelevance Category Performance for DeepSeek Coder Handler #796

HuanzhiMao · 2024-11-28T00:38:06Z

This PR updates the decoding logic of the DeepSeek-Coder handler (introduced in #697) to fix its poor performance in the irrelevance category.
An entry passes the irrelevance category metric if either `decode_ast` fails (raises an error) or the decoded output is empty (e.g., an empty list or an empty string).
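The pass condition described above can be sketched roughly as follows. This is only an illustration of the rule as stated in this PR, not the actual BFCL checker; the function name `passes_irrelevance_check` and its signature are hypothetical.

```python
from typing import Any, Callable


def passes_irrelevance_check(decode_ast: Callable[[Any], Any], model_response: Any) -> bool:
    """Hypothetical helper: True if the response counts as 'no function call'."""
    try:
        decoded = decode_ast(model_response)
    except Exception:
        # Condition 1: decoding failed, so no function call was produced.
        return True
    # Condition 2: decoding succeeded but produced nothing (empty list or empty string).
    return decoded == [] or decoded == ""


def passthrough_decode(result: Any) -> Any:
    # Stand-in decoder that returns its input unchanged.
    return result


def failing_decode(result: Any) -> Any:
    # Stand-in decoder that always fails.
    raise ValueError("cannot decode")
```

With a pass-through decoder, a plain message string is neither an error nor empty, so it would be scored as a failure in this category.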

For the DeepSeek-Coder model:
When it outputs a valid function call, the model response is a list of dictionaries, `[{func1:{param1:val1,...}},{func2:{param2:val2,...}}]`, so `decode_ast` can return it without any processing.
However, when the output is a plain message (not a valid function call), the `_parse_query_response_prompting` logic returns that message string as the model response. The current `decode_ast` implementation treats that string as the decoded output, so it is neither an error nor empty and fails both conditions of the irrelevance metric.
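One way to handle both cases is sketched below. It assumes the fix returns an empty list for anything that is not a list of dictionaries; the diff itself is not shown here, so the real handler may differ (raising an error instead would also satisfy the metric). The name `decode_ast_sketch` is hypothetical.

```python
from typing import Any, Dict, List


def decode_ast_sketch(result: Any) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for the handler's decode_ast method."""
    # Valid function calls arrive as a list of dictionaries, e.g.
    # [{func1: {param1: val1, ...}}, {func2: {param2: val2, ...}}],
    # and can be returned unchanged.
    if isinstance(result, list) and all(isinstance(item, dict) for item in result):
        return result
    # Anything else (typically a plain message string) is treated as
    # "no function call" so that the decoded output is empty.
    return []
```

Under this sketch, a message string decodes to `[]`, which meets the "decoded output is empty" condition of the irrelevance metric.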

This PR updates the leaderboard to reflect the score changes from the following merged PRs: #747, #770, #768, #750, #763, #772, #777, #778, #786, #787, #697, #718, #755, #796, #789, #804, #808, #809, #811, #810. Models were evaluated using checkpoint commit d7e52e5.

HuanzhiMao added 2 commits November 27, 2024 16:14

improve deepseek coder decoding logic

90d3f03

improve handle_multiple_input

280a50a

HuanzhiMao added the BFCL-General General BFCL Issue label Nov 28, 2024

HuanzhiMao marked this pull request as ready for review November 28, 2024 00:39

CharlieJCJ approved these changes Nov 28, 2024

HuanzhiMao merged commit 7cec275 into ShishirPatil:main Nov 28, 2024

HuanzhiMao mentioned this pull request Nov 28, 2024

[BFCL] Leaderboard Update - 2024/12/06 (Checkpoint d7e52e5) #800

Merged
