@HuanzhiMao HuanzhiMao commented Nov 28, 2024

This PR updates the decoding logic of the DeepSeek-Coder handler (introduced in #697) to fix its performance issue in the irrelevance category.
An entry passes the irrelevance category metric if either `decode_ast` fails (raises an error) or the decoded output is empty (e.g., an empty list or an empty string).
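
To make the pass condition concrete, here is a minimal sketch of the check. The names `passes_irrelevance_check` and `identity_decode` are hypothetical and are not taken from the BFCL codebase. Treating only an empty list or empty string as "empty" is an assumption based on the examples given above.

```python
from typing import Any, Callable


def passes_irrelevance_check(decode_fn: Callable[[Any], Any], model_response: Any) -> bool:
    """Hypothetical helper: True when the response counts as 'no function call'."""
    try:
        decoded = decode_fn(model_response)
    except Exception:
        # Condition 1: decoding fails with an error.
        return True
    # Condition 2: decoding succeeds but yields an empty list or empty string.
    return isinstance(decoded, (list, str)) and len(decoded) == 0


def identity_decode(response: Any) -> Any:
    """Stand-in decoder that returns the response unchanged (for illustration only)."""
    return response


# An empty decoded output passes; a non-empty message string does not.
print(passes_irrelevance_check(identity_decode, []))
print(passes_irrelevance_check(identity_decode, "I cannot help with that."))
```

With a decoder that passes the response through unchanged, a plain message string satisfies neither condition, which is the failure mode this PR addresses.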

For the DeepSeek-Coder model:
When it outputs a valid function call, the model response is a list of dictionaries `[{func1:{param1:val1,...}},{func2:{param2:val2,...}}]`, so `decode_ast` can return it without any processing.
However, when the output is a plain message (not a valid function call), under the `_parse_query_response_prompting` logic, the model response is that message string. The current `decode_ast` implementation treats that string as the decoded output, so decoding neither fails nor yields an empty result, and the entry fails the irrelevance metric.
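
The diff itself is not shown in this conversation, so the following is only a sketch of one way a decoder could handle both cases. `decode_ast_sketch` is a hypothetical name, and returning an empty list for non-function-call output is an assumption; the merged change may instead raise an error or use a different check.

```python
from typing import Any, Dict, List


def decode_ast_sketch(result: Any) -> List[Dict[str, Any]]:
    """Hypothetical decoder: pass valid function calls through, map anything else to []."""
    # A valid function call is assumed to arrive as a list of dictionaries,
    # e.g. [{"func1": {"param1": "val1"}}, {"func2": {"param2": "val2"}}].
    if isinstance(result, list) and all(isinstance(item, dict) for item in result):
        return result
    # Assumption: a plain message string (or any other shape) means the model
    # made no function call, so the decoded output is empty.
    return []


print(decode_ast_sketch([{"get_weather": {"city": "Berkeley"}}]))
print(decode_ast_sketch("Sorry, none of the provided functions apply."))
```

Under this sketch, a message string decodes to an empty list, which satisfies the "decoded output is empty" condition of the irrelevance metric.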

@HuanzhiMao HuanzhiMao added the BFCL-General General BFCL Issue label Nov 28, 2024
@HuanzhiMao HuanzhiMao marked this pull request as ready for review November 28, 2024 00:39
@HuanzhiMao HuanzhiMao merged commit 7cec275 into ShishirPatil:main Nov 28, 2024
VishnuSuresh27 pushed a commit to VishnuSuresh27/gorilla that referenced this pull request Nov 28, 2024
ShishirPatil#796)

HuanzhiMao added a commit that referenced this pull request Dec 7, 2024
This PR updates the leaderboard to reflect the score changes from the
following merged PRs:

1. #747 
2. #770 
3. #768 
4. #750 
5. #763 
6. #772 
7. #777 
8. #778 
9. #786 
10. #787 
11. #697 
12. #718 
13. #755 
14. #796 
15. #789 
16. #804 
17. #808 
18. #809
19. #811 
20. #810 

Models were evaluated using checkpoint commit d7e52e5.