
[BFCL] Support Dynamic max_tokens for Locally-Hosted Models #712


Merged
6 commits merged into ShishirPatil:main from the token_len branch on Nov 8, 2024

Conversation

@HuanzhiMao (Collaborator) commented Oct 19, 2024

This is an optimization that can sometimes improve the performance of models with small context windows.

According to OpenAI's spec, `max_tokens` for the completions endpoint is the maximum number of tokens that can be generated; it does not include the input token count. For example, if a model has a context length of 4096, the input message takes 1000 tokens, and you set `max_tokens` to 4096, the request errors because the total number of tokens (1000 for input and 4096 requested for output) exceeds the model's context window. So, before calling the completions endpoint, we use the model's tokenizer to count how many tokens the input message `formatted_prompt` occupies, subtract that amount from the model's maximum context length, and supply the result as the `max_tokens` argument, which avoids the maximum-length-exceeded error. In short, we let the model generate as many tokens as possible, up to the limit of its context length. A rough sketch of this computation is included after the token statistics below.

PS: even for the multi-turn long-context category, the maximum number of tokens used by the ground-truth function calls in a single turn is far less than 4096 (measured with tiktoken), so this should be a safe threshold.

max: 1299
min: 24
average: 125.96
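
The sketch below illustrates the idea for a locally-hosted Hugging Face model: count the prompt's tokens, subtract them from the context window, and use the remainder as `max_tokens`. It is a minimal, illustrative version; the helper name `compute_max_tokens`, the `safety_margin` parameter, and the 4096 fallback are assumptions, not the actual BFCL handler code.

```python
# Minimal sketch of dynamic max_tokens computation -- not the actual BFCL implementation.
from transformers import AutoConfig, AutoTokenizer


def compute_max_tokens(model_id: str, formatted_prompt: str, safety_margin: int = 2) -> int:
    """Return how many tokens the model can still generate for this prompt."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    config = AutoConfig.from_pretrained(model_id)

    # Most Hugging Face configs expose the context window as `max_position_embeddings`;
    # the 4096 fallback used when the attribute is missing is an illustrative assumption.
    context_length = getattr(config, "max_position_embeddings", 4096)

    # Number of tokens the already-formatted prompt occupies.
    input_token_count = len(tokenizer(formatted_prompt)["input_ids"])

    # Whatever remains in the context window is available for generation.
    return max(context_length - input_token_count - safety_margin, 1)


# Example usage (hypothetical model id): pass the result as `max_tokens` to the completions call.
# max_tokens = compute_max_tokens("meta-llama/Llama-3.1-8B-Instruct", formatted_prompt)
```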

@HuanzhiMao added the BFCL-General (General BFCL Issue) label Oct 19, 2024
@Fanjia-Yan (Collaborator) left a comment

LGTM

@CharlieJCJ (Collaborator) left a comment


LGTM

Repository owner deleted a comment from yutongxie58 Nov 7, 2024
@ShishirPatil merged commit 4a68e5d into ShishirPatil:main on Nov 8, 2024
@HuanzhiMao deleted the token_len branch on November 9, 2024
VishnuSuresh27 pushed a commit to VishnuSuresh27/gorilla that referenced this pull request Nov 11, 2024
HuanzhiMao added a commit that referenced this pull request Nov 19, 2024
This PR updates the leaderboard to reflect the score changes resulting from the following PR merges:

1. #719
2. #722
3. #723
4. #728 
5. #732
6. #725
7. #712
8. #733
9. #720 
10. #760 
11. #761 
12. #767