[BFCL] Support Dynamic max_tokens for Locally-Hosted Models #712
Merged
Conversation
Fanjia-Yan approved these changes on Nov 2, 2024
LGTM
CharlieJCJ approved these changes on Nov 5, 2024
LGTM
Repository owner deleted a comment from yutongxie58 on Nov 7, 2024
VishnuSuresh27 pushed a commit to VishnuSuresh27/gorilla that referenced this pull request on Nov 11, 2024
…atil#712)
HuanzhiMao added a commit that referenced this pull request on Nov 19, 2024
This is an optimization that can sometimes improve the performance of models with low context windows. According to OpenAI's spec, the `max_tokens` parameter for the completions endpoint is the maximum number of tokens that can be generated; it does not include the input token count. For example, if a model has a context length of 4096, our input message takes 1000 tokens, and you set `max_tokens` to 4096, the request would error because the total number of tokens (1000 in the input and 4096 requested for the output) exceeds the model's context window. So, before calling the completions endpoint, we use the model's tokenizer to count how many tokens the input message `formatted_prompt` has used, subtract that amount from the model's maximum context length, and supply the result as the `max_tokens` argument, so that we can avoid the maximum-length-exceeded error. In short, we want to allow the model to generate as many tokens as possible, up to the limit of its context length.

PS: even for the multi-turn long-context category, the maximum number of tokens used by the ground-truth function call in a single turn is far less than 4096 (calculated using tiktoken), so this should be a safe threshold.

> max: 1299 min: 24 average: 125.96
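For illustration, here is a minimal sketch of the idea described above, assuming a Hugging Face tokenizer is available for the locally-hosted model. The model ID, the `CONTEXT_LENGTH` constant, and the `compute_max_tokens` helper are hypothetical names for this sketch, not the PR's actual API.

```python
# Sketch only: dynamically budget max_tokens from the prompt's token count.
from transformers import AutoTokenizer

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model, not from the PR
CONTEXT_LENGTH = 4096  # assumed maximum context window for the model

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)


def compute_max_tokens(formatted_prompt: str, context_length: int = CONTEXT_LENGTH) -> int:
    """Return the largest completion budget that still fits in the context window."""
    # Count how many tokens the fully formatted prompt already consumes.
    input_token_count = len(tokenizer.encode(formatted_prompt))
    # Leave the remainder of the context window for generation (at least 1 token).
    return max(context_length - input_token_count, 1)


# The resulting value would then be passed as `max_tokens` to the completions
# endpoint instead of a hard-coded limit, e.g.:
# client.completions.create(model=MODEL_ID, prompt=formatted_prompt,
#                           max_tokens=compute_max_tokens(formatted_prompt))
```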