Commit 79ad521

sumitd2 authored and Pernekhan committed
[Bugfix] Include encoder prompts len to non-stream api usage response (vllm-project#8861)
Signed-off-by: Sumit Dubey <[email protected]>
1 parent: 8abaf98 · commit: 79ad521

File tree

1 file changed: +2 additions, 0 deletions


vllm/entrypoints/openai/serving_chat.py

Lines changed: 2 additions & 0 deletions
@@ -726,6 +726,8 @@ async def chat_completion_full_generator(
 
         assert final_res.prompt_token_ids is not None
         num_prompt_tokens = len(final_res.prompt_token_ids)
+        if final_res.encoder_prompt_token_ids is not None:
+            num_prompt_tokens += len(final_res.encoder_prompt_token_ids)
         num_generated_tokens = sum(
             len(output.token_ids) for output in final_res.outputs)
         usage = UsageInfo(
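For context, below is a self-contained sketch of the accounting this patch corrects. The dataclasses are simplified stand-ins, not vLLM's real RequestOutput/CompletionOutput types; only the field names used in the diff (prompt_token_ids, encoder_prompt_token_ids, outputs, token_ids) are taken from the change itself.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class CompletionOutput:
        # Stand-in for one generated sequence.
        token_ids: List[int]

    @dataclass
    class RequestOutput:
        # Stand-in for the final_res object in the diff.
        prompt_token_ids: List[int]
        outputs: List[CompletionOutput]
        # Populated only for encoder-decoder models; before this fix
        # its length was omitted from the non-streaming usage response.
        encoder_prompt_token_ids: Optional[List[int]] = None

    def count_usage_tokens(final_res: RequestOutput) -> tuple:
        assert final_res.prompt_token_ids is not None
        num_prompt_tokens = len(final_res.prompt_token_ids)
        # The fix: encoder prompt tokens count toward prompt usage too.
        if final_res.encoder_prompt_token_ids is not None:
            num_prompt_tokens += len(final_res.encoder_prompt_token_ids)
        num_generated_tokens = sum(
            len(output.token_ids) for output in final_res.outputs)
        return num_prompt_tokens, num_generated_tokens

    # A decoder prompt of 3 tokens plus an encoder prompt of 5 tokens:
    # the usage response should now report 8 prompt tokens, not 3.
    res = RequestOutput(
        prompt_token_ids=[101, 102, 103],
        outputs=[CompletionOutput(token_ids=[7, 8])],
        encoder_prompt_token_ids=[1, 2, 3, 4, 5],
    )
    assert count_usage_tokens(res) == (8, 2)

The completion-token count is unchanged by the patch; only the prompt-token total gains the encoder prompt length when one is present.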
