cranechu0131
Contributor

feat: change oneccl

output_str = tokenizer.decode(output[0], skip_special_tokens=True)
avg_time = (end - st) / actual_output_len * 1000
print(f'Inference time of generating {actual_output_len} tokens: {end-st} s, average token latency is {avg_time} ms/token.')
print(f'Inference time of generating {actual_output_len} tokens: {end-st} s,first token cost {model.first_cost} s, rest tokens average cost {model.rest_cost_mean} s')
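For context, the two print statements above report different statistics: one average over all generated tokens, versus the first token's cost reported separately from the mean cost of the remaining tokens. The snippet does not show how `model.first_cost` and `model.rest_cost_mean` are computed, so the following is only a minimal sketch under the assumption that per-token durations (in seconds) are available as a list. The helper name `summarize_latency` and the sample durations are hypothetical and not part of this PR.

```python
from statistics import mean


def summarize_latency(token_times):
    """Summarize per-token generation durations (seconds, in generation order).

    Hypothetical helper: returns (first_cost, rest_cost_mean, avg_ms), mirroring
    the quantities printed in the snippet above. Assumes one duration per token.
    """
    if not token_times:
        raise ValueError("need at least one token duration")
    first_cost = token_times[0]                   # first token, typically the slowest
    rest = token_times[1:]
    rest_cost_mean = mean(rest) if rest else 0.0  # mean cost of the remaining tokens
    avg_ms = sum(token_times) / len(token_times) * 1000  # overall ms/token
    return first_cost, rest_cost_mean, avg_ms


# Illustrative durations only, not measured results.
first_cost, rest_cost_mean, avg_ms = summarize_latency([0.5, 0.25, 0.25, 0.25])
print(f'first token cost {first_cost} s, rest tokens average cost {rest_cost_mean} s, '
      f'average token latency is {avg_ms} ms/token.')
```

Reporting the first token separately keeps its usually higher cost from skewing the per-token average for the rest of the generation.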
Contributor

space before ,

@hkvision hkvision requested a review from plusbang October 30, 2024 02:36
Contributor

@plusbang plusbang left a comment

LGTM

@hkvision hkvision merged commit 29400e2 into intel:main Oct 31, 2024
1 check passed

3 participants