Add DeepSpeed MII backend to benchmark script #1649
Conversation
LGTM! Left some small comments.
default="vllm") | ||
parser.add_argument("--dataset", | ||
type=str, | ||
required=True, | ||
default=None, |
Why was this line changed?
Oh, it's because users can set fixed input and output lengths instead of providing a dataset; see the sketch below.
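For context, the argument handling this refers to looks roughly like the following sketch. The flag names follow the script, but treat the details as an approximation rather than the exact PR code:

```python
import argparse

# Rough sketch of the revised argument handling; details may differ from the PR.
parser = argparse.ArgumentParser()
parser.add_argument("--dataset", type=str, default=None,
                    help="Path to the dataset. If unset, prompts are synthesized.")
parser.add_argument("--input-len", type=int, default=None,
                    help="Fixed input length used when no dataset is given.")
parser.add_argument("--output-len", type=int, default=None,
                    help="Fixed output length used when no dataset is given.")
args = parser.parse_args()

if args.dataset is None:
    # With no dataset, both fixed lengths are required to synthesize requests.
    assert args.input_len is not None and args.output_len is not None
```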
        args.tokenizer, trust_remote_code=args.trust_remote_code)
    if args.dataset is None:
        # Synthesize a prompt with the given input length.
        prompt = "hi" * (args.input_len - 1)
Should this line be " ".join(["hi"] * args.input_len)? In general, how can you make sure the prompt you generate has the number of tokens you specified with a bunch of "hi"s?
Yeah, I agree it's a bit hacky. However, I found this works for LLaMA and OPT because "hi" is a single token in their tokenizers, so "hi" * n is split into n "hi" tokens.
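For reference, here is a quick way to sanity-check that assumption (a minimal sketch; the model name is just an example, and the `- 1` presumably leaves room for the BOS token the tokenizer prepends):

```python
# Quick check of the claim that "hi" * n tokenizes into n "hi" tokens.
# Model name is only an example; LLaMA tokenizers reportedly behave the same way.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
input_len = 32
# The "- 1" presumably accounts for the BOS token the tokenizer prepends.
prompt = "hi" * (input_len - 1)
num_tokens = len(tokenizer(prompt).input_ids)
print(num_tokens)  # Expected: input_len (1 BOS + input_len - 1 "hi" tokens)
```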
Hi @WoosukKwon, I am trying to repro your result and I get the following error when running with the vllm backend:
Do you have any idea how I can resolve this?
This PR adds the DeepSpeed-MII backend to `benchmark_throughput.py`. The script uses MII's non-persistent pipeline API.
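For anyone unfamiliar with it, MII's non-persistent pipeline usage looks roughly like this (a minimal sketch; the model name is a placeholder and argument details may vary across MII versions):

```python
# Minimal sketch of DeepSpeed-MII's non-persistent pipeline API.
import mii

# Model name is a placeholder; any MII-supported model should work.
pipe = mii.pipeline("meta-llama/Llama-2-7b-hf")
responses = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
print(responses)
```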