[Feature]: Online Inference on local model with OpenAI Python SDK

### 🚀 The feature, motivation and pitch

OpenAI recently provided a new endpoint batch inference (https://platform.openai.com/docs/guides/batch/overview?lang=curl). It would be nice if it works using the batch format from OpenAI but with a local model.
I created an usage Issue for that before (https://github.com/vllm-project/vllm/issues/8567)

Something like that:

```python
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
)

batch_input_file = client.files.create(
  file=open("batchinput.jsonl", "rb"),
  purpose="batch"
)

client.batches.create(
    input_file_id= batch_input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={
      "description": "nightly eval job"
    }
)
```

At the moment there will be an error:
`NotFoundError: Error code: 404 - {'detail': 'Not Found'}`

Advantages for the implementation:

- vllm can be run as a docker container and function only as endpoint
- It is compatible with the OpenAI Python SDK, so easier to use for newbies also the model can be easily switched from the OpenAI server to local models
- Consistent workflow, if you use the docker for Chat

### Alternatives

Internal Implementation:
There was a feature implemented using `python -m vllm.entrypoints.openai_batch` as described here (https://github.com/vllm-project/vllm/issues/4777), but that is not compatible with the OpenAI SDK and also not compatible with the docker setup.

### Additional context

_No response_

### Before submitting a new issue...

- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[Feature]: Online Inference on local model with OpenAI Python SDK #8631

🚀 The feature, motivation and pitch

Alternatives

Additional context

Before submitting a new issue...

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[Feature]: Online Inference on local model with OpenAI Python SDK #8631

Description

🚀 The feature, motivation and pitch

Alternatives

Additional context

Before submitting a new issue...

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions