vLLM adapter for a TGIS-compatible gRPC server.
vllm-tgis-adapter is available on PyPI:
pip install vllm-tgis-adapter
python -m vllm_tgis_adapter

Installing the adapter also installs a gRPC health check CLI that can be used to monitor the status of the gRPC server:
$ grpc_healthcheck
health check...status: SERVING

See usage with:
grpc_healthcheck --help

To build and install the adapter from source:
python -m build
pip install dist/*whl
python -m vllm_tgis_adapter

This starts a gRPC server on port 8033, which can be queried with grpcurl as in the example script:
bash examples/inference.sh

A container image is available at quay.io/opendatahub/vllm, built from opendatahub-io/vllm's Dockerfile.ubi:
docker pull quay.io/opendatahub/vllm

See the examples directory for more usage examples.
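The `grpc_healthcheck` CLI can also gate scripts that start the server, locally or in the container, so that requests are only sent once it is ready. The Python sketch below polls the CLI until it reports `SERVING`. It assumes the CLI is on `PATH`, needs no arguments for a server on the default port, and prints a `status: <NAME>` line as in the sample output above; `parse_status`, `run_healthcheck` and `wait_until_serving` are hypothetical helpers, not part of vllm-tgis-adapter.

```python
import re
import subprocess
import time
from typing import Callable, Optional

# Matches the status word in output such as "health check...status: SERVING".
# Assumption: the CLI output keeps this "status: <NAME>" shape.
_STATUS_RE = re.compile(r"status:\s*([A-Z_]+)")


def parse_status(output: str) -> Optional[str]:
    """Return the status word (e.g. "SERVING") from grpc_healthcheck output, or None."""
    match = _STATUS_RE.search(output)
    return match.group(1) if match else None


def run_healthcheck() -> str:
    """Run the grpc_healthcheck CLI once and return its combined stdout and stderr."""
    # Assumption: grpc_healthcheck is on PATH and needs no extra arguments.
    result = subprocess.run(
        ["grpc_healthcheck"], capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr


def wait_until_serving(
    check: Callable[[], str] = run_healthcheck,
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it reports SERVING; return False once `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if parse_status(check()) == "SERVING":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

For example, `wait_until_serving(timeout=600)` could run right after launching `python -m vllm_tgis_adapter` in the background and before sending requests with `examples/inference.sh`. Injecting `check` and `sleep` keeps the polling logic testable without a live server.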
Set up pre-commit for linting/style/misc fixes:
pip install pre-commit
pre-commit install
# to run on all files
pre-commit run --all-files

This project uses nox to manage test automation and uv for venv management:
pip install nox uv
nox --list # list available sessions
nox -s tests-3.10 # run tests session for a specific python version
nox -s build-3.11 # build the wheel package
nox -s lint-3.11 -- --mypy # run linting with type checks

The standard vllm build requires an NVIDIA GPU. When one is not available, vllm can be compiled from source with CPU support:
git clone https://github.com/vllm-project/vllm
cd vllm
uv venv
source .venv/bin/activate
export UV_EXTRA_INDEX_URL=https://download.pytorch.org/whl/cpu \
    UV_INDEX_STRATEGY=unsafe-best-match
.github/scripts/install_vllm_build_deps.py pyproject.toml
env \
VLLM_TARGET_DEVICE=cpu \
python setup.py bdist_wheel
export VLLM_VERSION_OVERRIDE=$PWD/dist/*whl
# the nox session can now be run with the custom-built vllm CPU version

This makes it possible to run the tests on most hardware. Note that the uv extra index URL is required to install the CPU version of torch.
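The `dist/*whl` pattern is only unambiguous when exactly one wheel has been built. A small Python wrapper can make that check explicit and then launch a nox session with `VLLM_VERSION_OVERRIDE` set. This is a sketch under assumptions: the helper names are hypothetical, the directory layout is up to the caller, and it relies only on nox's `-s` session selection and on arguments after `--` being forwarded to the session, as in the `lint-3.11 -- --mypy` command above.

```python
import os
import subprocess
from pathlib import Path
from typing import List, Sequence


def pick_single_wheel(wheels: Sequence[Path]) -> Path:
    """Return the only wheel in `wheels`, failing loudly on zero or several."""
    if len(wheels) != 1:
        raise RuntimeError(f"expected exactly one wheel, found {len(wheels)}: {list(wheels)}")
    return wheels[0]


def find_single_wheel(dist_dir: Path) -> Path:
    """Glob dist_dir for *.whl (the dist/*whl step above) and return an absolute path."""
    return pick_single_wheel(sorted(dist_dir.glob("*.whl"))).resolve()


def nox_command(session: str, python: str, extra: Sequence[str] = ()) -> List[str]:
    """Build e.g. ["nox", "-s", "tests-3.10"]; nox passes arguments after "--" to the session."""
    cmd = ["nox", "-s", f"{session}-{python}"]
    if extra:
        cmd += ["--", *extra]
    return cmd


def run_tests_with_cpu_wheel(
    vllm_dir: Path, python: str = "3.10", extra: Sequence[str] = ()
) -> int:
    """Run the nox tests session with VLLM_VERSION_OVERRIDE set to the CPU wheel.

    Call this from the vllm-tgis-adapter checkout, where the noxfile lives.
    """
    env = dict(os.environ)
    env["VLLM_VERSION_OVERRIDE"] = str(find_single_wheel(vllm_dir / "dist"))
    # Assumption: uv inside the nox session also needs the CPU torch index.
    env.setdefault("UV_EXTRA_INDEX_URL", "https://download.pytorch.org/whl/cpu")
    env.setdefault("UV_INDEX_STRATEGY", "unsafe-best-match")
    cmd = nox_command("tests", python, extra)
    return subprocess.run(cmd, env=env, check=False).returncode
```

Assuming the vllm clone sits next to the adapter checkout (a hypothetical layout), `run_tests_with_cpu_wheel(Path("../vllm"))` corresponds to running `nox -s tests-3.10` after the `export` above, but fails early if `dist/` holds zero or several wheels.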