
Commit d9bb9be

Authored by mht-sharma and Narsil
Add llama4 (#3145)
* initial changes
* Add support for other vlm
* cleanup comment
* Improve attn_implementation
* Add comments for support of models
* add model
* add model
* fixes and improvements
* update docker
* Add cache position
* Add tests
* remove redundant changes
* remove tr version
* Upgrade doc + fix linting.
* Fixing the CI.

Co-authored-by: Nicolas Patry <[email protected]>
Parent commit: 3d059f9

19 files changed: +1893 −61 lines

Dockerfile

Lines changed: 4 additions & 1 deletion
```diff
@@ -65,7 +65,7 @@ RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-ins
 COPY --from=ghcr.io/astral-sh/uv:0.5.31 /uv /uvx /bin/
 ENV PATH="$PATH:/root/.local/bin"
 RUN uv python install ${PYTHON_VERSION}
-RUN uv venv --python ${PYTHON_VERSION} && uv pip install torch==${PYTORCH_VERSION} pip setuptools packaging
+RUN uv venv --python ${PYTHON_VERSION} && uv pip install torch==${PYTORCH_VERSION} torchvision pip setuptools packaging
 ENV VIRTUAL_ENV=/usr/src/.venv/
 ENV PATH="$PATH:/usr/src/.venv/bin/"
@@ -193,6 +193,9 @@ RUN cd server && \
     pwd && \
     text-generation-server --help
 
+
+# This shouldn't be necessary.
+# RUN uv pip install torchvision --no-deps
 # Copy build artifacts from flash attention builder
 COPY --from=flash-att-builder /usr/src/flash-attention/build/lib.linux-x86_64-cpython-311 /usr/src/.venv/lib/python3.11/site-packages
 COPY --from=flash-att-builder /usr/src/flash-attention/csrc/layer_norm/build/lib.linux-x86_64-cpython-311 /usr/src/.venv/lib/python3.11/site-packages
```
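The Dockerfile change above installs `torchvision` alongside `torch` in the uv-managed virtual environment. A minimal sketch of a sanity check that the package actually landed in the environment (generic Python, not specific to this image):

```python
import importlib.util

# find_spec returns None when a package is not installed,
# so this checks availability without importing torchvision itself.
present = importlib.util.find_spec("torchvision") is not None
print("torchvision installed:", present)
```

Running this inside the built image should print `True`; running it in an environment without torchvision prints `False`.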

docs/source/supported_models.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -9,6 +9,7 @@ Text Generation Inference enables serving optimized models. The following sectio
 - [Idefics 3](https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3) (Multimodal)
 - [Llava Next (1.6)](https://huggingface.co/llava-hf/llava-v1.6-vicuna-13b-hf) (Multimodal)
 - [Llama](https://huggingface.co/collections/meta-llama/llama-31-669fc079a0c406a149a5738f)
+- [Llama4](https://huggingface.co/collections/meta-llama/llama-31-669fc079a0c406a149a5738f)
 - [Phi 3](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
 - [Granite](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct)
 - [Gemma](https://huggingface.co/google/gemma-7b)
```
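With the new entry in the supported-models list, a Llama 4 checkpoint can be served through the standard TGI launcher. A hedged sketch of the invocation (the model id and shard count below are assumptions for illustration, not taken from this commit):

```shell
# Hypothetical launch; substitute a real Llama 4 model id and your GPU count.
text-generation-launcher \
    --model-id meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --num-shard 8
```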
