Description
/kind bug
What steps did you take and what happened:
In the KServe documentation, one option for authenticating with Hugging Face is to add the secret to the InferenceService (full manifest below; it is identical to the documentation):

```yaml
env:
  - name: HF_TOKEN # Option 2 for authenticating with HF_TOKEN
    valueFrom:
      secretKeyRef:
        name: hf-secret
        key: HF_TOKEN
        optional: false
```
This passes the HF_TOKEN to the kserve-container, but not to the storage-initializer, leading to the following error in the storage-initializer container:

```
Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. You must have access to it and be authenticated to access it. Please log in.
```
(Note that I have created the secret, and it is used by the kserve-container as expected.)
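To illustrate the underlying mechanism: environment variables in a Kubernetes pod are scoped per container, so a variable set on the predictor container is invisible to the storage-initializer. A minimal Python sketch of this (the container names match the pod, but the env contents are illustrative, not taken from an actual pod spec):

```python
# Sketch: env vars in a Kubernetes pod are scoped per container.
# Setting HF_TOKEN on kserve-container does not expose it to the
# storage-initializer container that downloads the model.
pod_containers = {
    "kserve-container": {"env": {"HF_TOKEN": "hf_xxx"}},  # from the InferenceService env
    "storage-initializer": {"env": {}},                   # nothing injected here
}

def has_hf_token(container: str) -> bool:
    """True if the given container would see HF_TOKEN in its environment."""
    return "HF_TOKEN" in pod_containers[container]["env"]

print(has_hf_token("kserve-container"))     # True
print(has_hf_token("storage-initializer"))  # False -> gated download fails
```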
I can resolve this by adding

```yaml
spec:
  container:
    env:
      - name: HF_TOKEN
        valueFrom:
          secretKeyRef:
            key: HF_TOKEN
            name: hf-secret
            optional: true
```

to the ClusterStorageContainer, in which case the deployment works, but this isn't mentioned in the documentation (and isn't configurable for end users).
What's the InferenceService yaml:
Copied from the KServe documentation:

```yaml
kind: InferenceService
metadata:
  name: huggingface-llama3
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      args:
        - --model_name=llama3
        - --model_dir=/mnt/models
      storageUri: hf://meta-llama/meta-llama-3-8b-instruct
      resources:
        limits:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
        requests:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
      env:
        - name: HF_TOKEN # Option 2 for authenticating with HF_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-secret
              key: HF_TOKEN
              optional: false
```
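For completeness, the hf-secret referenced above can be created from a manifest along these lines (a sketch; the stringData value is a placeholder for your own token):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: hf-secret
type: Opaque
stringData:
  HF_TOKEN: <your-huggingface-token>   # placeholder
```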
Environment:
- KServe Version: 0.15.0 (originally on 0.14.1, but tested after upgrading)
Possible I'm missing something obvious here, apologies if so!
(BTW, the meta-llama/Meta-Llama-3-8B-Instruct model used in the documentation also seems to require the user to manually request access on Hugging Face; it may be worth putting a note about this in the docs.)