Description
System Info
For some inputs (example attached), the TEI server stops responding when serving Qwen/Qwen3-Embedding-0.6B.
The HTTP request carrying the problematic input never receives a response, and from then on no other requests are served either.
In Wireshark I can see the request being sent, but nothing comes back from the server.
CPU and GPU activity are both at 0.
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
This reproduces on both the CUDA image and the IPEX CPU image.
# docker run --rm --gpus all -p 8000:80 -v /opt/tei_data:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-ipex-latest --model-id Qwen/Qwen3-Embedding-0.6B
cpu-ipex-latest: Pulling from huggingface/text-embeddings-inference
Digest: sha256:f1c8b121b9b0582b7ae3e4180f2eebcf968aa8c8de3a1965969b61870c6a48a5
Status: Image is up to date for ghcr.io/huggingface/text-embeddings-inference:cpu-ipex-latest
and
# docker run --rm --gpus all -p 8000:80 -v /opt/tei_data:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.8 --model-id Qwen/Qwen3-Embedding-0.6B
1.8: Pulling from huggingface/text-embeddings-inference
Digest: sha256:8aeb97215f29e0ed48647384af89661c36cee04120c2d4e86b5a3aead47611fa
Status: Image is up to date for ghcr.io/huggingface/text-embeddings-inference:1.8
curl -vvvv -H "Content-Type: application/json" -d @tei_qwen_embed_0.6b_broken_input.txt http://inference:8000/embed
tei_qwen_embed_0.6b_broken_input.txt
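To make the hang easier to confirm without waiting on curl indefinitely, here is a minimal Python sketch that sends the attached file with a client-side timeout and then sends a small follow-up request to check whether the server still answers. The URL and file name are taken from the curl command above; the 30-second timeout, the helper names (`build_request`, `post_with_timeout`) and the follow-up payload `{"inputs": "hello"}` are my own assumptions for illustration, not part of TEI.

```python
import json
import socket
import urllib.error
import urllib.request

# From the curl command above; the timeout is an arbitrary assumed value.
EMBED_URL = "http://inference:8000/embed"
BROKEN_INPUT_FILE = "tei_qwen_embed_0.6b_broken_input.txt"
TIMEOUT_S = 30.0


def build_request(url: str, body: bytes) -> urllib.request.Request:
    """Build a JSON POST, roughly what `curl -H 'Content-Type: ...' -d @file` sends."""
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_with_timeout(url: str, body: bytes, timeout_s: float) -> str:
    """Return 'ok', 'http-error <code>', 'timeout', or 'error: <reason>'."""
    try:
        with urllib.request.urlopen(build_request(url, body), timeout=timeout_s) as resp:
            resp.read()
            return "ok"
    except urllib.error.HTTPError as exc:
        # The server answered, just not with a 2xx status.
        return f"http-error {exc.code}"
    except (socket.timeout, TimeoutError):
        # Connected and sent the request, but no response arrived in time.
        return "timeout"
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return f"error: {exc.reason}"


def main() -> None:
    with open(BROKEN_INPUT_FILE, "rb") as fh:
        broken = fh.read()
    # Hypothetical small payload used only to see whether the server still responds.
    probe = json.dumps({"inputs": "hello"}).encode()

    print("broken input:   ", post_with_timeout(EMBED_URL, broken, TIMEOUT_S))
    print("follow-up probe:", post_with_timeout(EMBED_URL, probe, TIMEOUT_S))


if __name__ == "__main__":
    main()
```

If the behavior described in this issue occurs, I would expect both lines to report a timeout; on a healthy server both should report ok.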
Expected behavior
The HTTP request should not hang, and it should not block other embed requests.