Some inputs hang the whole embedding service on Qwen3-Embedding-0.6B #694

@lambdaupb

System Info

For some inputs (example attached), the TEI server running Qwen/Qwen3-Embedding-0.6B stops responding.
The HTTP request carrying the problematic input never gets a response, and other requests are no longer served either.

In Wireshark I can see that the request is sent, but nothing comes back from the server.

There is no CPU or GPU activity.

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

This reproduces on both the CUDA and the IPEX CPU images.

# docker run --rm --gpus all -p 8000:80 -v /opt/tei_data:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-ipex-latest --model-id Qwen/Qwen3-Embedding-0.6B 
cpu-ipex-latest: Pulling from huggingface/text-embeddings-inference
Digest: sha256:f1c8b121b9b0582b7ae3e4180f2eebcf968aa8c8de3a1965969b61870c6a48a5
Status: Image is up to date for ghcr.io/huggingface/text-embeddings-inference:cpu-ipex-latest

and

# docker run --rm --gpus all -p 8000:80 -v /opt/tei_data:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.8 --model-id Qwen/Qwen3-Embedding-0.6B
1.8: Pulling from huggingface/text-embeddings-inference
Digest: sha256:8aeb97215f29e0ed48647384af89661c36cee04120c2d4e86b5a3aead47611fa
Status: Image is up to date for ghcr.io/huggingface/text-embeddings-inference:1.8

curl -vvvv -H "Content-Type: application/json" -d @tei_qwen_embed_0.6b_broken_input.txt http://inference:8000/embed

tei_qwen_embed_0.6b_broken_input.txt

Expected behavior

The HTTP request should not hang and it also should not block other embed requests.
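
To check the expected behavior without leaving a client blocked indefinitely, here is a minimal Python sketch that sends a known-good request, then the attached input, then the known-good request again, each with a client-side timeout. The helper names `probe_embed` and `check_for_hang` are hypothetical, and the `{"inputs": ...}` body for the known-good request is an assumption about the `/embed` payload shape. The attached file is sent as raw bytes, which may differ slightly from `curl -d @file`, since curl strips newlines from the file.

```python
import json
import socket
import urllib.error
import urllib.request


def probe_embed(url: str, body: bytes, timeout_s: float = 10.0) -> str:
    """POST a JSON body to an /embed endpoint and classify the outcome.

    Returns "ok" for a 2xx response, "http_error" for a non-2xx response,
    "timeout" if no response arrives within timeout_s, and "unreachable"
    for other connection failures.
    """
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            resp.read()
            return "ok"
    except urllib.error.HTTPError:
        # Must be caught before URLError, which is its base class.
        return "http_error"
    except (socket.timeout, TimeoutError):
        # Raised directly when the server accepts the request but never answers.
        return "timeout"
    except urllib.error.URLError as exc:
        # A timeout while connecting is wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return "unreachable"


def check_for_hang(base_url: str, broken_body_path: str, timeout_s: float = 10.0) -> dict:
    """Probe the server before, with, and after the problematic input."""
    with open(broken_body_path, "rb") as fh:
        broken_body = fh.read()  # raw bytes; curl -d would strip newlines
    # Assumed payload shape for a small, known-good request.
    good_body = json.dumps({"inputs": "hello world"}).encode("utf-8")
    url = base_url.rstrip("/") + "/embed"
    return {
        "before": probe_embed(url, good_body, timeout_s),
        "broken": probe_embed(url, broken_body, timeout_s),
        "after": probe_embed(url, good_body, timeout_s),
    }
```

For example, `check_for_hang("http://inference:8000", "tei_qwen_embed_0.6b_broken_input.txt")` returning `"ok"` for `before` but `"timeout"` for both `broken` and `after` would match the behavior described in this report. With the expected behavior, `after` should be `"ok"` regardless of what happens to the problematic request.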
