Describe the issue
I'm trying to run inference on a Deformable-DETR model trained with HF Transformers and converted to ONNX. Running inference with onnxruntime-gpu in Python works like a charm, with the expected performance (~0.05 s per run). Running the same model in onnxruntime-web with the WebGPU provider takes about 12 seconds per run (after warmup), only slightly less than with WASM.
I tried to follow the docs to collect additional information, and I see quite a few log entries like these:
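For context, this is roughly how the model is loaded and timed in the browser (a minimal sketch; the model URL and the `pixel_values` input name are placeholders, the real code is in the gist linked below):

```ts
import * as ort from 'onnxruntime-web';

// Placeholder model URL -- the real file is linked in the "To reproduce" section.
const MODEL_URL = 'deformable-detr.onnx';

async function timeInference(ep: 'webgpu' | 'wasm'): Promise<void> {
  const session = await ort.InferenceSession.create(MODEL_URL, {
    executionProviders: [ep],
  });

  // Dummy 1x3x800x800 input; the real preprocessing lives in the gist.
  const pixelValues = new ort.Tensor(
    'float32',
    new Float32Array(1 * 3 * 800 * 800),
    [1, 3, 800, 800],
  );

  // First run is warmup (shader compilation, buffer allocation); the second run is timed.
  await session.run({ pixel_values: pixelValues });
  const start = performance.now();
  await session.run({ pixel_values: pixelValues });
  console.log(`${ep}: ${(performance.now() - start).toFixed(1)} ms`);
}

await timeInference('webgpu');
await timeInference('wasm');
```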
ort-wasm-simd-threaded.jsep.wasm:0x1039a7f 2024-10-13 21:42:56.842600 [V:onnxruntime:Default, js_execution_provider.cc:735 JsExecutionProvider] Graph capture enable: 0
ort-wasm-simd-threaded.jsep.wasm:0x1039a7f 2024-10-13 21:42:57.538899 [I:onnxruntime:Default, fallback_cpu_capability.cc:86 operator()] Candidate for fallback CPU execution: /model/model/input_proj.3/input_proj.3.1/Reshape_1
Unfortunately, the "analyzing the profiling data" section of the docs is still under construction, so I'm not sure how to act on the information above. Any help appreciated!
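For reference, this is roughly how I turn on the verbose output shown above (a minimal sketch; the model URL is a placeholder, and I'm not sure whether the `env.webgpu.profiling` flag is the right way to get per-kernel timings on this build):

```ts
import * as ort from 'onnxruntime-web';

// Global flags: verbose ORT logging (this is where the
// "Candidate for fallback CPU execution" lines show up).
ort.env.debug = true;
ort.env.logLevel = 'verbose';

// Supposedly dumps per-kernel GPU timings to the console;
// availability may depend on the onnxruntime-web build.
ort.env.webgpu.profiling = { mode: 'default' };

// Per-session logging knobs (0 = verbose).
const session = await ort.InferenceSession.create('deformable-detr.onnx', {
  executionProviders: ['webgpu'],
  logSeverityLevel: 0,
  logVerbosityLevel: 0,
});
```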
To reproduce
The converted model file is available here.
The code to reproduce the problem is available as a gist here: it simply loads the model, attempts to run inference twice, and measures the time.
Urgency
The world is not going to end, but my research project is blocked on this 😢
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.20.0-dev.20241012-332173509d
Execution Provider
'webgpu' (WebGPU)