This Python pipeline shows how to run SenseVoice ASR on Intel CPU/GPU/NPU through ONNX Runtime with the OpenVINO Execution Provider.
This implementation is forked from the RKNN implementation in the sherpa-onnx project.
Audio samples ("en.mp3", "ja.mp3", "ko.mp3", "yue.mp3" and "zh.mp3") are downloaded from the Hugging Face FunAudioLLM/SenseVoiceSmall repository.
- Off-line (non-streaming) mode
- Multilingual speech recognition for Chinese, Cantonese, English, Japanese, and Korean
- Models are converted to static input shapes (mainly for NPU)
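The core mechanism is simply selecting the OpenVINO Execution Provider when creating the ONNX Runtime session. Below is a minimal sketch; the actual pipeline lives in test_onnx.py, and the model path and device name here are placeholders.

```python
import onnxruntime as ort

# Minimal sketch: run the exported model through the OpenVINO Execution
# Provider. Swap "NPU" for "CPU" or "GPU" as needed; dropping the providers
# argument falls back to the default CPUExecutionProvider.
session = ort.InferenceSession(
    "model-5-seconds.onnx",                     # placeholder path
    providers=["OpenVINOExecutionProvider"],
    provider_options=[{"device_type": "NPU"}],
)
print(session.get_providers())  # confirm which providers are active
```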
Visit https://huggingface.co/FunAudioLLM/SenseVoiceSmall/tree/main and download the following files:
am.mvn
chn_jpn_yue_eng_ko_spectok.bpe.model
config.yaml
configuration.json
model.pt
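As an optional alternative to downloading through the browser, a sketch using the huggingface_hub package (not listed in requirements.txt, so it would need to be installed separately) could look like this:

```python
from huggingface_hub import hf_hub_download

# Optional: fetch the SenseVoiceSmall assets programmatically instead of
# downloading them manually from the Hugging Face web page.
for filename in (
    "am.mvn",
    "chn_jpn_yue_eng_ko_spectok.bpe.model",
    "config.yaml",
    "configuration.json",
    "model.pt",
):
    path = hf_hub_download(
        repo_id="FunAudioLLM/SenseVoiceSmall",
        filename=filename,
        local_dir=".",  # place files next to export-onnx.py
    )
    print("downloaded", path)
```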
Run the following commands to export the model:
pip install -r requirements.txt
python export-onnx.py --input-len-in-seconds 5
- As the NPU does not support dynamic input shapes, the model must be converted to a static shape by specifying a fixed input length. You may configure the input length by setting "--input-len-in-seconds <length>" per your requirement (see the padding/truncation sketch below).
The following model (*.onnx) will be exported to the same directory:
model-5-seconds.onnx
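Because the exported model accepts a fixed number of feature frames, shorter inputs presumably need padding and longer ones truncation before inference. The snippet below is only an illustrative sketch of that idea; the exact frame count for a 5-second model and the padding strategy actually used in test_onnx.py may differ.

```python
import numpy as np

def fit_to_static_length(features: np.ndarray, target_frames: int) -> np.ndarray:
    """Pad with zeros or truncate a (frames, feat_dim) matrix to target_frames.

    Illustrative only: the real handling of fixed-length inputs is done by
    the pipeline itself and may use a different padding value or strategy.
    """
    frames, feat_dim = features.shape
    if frames >= target_frames:
        return features[:target_frames]
    pad = np.zeros((target_frames - frames, feat_dim), dtype=features.dtype)
    return np.concatenate([features, pad], axis=0)
```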
The project directory should look like this:
(openvino_venv) C:\Github\sense-voice-ovep-python-static>dir
Volume in drive C is InstallTo
Volume Serial Number is 76DF-BB22
Directory of C:\Github\sense-voice-ovep-python-static
11/24/2025 03:08 PM <DIR> .
11/24/2025 02:58 PM <DIR> ..
11/24/2025 03:02 PM 11,203 am.mvn
11/24/2025 03:02 PM 377,341 chn_jpn_yue_eng_ko_spectok.bpe.model
11/24/2025 03:02 PM 1,855 config.yaml
11/24/2025 03:02 PM 396 configuration.json
11/23/2025 08:02 PM <DIR> example
11/23/2025 08:02 PM 4,802 export-onnx.py
11/24/2025 03:08 PM 928,985,316 model-5-seconds.onnx
11/24/2025 03:03 PM 936,291,369 model.pt
11/23/2025 08:02 PM 6,887 README.md
11/23/2025 08:02 PM 173 requirements.txt
11/23/2025 08:02 PM 7,171 test_onnx.py
11/24/2025 03:07 PM 340,949 tokens.txt
11/23/2025 08:02 PM 19,498 torch_model.py
11/24/2025 03:07 PM <DIR> __pycache__
12 File(s) 1,866,046,960 bytes
4 Dir(s) 225,240,248,320 bytes free
Usage
usage: test_onnx.py [-h] [--device DEVICE] --model MODEL --tokens TOKENS --wave WAVE [--language LANGUAGE]
[--use-itn USE_ITN]
options:
-h, --help show this help message and exit
--device DEVICE Execution device. Use 'CPU', 'GPU', 'NPU' for OpenVINO. If not specified, the default
CPUExecutionProvider will be used. (default: None)
--model MODEL Path to model.onnx (default: None)
--tokens TOKENS Path to tokens.txt (default: None)
--wave WAVE The input wave to be recognized (default: None)
--language LANGUAGE the language of the input wav file. Supported values: zh, en, ja, ko, yue, auto (default:
auto)
--use-itn USE_ITN 1 to use inverse text normalization. 0 to not use inverse text normalization (default: 0)
Run on CPU
python test_onnx.py --device CPU --model model-5-seconds.onnx --tokens tokens.txt --wave example\zh.mp3
Run on CPU. Output text with punctuation and inverse text normalization.
python test_onnx.py --device CPU --model model-5-seconds.onnx --tokens tokens.txt --wave example\zh.mp3 --use-itn 1
Run on GPU
python test_onnx.py --device GPU --model model-5-seconds.onnx --tokens tokens.txt --wave example\zh.mp3
Run on NPU
python test_onnx.py --device NPU --model model-5-seconds.onnx --tokens tokens.txt --wave example\zh.mp3
Without punctuation and inverse text normalization
(ovep_venv) C:\GitHub\sense-voice-ovep-python-static>python test_onnx.py --device NPU --model model-5-seconds.onnx --token tokens.txt --wav example\zh.mp3
{'device': 'NPU', 'model': 'model-5-seconds.onnx', 'tokens': 'tokens.txt', 'wave': 'example\\zh.mp3', 'language': 'auto', 'use_itn': 0}
Device: OpenVINO EP with device = NPU
features.shape (93, 560)
features.shape (83, 560)
<|zh|><|NEUTRAL|><|Speech|><|woitn|>开 放 时 间 早 上 九 点 至 下 午 五 点
With punctuation and inverse text normalization
(ovep_venv) C:\GitHub\sense-voice-ovep-python-static>python test_onnx.py --device NPU --model model-5-seconds.onnx --token tokens.txt --wav example\zh.mp3 --use-itn 1
{'device': 'NPU', 'model': 'model-5-seconds.onnx', 'tokens': 'tokens.txt', 'wave': 'example\\zh.mp3', 'language': 'auto', 'use_itn': 1}
Device: OpenVINO EP with device = NPU
features.shape (93, 560)
features.shape (83, 560)
<|zh|><|NEUTRAL|><|Speech|><|withitn|>开 放 时 间 早 上 9 点 至 下 午 5 点 。
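The leading <|...|> markers are SenseVoice's language, emotion, event, and ITN tags. If you want plain text only, a small post-processing sketch like the following could split them off (the tag set shown is just what appears in the examples above):

```python
import re

def split_sensevoice_output(text: str):
    """Separate the leading <|...|> tags from the recognized text.

    "<|zh|><|NEUTRAL|><|Speech|><|woitn|>..." ->
        (["zh", "NEUTRAL", "Speech", "woitn"], "...")
    """
    tags = re.findall(r"<\|([^|]+)\|>", text)
    plain = re.sub(r"<\|[^|]+\|>", "", text).strip()
    return tags, plain
```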
A full log (from scratch) is provided for reference.
The pipeline has been validated on an Intel(R) Core(TM) Ultra 7 268V (Lunar Lake) system, with
- iGPU: Intel(R) Arc(TM) 140V GPU, driver 32.0.101.8247 (10/22/2025)
- NPU: Intel(R) AI Boost, driver 32.0.100.4404 (11/7/2025)
| Sample | CPU | GPU | NPU |
|---|---|---|---|
| English (en.mp3) | OK | OK | OK |
| Japanese (ja.mp3) | OK | OK | OK |
| Korean (ko.mp3) | OK | OK | OK |
| Cantonese (yue.mp3) | OK | OK | OK |
| Chinese (zh.mp3) | OK | OK | OK |
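The matrix above can be reproduced by looping test_onnx.py over the samples and devices, for example with a small driver script like this sketch (paths follow the directory layout shown earlier):

```python
import subprocess

# Re-run every bundled sample on CPU, GPU and NPU.
for device in ("CPU", "GPU", "NPU"):
    for sample in ("en.mp3", "ja.mp3", "ko.mp3", "yue.mp3", "zh.mp3"):
        subprocess.run(
            [
                "python", "test_onnx.py",
                "--device", device,
                "--model", "model-5-seconds.onnx",
                "--tokens", "tokens.txt",
                "--wave", f"example\\{sample}",
            ],
            check=True,
        )
```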
If the following warning appears when running the pipeline through OVEP for the first time:
C:\Users\...\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py:123:
UserWarning: Specified provider 'OpenVINOExecutionProvider' is not in available provider names.
Available providers: 'AzureExecutionProvider, CPUExecutionProvider'
This is usually caused by having both onnxruntime and onnxruntime-openvino installed. The solution is to remove both of them and then re-install onnxruntime-openvino:
pip uninstall -y onnxruntime onnxruntime-openvino
pip install onnxruntime-openvino~=1.23.0
Or, if you would like to keep onnxruntime, simply re-install onnxruntime-openvino:
pip uninstall -y onnxruntime-openvino
pip install onnxruntime-openvino~=1.23.0
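After re-installing, you can verify that the OpenVINO EP is visible to ONNX Runtime with a quick check:

```python
import onnxruntime as ort

# "OpenVINOExecutionProvider" should be listed once onnxruntime-openvino is
# the only onnxruntime package installed.
print(ort.get_available_providers())
```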