
@ayushi-uwc ayushi-uwc commented Jul 7, 2025

This change adds support for the ShunyaLabs Pingala-V1-Verbatim ASR model: https://www.shunyalabs.ai/pingala

Our results on an NVIDIA L4 (8 vCPUs, 32 GB memory, 24 GB VRAM) are as follows:

********************************************************************************
Results per dataset:
********************************************************************************
shunyalabs/Pingala-V1-Verbatim | hf-audio-esb-datasets-test-only-sorted_ami_test: WER = 4.80 %, RTFx = 0.71
shunyalabs/Pingala-V1-Verbatim | hf-audio-esb-datasets-test-only-sorted_earnings22_test: WER = 6.43 %, RTFx = 1.14
shunyalabs/Pingala-V1-Verbatim | hf-audio-esb-datasets-test-only-sorted_gigaspeech_test: WER = 4.36 %, RTFx = 1.27
shunyalabs/Pingala-V1-Verbatim | hf-audio-esb-datasets-test-only-sorted_librispeech_test.clea: WER = 1.77 %, RTFx = 1.73
shunyalabs/Pingala-V1-Verbatim | hf-audio-esb-datasets-test-only-sorted_librispeech_test.other: WER = 2.86 %, RTFx = 1.58
shunyalabs/Pingala-V1-Verbatim | hf-audio-esb-datasets-test-only-sorted_spgispeech_test: WER = 1.13 %, RTFx = 1.92
shunyalabs/Pingala-V1-Verbatim | hf-audio-esb-datasets-test-only-sorted_tedlium_test: WER = 2.08 %, RTFx = 1.66
shunyalabs/Pingala-V1-Verbatim | hf-audio-esb-datasets-test-only-sorted_voxpopuli_test: WER = 3.55 %, RTFx = 2.00

********************************************************************************
Composite Results:
********************************************************************************
shunyalabs/Pingala-V1-Verbatim: WER = 3.37 %
shunyalabs/Pingala-V1-Verbatim: RTFx = 1.56
********************************************************************************
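
As a quick sanity check on the composite numbers, here is a minimal sketch that recomputes them from the per-dataset figures above. It assumes the composite WER is an unweighted mean over the eight test sets; the short dataset keys and variable names are only illustrative and are not taken from the evaluation scripts.

```python
from statistics import mean

# Per-dataset figures copied from the results above: (WER %, RTFx).
# The short keys are illustrative labels, not the real dataset IDs.
results = {
    "ami": (4.80, 0.71),
    "earnings22": (6.43, 1.14),
    "gigaspeech": (4.36, 1.27),
    "librispeech_clean": (1.77, 1.73),
    "librispeech_other": (2.86, 1.58),
    "spgispeech": (1.13, 1.92),
    "tedlium": (2.08, 1.66),
    "voxpopuli": (3.55, 2.00),
}

# Assumption: composite WER is the plain (unweighted) mean of per-dataset WERs.
composite_wer = mean(wer for wer, _ in results.values())

# Plain mean of the per-dataset RTFx values, for comparison only.
mean_rtfx = mean(rtfx for _, rtfx in results.values())

print(f"Mean WER  = {composite_wer:.2f} %")
print(f"Mean RTFx = {mean_rtfx:.2f}")
```

The mean WER comes out at 3.37 %, which matches the composite WER reported above. The plain mean of the RTFx values is about 1.50, not the reported 1.56, so the composite RTFx is presumably aggregated differently (for example, total audio duration divided by total transcription time); the per-dataset durations needed to verify that are not part of this output.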
