NOTE: This repository is still under development. Some features may not be fully functional.
MiniSUPERB is a proxy dataset for SUPERB and the SUPERB Challenge. It provides a simplified and accessible way to evaluate SSL speech models.
The following diagram provides an intuitive illustration of how MiniSUPERB accelerates the evaluation process for SSL speech models:
The figure shows how our results approximate the model rankings of the SUPERB Challenge:
For more details, please refer to the original paper.
The project was developed and tested in the following environment.
Env | Version |
---|---|
os | ubuntu-20.04 |
python | 3.10 |
pytorch | 1.12.1 |
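If you want to confirm your environment roughly matches the tested one before running experiments, a minimal check along these lines may help (the `check_env` helper is a hypothetical name, not part of MiniSUPERB):

```python
import platform
import sys

def check_env(min_python=(3, 10)):
    """Collect detected versions and flag whether Python meets the tested baseline."""
    info = {"os": platform.platform(), "python": platform.python_version()}
    try:
        import torch  # only available after installing the dependencies below
        info["pytorch"] = torch.__version__
    except ImportError:
        info["pytorch"] = None
    info["python_ok"] = sys.version_info[:2] >= min_python
    return info

print(check_env())
```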
MiniSUPERB supports four downstream tasks:
- Automatic Speech Recognition (ASR)
- Speaker Identification (SID)
- Speech Enhancement (SE)
- Source Separation (SS)
The following upstream models are supported:
Models | Upstream Model Name | Paper |
---|---|---|
WavLM | wavlm_base, wavlm_base_plus, wavlm_large | arxiv |
HuBERT | hubert_base, hubert_large_ll60k | arxiv |
Wav2Vec 2.0 | wav2vec2, wav2vec2_large_ll60k | arxiv |
Modified-CPC | modified_cpc | arxiv |
TERA | tera | arxiv |
DeCoAR 2.0 | decoar2 | arxiv |
Filter Bank | fbank, fbank_no_cmvn (used for SID) | - |
- Download [librispeech_finetuning.tgz](https://github.com/facebookresearch/libri-light/blob/main/data_preparation/README.md), along with dev-clean and test-clean from LibriSpeech.
- Unzip and check the prepared file structure:

  ```
  DataStorage
  └── LibriSpeech/
      ├── librispeech_finetuning/
      ├── dev-clean/
      └── test-clean/
  ```
- Download the VoxCeleb1 dataset and unzip it:
  ```sh
  voxceleb1_root="DataStorage/VoxCeleb1/"
  mkdir -p $voxceleb1_root/dev
  mkdir -p $voxceleb1_root/test

  # prepare dev
  cd $voxceleb1_root/dev/
  wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partaa
  wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partab
  wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partac
  wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partad
  cat vox1_dev* > vox1_dev_wav.zip
  unzip vox1_dev_wav.zip

  # prepare test
  cd $voxceleb1_root/test/
  wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_test_wav.zip
  unzip vox1_test_wav.zip
  ```
- Check the prepared file structure:

  ```
  DataStorage
  └── Voxceleb1/
      ├── dev/
      │   └── wav/
      │       └── Speaker id folders
      └── test/
          └── wav/
              └── Speaker id folders
  ```
- Download the Voicebank-DEMAND dataset prepared by s3prl:

  ```sh
  wget http://140.112.21.28:9000/noisy-vctk-16k.zip
  unzip noisy-vctk-16k.zip
  ```
- Check the unzipped voicebank directory structure:

  ```
  DataStorage
  └── noisy-vctk-16k/
      ├── clean_testset_wav_16k/
      ├── clean_trainset_28spk_wav_16k/
      ├── noisy_testset_wav_16k/
      ├── noisy_trainset_28spk_wav_16k/
      ├── testset_txt/
      └── trainset_28spk_txt/
  ```
- Simulate Libri2Mix data for source separation. Only the 16 kHz, min condition is needed. Make sure SoX is installed on your machine:

  ```sh
  # Download the script and simulate the Libri2Mix dataset
  git clone https://github.com/s3prl/LibriMix.git
  cd LibriMix
  ./generate_librimix_ss.sh DataStorage
  ```
- Check the generated Libri2Mix directory structure:

  ```
  DataStorage
  └── Libri2Mix/
      └── wav16k/
          └── min/
              ├── train-100/
              ├── dev/
              ├── test/
              └── metadata/
  ```
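To catch path mistakes early, the expected trees above can be sanity-checked programmatically. A small sketch (the `verify_layout` helper is a hypothetical name, not part of MiniSUPERB) that works for any of the dataset layouts:

```python
from pathlib import Path

def verify_layout(root, expected_subdirs):
    """Return the list of expected subdirectories missing under `root`."""
    root = Path(root)
    return [d for d in expected_subdirs if not (root / d).is_dir()]

# Example: the LibriSpeech layout prepared earlier
missing = verify_layout(
    "DataStorage/LibriSpeech",
    ["librispeech_finetuning", "dev-clean", "test-clean"],
)
if missing:
    print("Missing directories:", missing)
```

The same call can be repeated with the VoxCeleb1, Voicebank-DEMAND, and Libri2Mix subdirectory names before launching training.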
Start a new downstream training experiment with the following commands:

```sh
cd minisuperb

# To evaluate a model on ASR:
bash asr.sh UpstreamModelName DataStorage

# To evaluate a model on SID:
bash sid.sh UpstreamModelName DataStorage

# SE and SS are still under development
# To evaluate a model on SE:
bash se.sh UpstreamModelName DataStorage

# To evaluate a model on SS:
bash ss.sh UpstreamModelName DataStorage
```
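To benchmark several upstream models in one go, the shell scripts above can be driven from Python. A hedged sketch (the script names match the commands above, but the `build_commands` loop itself is not part of MiniSUPERB):

```python
import subprocess

def build_commands(upstreams, data_storage, tasks=("asr", "sid")):
    """Build one `bash <task>.sh <upstream> <data>` command per (upstream, task) pair."""
    return [
        ["bash", f"{task}.sh", upstream, data_storage]
        for upstream in upstreams
        for task in tasks
    ]

commands = build_commands(["wavlm_base", "hubert_base"], "DataStorage")
for cmd in commands:
    print(" ".join(cmd))
    # subprocess.run(cmd, cwd="minisuperb", check=True)  # uncomment to actually run
```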
- Install sox on your OS. For Linux:

  ```sh
  conda install -c conda-forge sox
  ```
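A quick way to confirm sox is visible on your PATH after installation, using only the Python standard library (the `sox_available` helper is a hypothetical name for illustration):

```python
import shutil

def sox_available():
    """Return the path to the sox executable, or None if it is not on PATH."""
    return shutil.which("sox")

print(sox_available() or "sox not found; install it before simulating Libri2Mix")
```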
- Install dependencies:

  ```sh
  pip install -e ".[all]"
  ```
1. Support for custom upstream models
2. Evaluation scripts for Speech Enhancement (SE) and Source Separation (SS)
3. Pipeline to calculate the MiniSUPERB score for custom SSL models
The majority of this project is licensed under the Apache License version 2.0; however, all files authored by Facebook, Inc. (which have an explicit copyright statement at the top) are licensed under CC-BY-NC.