NOTE: This repository is still under development. Some features may not be fully functional.
MiniSUPERB is a proxy dataset for SUPERB and the SUPERB Challenge. It provides a simplified and accessible way to evaluate SSL speech models.
The following diagram provides an intuitive illustration of how MiniSUPERB accelerates the evaluation process for SSL speech models:
The figure shows how our results approximate the model rankings of the SUPERB Challenge:
For more details, please refer to the original paper.
The project was developed and tested in the following environment.
Env | Version |
---|---|
os | ubuntu-20.04 |
python | 3.10 |
pytorch | 1.12.1 |
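If you want to confirm your environment roughly matches the tested one before running experiments, a minimal check along these lines may help (the `check_env` helper is a hypothetical name, not part of MiniSUPERB):

```python
import platform
import sys

def check_env(min_python=(3, 10)):
    """Collect detected versions and flag whether Python meets the tested baseline."""
    info = {"os": platform.platform(), "python": platform.python_version()}
    try:
        import torch  # only available after installing the dependencies below
        info["pytorch"] = torch.__version__
    except ImportError:
        info["pytorch"] = None
    info["python_ok"] = sys.version_info[:2] >= min_python
    return info

print(check_env())
```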
MiniSUPERB supports four downstream tasks:
- Automatic Speech Recognition (ASR)
- Speaker Identification (SID)
- Speech Enhancement (SE)
- Source Separation (SS)
The following upstream models are supported:
Models | Upstream Model Name | Paper |
---|---|---|
WavLM | wavlm_base, wavlm_base_plus, wavlm_large | arxiv |
HuBERT | hubert_base, hubert_large_ll60k | arxiv |
Wav2Vec 2.0 | wav2vec2, wav2vec2_large_ll60k | arxiv |
Modified-CPC | modified_cpc | arxiv |
TERA | tera | arxiv |
DeCoAR 2.0 | decoar2 | arxiv |
Filter Bank | fbank, fbank_no_cmvn (used for SID) | - |
- Download [librispeech_finetuning.tgz](https://github.com/facebookresearch/libri-light/blob/main/data_preparation/README.md), along with dev-clean and test-clean from LibriSpeech.
- Unzip and check the prepared file structure:

  ```
  DataStorage
  └── LibriSpeech/
      ├── librispeech_finetuning/
      ├── dev-clean/
      └── test-clean/
  ```
- Download the VoxCeleb1 dataset and unzip it:
  ```sh
  voxceleb1_root="DataStorage/VoxCeleb1/"
  mkdir -p $voxceleb1_root/dev
  mkdir -p $voxceleb1_root/test

  # prepare dev
  cd $voxceleb1_root/dev/
  wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partaa
  wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partab
  wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partac
  wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partad
  cat vox1_dev* > vox1_dev_wav.zip
  unzip vox1_dev_wav.zip

  # prepare test
  cd $voxceleb1_root/test/
  wget https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_test_wav.zip
  unzip vox1_test_wav.zip
  ```
- Check the prepared file structure:

  ```
  DataStorage
  └── Voxceleb1/
      ├── dev/
      │   └── wav/
      │       └── Speaker id folders
      └── test/
          └── wav/
              └── Speaker id folders
  ```
- Download the Voicebank-DEMAND dataset prepared by s3prl:

  ```sh
  wget http://140.112.21.28:9000/noisy-vctk-16k.zip
  unzip noisy-vctk-16k.zip
  ```
- Check the unzipped voicebank directory structure:

  ```
  DataStorage
  └── noisy-vctk-16k/
      ├── clean_testset_wav_16k/
      ├── clean_trainset_28spk_wav_16k/
      ├── noisy_testset_wav_16k/
      ├── noisy_trainset_28spk_wav_16k/
      ├── testset_txt/
      └── trainset_28spk_txt/
  ```
- Simulate Libri2Mix data for source separation. Only the 16 kHz, min condition is needed. Make sure SoX is installed on your machine:

  ```sh
  # Download the script and simulate the Libri2Mix dataset
  git clone https://github.com/s3prl/LibriMix.git
  cd LibriMix
  ./generate_librimix_ss.sh DataStorage
  ```
- Check the generated Libri2Mix directory structure:

  ```
  DataStorage
  └── Libri2Mix/
      └── wav16k/
          └── min/
              ├── train-100/
              ├── dev/
              ├── test/
              └── metadata/
  ```
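To catch path mistakes early, the expected trees above can be sanity-checked programmatically. A small sketch (the `verify_layout` helper is a hypothetical name, not part of MiniSUPERB) that works for any of the dataset layouts:

```python
from pathlib import Path

def verify_layout(root, expected_subdirs):
    """Return the list of expected subdirectories missing under `root`."""
    root = Path(root)
    return [d for d in expected_subdirs if not (root / d).is_dir()]

# Example: the LibriSpeech layout prepared earlier
missing = verify_layout(
    "DataStorage/LibriSpeech",
    ["librispeech_finetuning", "dev-clean", "test-clean"],
)
if missing:
    print("Missing directories:", missing)
```

The same call can be repeated with the VoxCeleb1, Voicebank-DEMAND, and Libri2Mix subdirectory names before launching training.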
Start a new downstream training experiment with the following commands:

```sh
cd minisuperb

# To evaluate a model on ASR:
bash asr.sh UpstreamModelName DataStorage

# To evaluate a model on SID:
bash sid.sh UpstreamModelName DataStorage

# SE and SS are still under development
# To evaluate a model on SE:
bash se.sh UpstreamModelName DataStorage

# To evaluate a model on SS:
bash ss.sh UpstreamModelName DataStorage
```
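To benchmark several upstream models in one go, the shell scripts above can be driven from Python. A hedged sketch (the script names match the commands above, but the `build_commands` loop itself is not part of MiniSUPERB):

```python
import subprocess

def build_commands(upstreams, data_storage, tasks=("asr", "sid")):
    """Build one `bash <task>.sh <upstream> <data>` command per (upstream, task) pair."""
    return [
        ["bash", f"{task}.sh", upstream, data_storage]
        for upstream in upstreams
        for task in tasks
    ]

commands = build_commands(["wavlm_base", "hubert_base"], "DataStorage")
for cmd in commands:
    print(" ".join(cmd))
    # subprocess.run(cmd, cwd="minisuperb", check=True)  # uncomment to actually run
```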
- Install sox on your OS. For Linux:

  ```sh
  conda install -c conda-forge sox
  ```
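A quick way to confirm sox is visible on your PATH after installation, using only the Python standard library (the `sox_available` helper is a hypothetical name for illustration):

```python
import shutil

def sox_available():
    """Return the path to the sox executable, or None if it is not on PATH."""
    return shutil.which("sox")

print(sox_available() or "sox not found; install it before simulating Libri2Mix")
```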
- Install dependencies:

  ```sh
  pip install -e ".[all]"
  ```
1. Support for custom upstream models
2. Evaluation scripts for Speech Enhancement (SE) and Source Separation (SS)
3. Pipeline to calculate the MiniSUPERB score for custom SSL models
The majority of this project is licensed under the Apache License version 2.0; however, all files authored by Facebook, Inc. (which have an explicit copyright statement at the top) are licensed under CC-BY-NC.