This repository provides a comprehensive framework for fine-tuning and evaluating speech translation models, including Whisper and Seamless, on the FLEURS dataset. The framework supports both monolingual and multilingual configurations with easy-to-use training and testing scripts.
```bash
# Clone the repository
git clone https://github.com/jonahdvt/ast-lrl-speech.git
cd ast-lrl-speech

# Install dependencies
conda env create -f environment.yaml
conda activate speech-processing-env
```
The project uses the FLEURS dataset with the following structure:
All dataset files are stored in the `lang_aggregate_data/` directory.

- Language-specific info: `[langname]_fleurs_info.csv` contains the metadata for each language
- English translations: `en_translation.csv` holds the reference translations in English
- Audio mapping: each audio file is mapped to its corresponding translations across all languages via the `codes` attribute

The `codes` attribute serves as the primary key for mapping audio files to their translations across different languages in the processing scripts.
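As a rough illustration of how that key can be used, the sketch below joins one language's metadata with the English references using pandas. The file name and all column names other than `codes` are assumptions for illustration and may not match the actual CSV layout.

```python
import pandas as pd

# Load one language's FLEURS metadata and the English reference translations.
# The "hausa" file name is assumed here for illustration.
lang_info = pd.read_csv("lang_aggregate_data/hausa_fleurs_info.csv")
en_ref = pd.read_csv("lang_aggregate_data/en_translation.csv")

# `codes` is the shared primary key, so a simple merge pairs each audio file
# with its English reference translation.
paired = lang_info.merge(en_ref, on="codes", suffixes=("_lang", "_en"))
print(paired.head())
```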
Navigate to the appropriate Whisper fine-tuning script based on your dataset requirements:
`train_scripts/whisper/whisper_finetune_xxxxx.py`
- Select script: Choose the appropriate fine-tuning script for your target dataset
- Language selection: Specify the languages you want to fine-tune on
- Training iterations: Set the number of iterations (especially important for streaming)
- Base model: Choose the pre-trained Whisper model to fine-tune from
- Training arguments: Configure your desired `training_args` (a hedged sketch follows the example below)
- Model deployment: Set up `kwargs` for pushing and saving the model to the Hugging Face Hub
```python
languages = [
    "ig_ng",
    "lg_ug",
    "sw_ke",
    "yo_ng",
    "ha_ng",
]  # Target languages (example from the African multilingual fine-tuning)

whisper_model = 'openai/whisper-large-v3'
```
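For orientation, here is a minimal sketch of what the `training_args` and Hub `kwargs` steps might look like with Hugging Face's `Seq2SeqTrainingArguments`; every value and name below is a placeholder, not the configuration used in the scripts.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative settings only; tune batch size, learning rate, and max_steps
# to your hardware and streaming setup.
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-large-v3-african-multi",  # placeholder repo/dir name
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    learning_rate=1e-5,
    max_steps=5000,    # an explicit iteration count matters when streaming
    fp16=True,
    push_to_hub=True,  # push checkpoints to the Hugging Face Hub
)

# Metadata typically passed to trainer.push_to_hub(**kwargs); placeholder values.
kwargs = {
    "dataset_tags": "google/fleurs",
    "finetuned_from": "openai/whisper-large-v3",
    "tasks": "automatic-speech-recognition",
    "model_name": "whisper-large-v3-african-multi",  # placeholder
}
```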
Navigate to the appropriate Seamless fine-tuning script based on your configuration:
`train_scripts/seamless/seamless_finetune_xxxxx.py`
- Mono configuration: Single language fine-tuning
- Multi configuration: Multi-language fine-tuning
- Select configuration: Choose between mono/multi configuration scripts
- Language selection: Specify target languages
- Training iterations: Set the iteration count for streaming scenarios (see the streaming sketch after this list)
- Base model: Select the base Seamless model
- Training arguments: Configure training parameters
- Model deployment: Set up Hugging Face Hub integration in `kwargs`
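Because a streamed dataset has no fixed length, the trainer needs an explicit step count rather than an epoch count. The sketch below shows how a FLEURS split can be streamed with the `datasets` library; the language code is only an example.

```python
from datasets import load_dataset

# Stream one FLEURS training split instead of downloading it in full.
fleurs_train = load_dataset("google/fleurs", "ig_ng", split="train", streaming=True)

# An iterable (streamed) dataset has no __len__, which is why the fine-tuning
# scripts expose an explicit iteration / max_steps setting.
for i, example in enumerate(fleurs_train.take(2)):
    print(i, example["transcription"])
```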
```bash
python testing_scripts/whisper_test.py
```
- Model selection: Choose between the base model or a custom fine-tuned model via `model_id` (a minimal inference sketch follows this list)
- Language selection: Select desired languages for direct inference
- Dataset: Runs inference on the FLEURS test dataset
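This is not the script's actual code, but a hedged sketch of the kind of direct inference it performs, using the `transformers` pipeline API; the checkpoint and audio path are placeholders.

```python
from transformers import pipeline

# Base checkpoint or a fine-tuned model pushed to the Hub (placeholder).
model_id = "openai/whisper-large-v3"
asr = pipeline("automatic-speech-recognition", model=model_id)

# Whisper translates source-language speech into English when task="translate".
result = asr("path/to/fleurs_sample.wav", generate_kwargs={"task": "translate"})
print(result["text"])
```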
```bash
python testing_scripts/seamless_test.py
```
- Model selection: Choose between the base model or a custom fine-tuned model via `model_id` (a minimal inference sketch follows this list)
- Language selection: Select desired languages for direct inference
- Dataset: Runs inference on the FLEURS test dataset
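Again as a hedged sketch rather than the script's actual code, speech-to-text translation with a Seamless checkpoint might look roughly as follows; the checkpoint and language choices are placeholders.

```python
from datasets import load_dataset
from transformers import AutoProcessor, SeamlessM4Tv2ForSpeechToText

# Base checkpoint or a fine-tuned model pushed to the Hub (placeholder).
model_id = "facebook/seamless-m4t-v2-large"
processor = AutoProcessor.from_pretrained(model_id)
model = SeamlessM4Tv2ForSpeechToText.from_pretrained(model_id)

# One streamed FLEURS test example (FLEURS audio is 16 kHz).
sample = next(iter(load_dataset("google/fleurs", "sw_ke", split="test", streaming=True)))

inputs = processor(audios=sample["audio"]["array"], sampling_rate=16000, return_tensors="pt")
tokens = model.generate(**inputs, tgt_lang="eng")
print(processor.decode(tokens[0], skip_special_tokens=True))
```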
```bash
python testing_scripts/gemini.py
```
- Requires a valid Gemini API key
- Configure target languages at the top of the script
- Runs translation inference with Gemini 2.0 Flash on the FLEURS test dataset (see the sketch below)
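A hedged sketch of the kind of call such a script makes with the `google-generativeai` client; the prompt wording, language, and sentence are placeholders.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # supply your own key
model = genai.GenerativeModel("gemini-2.0-flash")

transcript = "Example FLEURS sentence in the source language."  # placeholder
prompt = f"Translate the following Hausa sentence into English:\n{transcript}"

response = model.generate_content(prompt)
print(response.text)
```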
```bash
python testing_scripts/chatgpt.py
```
- Requires a valid OpenAI API key
- Configure target languages at the top of the script
- Runs translation inference with GPT-4 on the FLEURS test dataset (see the sketch below)
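Correspondingly, a hedged sketch with the official `openai` client; the prompt, language, and sentence are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

transcript = "Example FLEURS sentence in the source language."  # placeholder
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a translation assistant."},
        {"role": "user", "content": f"Translate the following Yoruba sentence into English:\n{transcript}"},
    ],
)
print(response.choices[0].message.content)
```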
Results from all testing scripts are automatically saved and can be found in the designated output directories.
To calculate metrics (WER/BLEU), see `metrics.py`.
To extract the best and worst samples based on WER differentials, see `dataset/sample_finder.py`.
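For reference, WER and BLEU can be computed with the `evaluate` library roughly as shown below; whether `metrics.py` uses this exact library is an assumption.

```python
import evaluate

wer_metric = evaluate.load("wer")
bleu_metric = evaluate.load("sacrebleu")

predictions = ["the cat sat on the mat"]
references = ["the cat is on the mat"]

wer = wer_metric.compute(predictions=predictions, references=references)
bleu = bleu_metric.compute(predictions=predictions, references=[[r] for r in references])
print(f"WER: {wer:.3f}  BLEU: {bleu['score']:.2f}")
```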
- Memory Management: Adjust batch sizes to fit your GPU memory. Streaming is also advised to avoid downloading large datasets, and pushing trained models to the Hugging Face Hub avoids storing large checkpoints locally.
- Streaming: Use appropriate iteration counts for streaming datasets
- Model Versioning: Use descriptive names when pushing to Hugging Face Hub
- API Limits: Be mindful of API rate limits when using Gemini or GPT testing
- Ensure all API keys are properly configured before running external API tests
- Monitor training progress and adjust hyperparameters as needed
- Consider using different base models for optimal performance on specific languages
For detailed configuration examples and troubleshooting, please refer to the individual script documentation within each file.