This project provides tools for:
- training an audio classifier locally from labeled audio (fine-tuning wav2vec2)
- classifying audio files
- fine-tuning an LLM locally
- audio data formatting
  - file conversion
  - file splitting
  - audio-to-mel transform
- audio analysis
  - multi-speaker splitting
  - voice PAD score

Split long audio files:

    import config
    import utils
    import glob

    # split details are in config.py
    # > `max_audio_length_ms`, `format_long_audio_split_name`
    # > `raw_audio_path`, `train_format_audio_path`
    for i in glob.glob("./public_voice/*"):
        utils.split_long_wav(i)

Detect the speakers in each split file:

    from audio_analysis import multiple_speaker
    import config
    import utils
    import glob
    import os

    for i in glob.glob(os.path.join(config.train_format_audio_path, "__split*")):
        multiple_speaker.get_speaker_dict(i)

Separate the single-speaker results from the overlapping ones (`single, overlap = utils.classify_overlap_dicts(dicts)`) and save each group:

    import config
    import utils
    import glob
    import os

    single_dicts, overlap_dicts = list(), list()
    for i in glob.glob(os.path.join(config.audio_analysis_save_path, "analysis_*split*.json")):
        dicts = utils.load_json(i)
        if len(dicts) > 0:
            single, overlap = utils.classify_overlap_dicts(dicts)
            single_dicts.extend(single)
            overlap_dicts.extend(overlap)
    utils.save_json(single_dicts, os.path.join(config.audio_analysis_save_path, "all_analysis_single.json"))
    utils.save_json(overlap_dicts, os.path.join(config.audio_analysis_save_path, "all_analysis_overlap.json"))

Cut the single-speaker results into audio clips:

    import config
    import utils
    import os

    dicts = utils.load_json(os.path.join(config.audio_analysis_save_path, "all_analysis_single.json"))
    utils.split_audio_clips(dicts)

Compute PAD scores for the single-speaker results:

    from audio_analysis import emotion
    import config
    import utils
    import os

    dicts = utils.load_json(os.path.join(config.audio_analysis_save_path, "all_analysis_single.json"))
    pad_dicts = emotion.get_pad_dicts(dicts)

- Update the variable `label` in `config.py`; you can also change the default file paths there if you want to.
- Update `llm/llm_fine_tune_data.py` if you want to fine-tune the LLM part.
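
As a rough illustration of the `label` setting, the sketch below assumes `label` is a plain list of class-name strings (its actual type in `config.py` is not shown in this README) and derives the two lookup tables a classification head usually needs. The helper `build_label_maps` and the example names are hypothetical, not part of this project.

```python
# Hypothetical sketch: `label` is assumed to be a list of class-name strings.
label = ["happy", "sad", "angry"]  # example values, not the project's defaults


def build_label_maps(labels):
    """Return (label2id, id2label) for a list of unique label names."""
    if len(set(labels)) != len(labels):
        raise ValueError("label names must be unique")
    label2id = {name: idx for idx, name in enumerate(labels)}
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label


label2id, id2label = build_label_maps(label)
print(label2id)  # {'happy': 0, 'sad': 1, 'angry': 2}
```

Keeping the order of `label` stable between training and classification matters in a setup like this, because the numeric ids are derived from list positions.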

Paths:
- label audio file path (default): `./classify/(label_name)/*.wav`
- raw audio path (default): `./raw`
- pretrain format audio path (default): `./format`
- mel caches path (default): `./mel`
- wav2vec2 model path (default): `./model/wav2vec2-large-robust-12-ft-emotion-msp-dim`
- audio classify model path (default): `./model/classify`
- unclassified audio path (default): `./unlabeled-classify`
- classified audio output directory: `./classify`
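
Before training, it can help to check that the labeled audio follows the `./classify/(label_name)/*.wav` layout listed above. The sketch below is one possible check, not part of this project; `count_wavs_per_label` is a hypothetical helper that counts the `.wav` files in each label directory.

```python
import glob
import os


def count_wavs_per_label(classify_root="./classify"):
    """Map each label directory under classify_root to its number of .wav files."""
    counts = {}
    for label_dir in sorted(glob.glob(os.path.join(classify_root, "*"))):
        # Only directories count as labels; stray files in the root are ignored.
        if os.path.isdir(label_dir):
            wavs = glob.glob(os.path.join(label_dir, "*.wav"))
            counts[os.path.basename(label_dir)] = len(wavs)
    return counts


if __name__ == "__main__":
    for name, n in count_wavs_per_label().items():
        print(f"{name}: {n} wav file(s)")
```

A label with zero or very few files shows up immediately in the output, which is usually easier to spot here than in the middle of a training run.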

Train the audio classifier:

    python -m audio_classify.label_train

Classify audio files:

    python -m audio_classify.label <model_path> <glob_wav_string>

Fine-tune the LLM:

    python -m llm.llm_fine_tune
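
The `<glob_wav_string>` argument is a glob pattern. If the module expands the pattern itself (an assumption, in line with the `glob.glob` calls in the scripts above), the pattern should be quoted on the command line so the shell passes it through unchanged. A minimal sketch of such an expansion, using a hypothetical helper `expand_wav_glob` that is not part of this project:

```python
import glob


def expand_wav_glob(pattern):
    """Expand a glob pattern and keep only .wav paths, in sorted order."""
    matches = glob.glob(pattern, recursive=True)
    # Filter case-insensitively so files named *.WAV are kept as well.
    return sorted(p for p in matches if p.lower().endswith(".wav"))
```

With the default paths listed earlier, a call might look like `python -m audio_classify.label ./model/classify "./unlabeled-classify/*.wav"`.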