optionally use temp files on load_audio() to avoid running out of RAM when dealing with long audio files #1221
Very long audio files might crash whisperX during the call to `load_audio()` if the system runs out of RAM.

This PR adds the parameter `useTmpFiles` to `load_audio()`, which makes `ffmpeg` resample the audio using a temporary file instead of doing it all in memory, substantially increasing the audio length whisperX can handle.

NOTE: if the audio file is long enough to crash the system during the `load_audio()` call, so that temporary files are required, the system will probably run out of memory during the diarization stage too. I dealt with that by splitting the source audio into two or more parts, using the timestamps from the alignment stage to avoid splitting the audio in the middle of a sentence. That code is not included in this patch and, even if it were, it would still have some issues: if we split the audio, the diarization might assign different speaker IDs to the same speaker in each of those parts.
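
For readers who want a rough picture of the idea, here is a minimal sketch of a `load_audio()` with a `useTmpFiles` switch. It is not the code from this patch: the helper names `build_ffmpeg_cmd` and `pcm_file_to_float32`, the 16 kHz default, and the exact `ffmpeg` arguments are assumptions made for illustration. The point it shows is that `ffmpeg` writes raw 16-bit PCM to a temporary file, which is then memory-mapped and converted to `float32`, so the process does not have to hold the captured output bytes and their intermediate copies in memory at once.

```python
import os
import subprocess
import tempfile

import numpy as np

SAMPLE_RATE = 16000  # assumed target sample rate for this sketch


def build_ffmpeg_cmd(src, dst, sr=SAMPLE_RATE):
    """Build an ffmpeg command that decodes `src` to mono 16-bit PCM at `sr` Hz.

    `dst` is an output path, or "-" to write to stdout.
    """
    return [
        "ffmpeg", "-nostdin", "-threads", "0",
        "-y",                      # overwrite the (empty) temp file without asking
        "-i", src,
        "-f", "s16le", "-ac", "1", "-acodec", "pcm_s16le",
        "-ar", str(sr),
        dst,
    ]


def pcm_file_to_float32(path):
    """Read raw int16 PCM from `path` and return float32 samples in [-1, 1)."""
    pcm = np.memmap(path, dtype=np.int16, mode="r")  # file stays on disk
    audio = np.array(pcm, dtype=np.float32)          # the only full in-RAM copy
    del pcm                                          # release the mapping
    audio /= 32768.0                                 # scale in place
    return audio


def load_audio(file, sr=SAMPLE_RATE, useTmpFiles=False):
    """Hypothetical sketch: decode `file`, optionally through a temp file."""
    if not useTmpFiles:
        # In-memory path: ffmpeg output is captured as one bytes object.
        out = subprocess.run(
            build_ffmpeg_cmd(file, "-", sr), capture_output=True, check=True
        ).stdout
        return np.frombuffer(out, np.int16).astype(np.float32) / 32768.0

    fd, tmp_path = tempfile.mkstemp(suffix=".pcm")
    os.close(fd)  # ffmpeg reopens the file by path
    try:
        subprocess.run(
            build_ffmpeg_cmd(file, tmp_path, sr), capture_output=True, check=True
        )
        return pcm_file_to_float32(tmp_path)
    finally:
        os.remove(tmp_path)
```

With this sketch, a call such as `load_audio("long_recording.mp3", useTmpFiles=True)` (the file name is only a placeholder) trades some disk I/O for a lower peak memory footprint. The returned `float32` array still has to fit in RAM.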