Conversation

to-audiobook

Very long audio files might crash whisperX during the call to load_audio() if the system runs out of RAM.

This PR adds the parameter useTmpFiles to load_audio(), which makes ffmpeg resample the audio through a temporary file instead of doing it all in memory, thus substantially increasing the audio length whisperX can handle.
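A minimal sketch of the temp-file approach described above. The function and parameter names here are illustrative, not necessarily the ones in the patch: ffmpeg writes raw 16-bit PCM to a temporary file, which is then loaded and normalized, rather than streaming the whole decoded audio through an in-memory pipe.

```python
import os
import subprocess
import tempfile

import numpy as np

SAMPLE_RATE = 16000  # whisper models expect 16 kHz mono


def pcm16_to_float32(raw: bytes) -> np.ndarray:
    """Normalize little-endian 16-bit PCM bytes to float32 in [-1, 1)."""
    return np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0


def load_audio_via_tmp(path: str, sr: int = SAMPLE_RATE) -> np.ndarray:
    """Decode/resample with ffmpeg into a temporary raw PCM file,
    then load that file, avoiding one huge in-memory pipe buffer."""
    tmp = tempfile.NamedTemporaryFile(suffix=".pcm", delete=False)
    tmp.close()
    try:
        subprocess.run(
            ["ffmpeg", "-nostdin", "-y", "-i", path,
             "-f", "s16le", "-ac", "1", "-ar", str(sr), tmp.name],
            check=True, capture_output=True,
        )
        pcm = np.fromfile(tmp.name, dtype=np.int16)
    finally:
        os.remove(tmp.name)
    return pcm.astype(np.float32) / 32768.0
```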

NOTE: if the audio file is long enough to crash the system during the load_audio() call, requiring the use of temporary files, the system will probably run out of memory during the diarization stage too. I dealt with that by splitting the source audio into two or more parts, using the timestamps from the alignment stage to avoid splitting the audio in the middle of a sentence. That code is not included in this patch and, even if it were, it still has some issues: if we split the audio, the diarization might assign different speaker IDs to the same speaker in each of those parts.
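The split-point selection mentioned above could be sketched roughly as follows. This is my reading of the idea, not the author's actual code: given the aligned segments as (start, end) times, pick the segment boundaries closest to evenly spaced targets so no cut lands mid-sentence.

```python
from typing import List, Tuple


def choose_split_points(
    segments: List[Tuple[float, float]], n_parts: int
) -> List[float]:
    """Pick split times at aligned-segment boundaries.

    `segments` is a list of (start, end) timestamps from the alignment
    stage. Returns n_parts - 1 split times, each snapped to the segment
    boundary nearest to an even division of the total duration, so the
    audio is never cut in the middle of a sentence.
    """
    total = segments[-1][1]
    # candidate cut points: the end of every segment except the last
    boundaries = [end for _, end in segments[:-1]]
    targets = [total * i / n_parts for i in range(1, n_parts)]
    return [min(boundaries, key=lambda b: abs(b - t)) for t in targets]
```

As the note says, this does not solve the remaining problem: diarization run separately on each part may label the same speaker differently.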

Freeing the ffmpeg output buffer after its contents are loaded into a numpy array leaves much more free RAM for the subsequent operations.

This helps avoid running out of RAM when dealing with very long audio files.
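The buffer-freeing idea can be illustrated like this (names are mine; the key point is that `.astype()` makes an independent copy, so the original bytes object can be dropped):

```python
import numpy as np


def bytes_to_audio(raw: bytes) -> np.ndarray:
    # np.frombuffer only creates a view over `raw`; .astype(np.float32)
    # then copies into a fresh array, so the result no longer references
    # the original buffer and `raw` can be garbage-collected.
    return np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0


# usage sketch: after converting, drop the (potentially huge) ffmpeg
# stdout buffer so the interpreter can reclaim that memory:
#
#   out = ffmpeg_process.stdout  # raw PCM bytes
#   audio = bytes_to_audio(out)
#   del out
```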
Apparently they changed some default argument values after transformers v4.51.0. I believe num_beams is the culprit: it used to be 1, now it is set to 5. See

huggingface/transformers#40682

So, according to them, if you pass num_beams=1 to the pipeline, versions >4.51.0 will be as fast as before. But since I am not exactly sure where to put that, I'll just lock the version for now.
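Locking the version could look like this in a pip-managed environment (a sketch; the exact pin depends on where the project declares its dependencies):

```shell
# Pin transformers at the last release before the default changed,
# per the comment above ("after transformers v4.51.0").
pip install "transformers<=4.51.0"
```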