Feature implementation from commits 5ff9694..3069512 #2

yashuatla · 2025-06-23T15:52:25Z

PR Summary

Enhanced Transcription Capabilities and Performance Optimizations

Overview

This PR adds CoreML support for Apple Silicon, improves locale handling, optimizes transcription performance, and fixes several bugs in the transcription pipeline. It also adds quality-of-life features like clipboard support and environment variable configurations.

Change Types

| Type | Description |
| --- | --- |
| Feature | Added CoreML support for Apple Silicon Macs |
| Feature | Added custom locale support via environment variables |
| Feature | Added word timestamps support in transcription |
| Enhancement | Improved YouTube download configuration with cookie support |
| Enhancement | Added clipboard support for transcription tasks |
| Refactor | Restructured WhisperCpp class to support multiple backends |
| Bugfix | Fixed FFmpeg command line argument ordering |
| Optimization | Improved performance in transcription processing |

Affected Modules

| Module / File | Change Description |
| --- | --- |
| `locale.py` | Added environment variable override for locale settings |
| `model_loader.py` | Added CoreML support detection and model handling |
| `transcriber/whisper_cpp.py` | Refactored to support multiple backends (CPU/CoreML) |
| `transcriber/file_transcriber.py` | Fixed FFmpeg command ordering and improved YouTube downloads |
| `transcriber/recording_transcriber.py` | Optimized audio processing with reduced queue size |
| `transcriber/whisper_cpp_file_transcriber.py` | Improved model initialization and reuse |
| `transcriber/whisper_file_transcriber.py` | Added word timestamps support |
| `widgets/transcription_tasks_table_widget.py` | Added clipboard support for file names and URLs |
| `widgets/transcription_viewer/transcription_segments_editor_widget.py` | Performance optimizations for row handling |

Notes for Reviewers

The CoreML implementation for Apple Silicon should be thoroughly tested on M1/M2 Macs
The queue size reduction from 5 to 3 batches may affect transcription behavior in edge cases
The environment variable support (BUZZ_LOCALE) changes how the application handles localization
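
For reviewers who want a concrete picture of the BUZZ_LOCALE note, here is a minimal sketch of an environment-variable override. It assumes the variable simply takes precedence over the detected system locale; `resolve_locale` and its parameters are hypothetical names for illustration and are not taken from `locale.py`.

```python
import os


def resolve_locale(system_locale: str, env=None) -> str:
    """Return the locale to use, letting BUZZ_LOCALE win over the system value.

    Hypothetical helper: the real override logic in the PR may differ.
    """
    env = os.environ if env is None else env
    override = env.get("BUZZ_LOCALE", "").strip()
    # An unset or blank variable falls back to the detected system locale.
    return override or system_locale
```

Passing the environment as a parameter keeps the function easy to test without mutating `os.environ`.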

Additional Context

The CoreML support is specifically targeted at Apple Silicon (M1/M2) Macs
YouTube download now supports cookies via environment variable
The WhisperCpp refactoring introduces a more flexible architecture for supporting different backends
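
Since the CoreML path is limited to Apple Silicon, backend selection presumably hinges on a platform check. The sketch below shows one common way to do that with the standard library; `is_apple_silicon`, `pick_backend`, and the `"coreml"`/`"cpu"` labels are assumptions made for illustration, not identifiers from `model_loader.py`.

```python
import platform


def is_apple_silicon(system=None, machine=None) -> bool:
    """True on macOS running natively on an ARM64 (M-series) CPU."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


def pick_backend(system=None, machine=None) -> str:
    # "coreml" and "cpu" are illustrative labels, not names from the PR.
    return "coreml" if is_apple_silicon(system, machine) else "cpu"
```

A Python interpreter running under Rosetta typically reports `x86_64`, so a check like this would choose the CPU path there; that case is worth covering when testing on M1/M2 hardware.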

The Core ML change (chidiwilliams#976) also changes how Whisper.cpp models are downloaded. After updating the app, any previously downloaded Whisper.cpp models will need to be downloaded again.


codeowlai · 2025-06-23T15:57:28Z

buzz/model_loader.py

+import requests
+import whisper
+import huggingface_hub
+import zipfile


🐛 Correctness Issue

Missing faster_whisper import causes runtime errors.

The removal of the faster_whisper import will cause runtime errors when the application tries to use the FASTER_WHISPER model type functionality.

Current Code (Diff):

+ import requests
+ import whisper
+ import huggingface_hub
+ import zipfile
+ import faster_whisper

📝 Committable suggestion

‼️ IMPORTANT
Trust, but verify! 🕵️ Please review this suggestion with the care of a code archaeologist - check that it perfectly replaces the highlighted code, preserves all lines, maintains proper indentation, and won't break anything in production. Your future self will thank you! 🚀

Suggested change

-import requests
-import whisper
-import huggingface_hub
-import zipfile
+import requests
+import whisper
+import huggingface_hub
+import zipfile
+import faster_whisper

🔄 Dependencies Affected

buzz/model_loader.py

Function: ModelLoader.is_deletable

Issue: References to ModelType.FASTER_WHISPER will fail without the faster_whisper module

Suggestion: Ensure the faster_whisper import is available for code that references ModelType.FASTER_WHISPER

buzz/model_loader.py

Function: ModelLoader.download_faster_whisper_model

Issue: This function likely depends on the faster_whisper module that was removed

Suggestion: Restore the faster_whisper import or refactor the function to not require it
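
If the import is meant to be optional rather than restored at module level, the "refactor the function to not require it" option could look like the sketch below. It is only an illustration of a guarded, lazy import using `importlib`; `is_module_available` and `load_faster_whisper` are hypothetical helpers, not functions from `buzz/model_loader.py`.

```python
import importlib
import importlib.util


def is_module_available(name: str) -> bool:
    """Check whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def load_faster_whisper():
    """Import faster_whisper on first use, with a clear error if it is absent."""
    if not is_module_available("faster_whisper"):
        raise RuntimeError(
            "faster_whisper is not installed; FASTER_WHISPER models are unavailable"
        )
    return importlib.import_module("faster_whisper")
```

With this shape, code paths that never touch FASTER_WHISPER models do not pay for the import, and the failure mode is an explicit message instead of a `NameError`.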

codeowlai · 2025-06-23T15:57:32Z

buzz/model_loader.py

    if size == WhisperModelSize.CUSTOM and custom_repo_id == "":
        raise ValueError("Custom model id is not provided")

    if size == WhisperModelSize.CUSTOM:
        repo_id = custom_repo_id
    elif size == WhisperModelSize.LARGEV3:
        repo_id = "Systran/faster-whisper-large-v3"
+    # Maybe switch to 'mobiuslabsgmbh/faster-whisper-large-v3-turbo', seems to be used in


🐛 Correctness Issue

Unresolved Implementation Decision.

The comment indicates uncertainty about which model repository to use, suggesting incomplete implementation that could cause issues if deployed without resolution.

Current Code (Diff):

- # Maybe switch to 'mobiuslabsgmbh/faster-whisper-large-v3-turbo', seems to be used in
- # faster-whisper code https://github.com/SYSTRAN/faster-whisper/blob/master/faster_whisper/utils.py#L29
- # If so changes needed also in whisper_file_transcriber.py
+ # Using deepdml/faster-whisper-large-v3-turbo-ct2 repository for LARGEV3TURBO model
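
One way to make the repository choice explicit, rather than leaving it in a comment, is a lookup table. The following is a minimal sketch under stated assumptions: the enum below is a cut-down stand-in for the project's `WhisperModelSize` with illustrative values, `resolve_repo_id` is a hypothetical name, and only the two repository ids quoted in this review are included.

```python
from enum import Enum


class WhisperModelSize(Enum):
    # Only the members needed for this sketch; values are illustrative.
    LARGEV3 = "large-v3"
    LARGEV3TURBO = "large-v3-turbo"
    CUSTOM = "custom"


# Repository ids quoted in the review; other sizes are out of scope here.
REPO_IDS = {
    WhisperModelSize.LARGEV3: "Systran/faster-whisper-large-v3",
    WhisperModelSize.LARGEV3TURBO: "deepdml/faster-whisper-large-v3-turbo-ct2",
}


def resolve_repo_id(size: WhisperModelSize, custom_repo_id: str = "") -> str:
    """Map a model size to a repository id, mirroring the checks in the diff."""
    if size == WhisperModelSize.CUSTOM:
        if custom_repo_id == "":
            raise ValueError("Custom model id is not provided")
        return custom_repo_id
    try:
        return REPO_IDS[size]
    except KeyError:
        # Fail loudly for sizes that have no configured repository.
        raise ValueError(f"No repository configured for {size}") from None
```

Switching the turbo model to a different repository then becomes a one-line change in `REPO_IDS`.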

codeowlai · 2025-06-23T15:57:35Z

buzz/transcriber/whisper_cpp.py

-                txt = whisper_cpp.whisper_full_get_segment_text(self.ctx, i)
-                start = whisper_cpp.whisper_full_get_segment_t0(self.ctx, i)
-                end = whisper_cpp.whisper_full_get_segment_t1(self.ctx, i)
+                txt = self.instance.full_get_segment_text(self.ctx, i)


🐛 Correctness Issue

Runtime Exception: NotImplementedError in Base Class Methods.

The code calls methods on self.instance that are defined in the base class as raising NotImplementedError, which will cause runtime exceptions if not properly implemented in subclasses.

Current Code (Diff):

- txt = self.instance.full_get_segment_text(self.ctx, i)
+ # Ensure self.instance implements full_get_segment_text or handle NotImplementedError
+ txt = self.instance.full_get_segment_text(self.ctx, i)

Proposed Code:

txt = self.instance.full_get_segment_text(self.ctx, i)

🔄 Dependencies Affected

buzz/transcriber/whisper_cpp.py

Function: WhisperCpp.full_get_segment_text

Issue: Method raises NotImplementedError but is being called directly

Suggestion: Implement the method in the appropriate subclass or ensure self.instance is of a type that implements this method

buzz/transcriber/whisper_cpp.py

Function: WhisperCpp.full_get_segment_t0

Issue: Method raises NotImplementedError but is being called directly

Suggestion: Implement the method in the appropriate subclass or ensure self.instance is of a type that implements this method

buzz/transcriber/whisper_cpp.py

Function: WhisperCpp.full_get_segment_t1

Issue: Method raises NotImplementedError but is being called directly

Suggestion: Implement the method in the appropriate subclass or ensure self.instance is of a type that implements this method

codeowlai · 2025-06-23T15:57:38Z

buzz/transcriber/whisper_cpp.py

+                txt = self.instance.full_get_segment_text(self.ctx, i)
+                start = self.instance.full_get_segment_t0(self.ctx, i)
+                end = self.instance.full_get_segment_t1(self.ctx, i)


🐛 Correctness Issue

Runtime Exception: Calling Unimplemented Methods.

The code is calling methods on self.instance that are defined but raise NotImplementedError, which will cause the application to crash at runtime.

Current Code (Diff):

- txt = self.instance.full_get_segment_text(self.ctx, i)
- start = self.instance.full_get_segment_t0(self.ctx, i)
- end = self.instance.full_get_segment_t1(self.ctx, i)
+ txt = whisper_cpp.whisper_full_get_segment_text(self.ctx, i)
+ start = whisper_cpp.whisper_full_get_segment_t0(self.ctx, i)
+ end = whisper_cpp.whisper_full_get_segment_t1(self.ctx, i)

🔄 Dependencies Affected

buzz/transcriber/whisper_cpp.py

Function: WhisperCpp.full_get_segment_text

Issue: Method raises NotImplementedError but is being called in the modified code

Suggestion: Implement the method to provide the same functionality as whisper_cpp.whisper_full_get_segment_text

Current Code (Diff):

- def full_get_segment_text(self, ctx, i):
-     raise NotImplementedError
+ def full_get_segment_text(self, ctx, i):
+     return whisper_cpp.whisper_full_get_segment_text(ctx, i)

buzz/transcriber/whisper_cpp.py

Function: WhisperCpp.full_get_segment_t0

Issue: Method raises NotImplementedError but is being called in the modified code

Suggestion: Implement the method to provide the same functionality as whisper_cpp.whisper_full_get_segment_t0

Current Code (Diff):

- def full_get_segment_t0(self, ctx, i):
-     raise NotImplementedError
+ def full_get_segment_t0(self, ctx, i):
+     return whisper_cpp.whisper_full_get_segment_t0(ctx, i)

buzz/transcriber/whisper_cpp.py

Function: WhisperCpp.full_get_segment_t1

Issue: Method raises NotImplementedError but is being called in the modified code

Suggestion: Implement the method to provide the same functionality as whisper_cpp.whisper_full_get_segment_t1

Current Code (Diff):

- def full_get_segment_t1(self, ctx, i):
-     raise NotImplementedError
+ def full_get_segment_t1(self, ctx, i):
+     return whisper_cpp.whisper_full_get_segment_t1(ctx, i)
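
An alternative to hard-wiring the base class to `whisper_cpp` is to declare the three accessors abstract, so that an incomplete backend fails when it is constructed instead of partway through a transcription. The sketch below illustrates that idea with `abc`; `WhisperBackend`, `FakeBackend`, and `IncompleteBackend` are hypothetical classes, and only the method names come from the diff.

```python
from abc import ABC, abstractmethod


class WhisperBackend(ABC):
    """Hypothetical backend interface; method names mirror those in the diff."""

    @abstractmethod
    def full_get_segment_text(self, ctx, i):
        ...

    @abstractmethod
    def full_get_segment_t0(self, ctx, i):
        ...

    @abstractmethod
    def full_get_segment_t1(self, ctx, i):
        ...


class FakeBackend(WhisperBackend):
    """Stand-in backend for the sketch; ctx is a list of (text, t0, t1) tuples."""

    def full_get_segment_text(self, ctx, i):
        return ctx[i][0]

    def full_get_segment_t0(self, ctx, i):
        return ctx[i][1]

    def full_get_segment_t1(self, ctx, i):
        return ctx[i][2]


class IncompleteBackend(WhisperBackend):
    """Implements only one accessor, so it cannot be instantiated."""

    def full_get_segment_text(self, ctx, i):
        return ctx[i][0]
```

Calling `IncompleteBackend()` raises `TypeError` immediately, which surfaces a missing override at startup rather than as a `NotImplementedError` inside the segment loop.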

codeowlai · 2025-06-23T15:57:42Z

buzz/widgets/transcription_tasks_table_widget.py

+            file_name = self.model().data(self.model().index(row_index, 3))
+            url = self.model().data(self.model().index(row_index, 14))
+
+            selected_text += f"{file_name}{url}\n"


🐛 Correctness Issue

Missing separator between file_name and URL.

The code concatenates file_name and url with no separator, so the two values run together in the copied text and are hard to tell apart.

Current Code (Diff):

- selected_text += f"{file_name}{url}\n"
+ selected_text += f"{file_name} - {url}\n"

📝 Committable suggestion

Suggested change

-selected_text += f"{file_name}{url}\n"
+selected_text += f"{file_name} - {url}\n"

codeowlai · 2025-06-23T15:57:45Z

buzz/widgets/transcription_tasks_table_widget.py

+            file_name = self.model().data(self.model().index(row_index, 3))
+            url = self.model().data(self.model().index(row_index, 14))


🧹 Maintainability Issue

Hard-coded column indices create brittle code.

Using hard-coded column indices (3 and 14) makes the code fragile as any change to column order will break this functionality.

Current Code (Diff):

- file_name = self.model().data(self.model().index(row_index, 3))
- url = self.model().data(self.model().index(row_index, 14))
+ file_name = self.model().data(self.model().index(row_index, Column.FILE_NAME.value))
+ url = self.model().data(self.model().index(row_index, Column.URL.value))

📝 Committable suggestion

Suggested change

-file_name = self.model().data(self.model().index(row_index, 3))
-url = self.model().data(self.model().index(row_index, 14))
+file_name = self.model().data(self.model().index(row_index, Column.FILE_NAME.value))
+url = self.model().data(self.model().index(row_index, Column.URL.value))
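
To show how named columns and a separator fit together, here is a small Qt-free sketch. The `Column` enum below is a hypothetical stand-in that reuses the indices from the diff (3 and 14); the project's real enum may have different members or values. Skipping an empty field is an addition made in this sketch, on the assumption that a row may have only a file name or only a URL.

```python
from enum import Enum


class Column(Enum):
    # Indices taken from the diff; member names follow the suggestion above.
    FILE_NAME = 3
    URL = 14


def format_row(row) -> str:
    """Join the file name and URL of one row, skipping whichever is empty."""
    parts = [row[Column.FILE_NAME.value], row[Column.URL.value]]
    return " - ".join(part for part in parts if part)
```

If the column order changes, only the enum values need updating, and rows with a single populated field do not end up with a dangling " - ".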

codeowlai · 2025-06-23T15:57:48Z

buzz/widgets/transcription_tasks_table_widget.py

+        if event.matches(QKeySequence.StandardKey.Copy):
+            self.copy_selected_fields()


🐛 Correctness Issue

No handling for empty selection.

The copy operation doesn't check if any rows are selected, which could lead to unexpected behavior when copying with no selection.

Current Code (Diff):

- if event.matches(QKeySequence.StandardKey.Copy):
-     self.copy_selected_fields()
-     return
+ if event.matches(QKeySequence.StandardKey.Copy):
+     if not self.selectionModel().hasSelection():
+         return
+     self.copy_selected_fields()
+     return

📝 Committable suggestion

Suggested change

-if event.matches(QKeySequence.StandardKey.Copy):
-    self.copy_selected_fields()
+if event.matches(QKeySequence.StandardKey.Copy):
+    if not self.selectionModel().hasSelection():
+        return
+    self.copy_selected_fields()
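
The same guard can also live inside the copy routine itself, which protects every caller and not just the key handler. The sketch below is deliberately Qt-free so it can run anywhere: `selected_rows` and `set_clipboard_text` are hypothetical stand-ins for the selection model and the clipboard, and this function is not the widget's actual `copy_selected_fields`.

```python
def copy_selected_fields(selected_rows, set_clipboard_text) -> bool:
    """Copy one line per selected row; do nothing when nothing is selected.

    selected_rows: list of (file_name, url) pairs (hypothetical shape).
    set_clipboard_text: any callable that accepts the text to copy.
    """
    if not selected_rows:
        # Leave the clipboard untouched instead of overwriting it with "".
        return False
    text = "".join(f"{file_name} - {url}\n" for file_name, url in selected_rows)
    set_clipboard_text(text)
    return True
```

Returning a boolean lets the caller decide whether to treat the key event as handled.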

raivisdejus and others added 15 commits October 25, 2024 17:02

Fix for row height on long transcriptions with a lot of rows (chidiwilliams#960)

c94671f

Adding notes to FAQ (chidiwilliams#962)

b354d30

Fix for url imports (chidiwilliams#965)

4772a12

Update buzz.po (chidiwilliams#966)

7650992

Adding turbo models (chidiwilliams#967)

807b43f

Fix for column visibility on Macs (chidiwilliams#969)

b868e61

Update README.md (chidiwilliams#970)

8a1a967

Co-authored-by: Raivis Dejus <[email protected]>

Adding word level timestamps for Huggingface (transformers) whisper (chidiwilliams#971)

386c151

Update buzz.po (chidiwilliams#977)

1a67b3e

Update buzz.po (chidiwilliams#973)

ce3bfea

Co-authored-by: Raivis Dejus <[email protected]>

Adding Core ML support for WhisperCpp (chidiwilliams#976)

725031e

This also changes how models for Whisper.cpp are downloaded. After update of the app models will need to be re-downloaded if you have them already downloaded.

Cn language support (chidiwilliams#982)

df63b9d

Co-authored-by: Raivis Dejus <[email protected]>

Adding locale override (chidiwilliams#985)

f61701f

Adding option to copy transcription source from transcription table (chidiwilliams#986)

73a052e

Adding support for cookiefile for Youtube downloads (chidiwilliams#988)

3069512

codeowlai bot reviewed Jun 23, 2025
