-
Notifications
You must be signed in to change notification settings - Fork 0
Feature implementation from commits 5ff9694..3069512 #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: feature-base-2
Are you sure you want to change the base?
Conversation
Co-authored-by: Raivis Dejus <[email protected]>
Co-authored-by: Raivis Dejus <[email protected]>
This also changes how models for Whisper.cpp are downloaded. After update of the app models will need to be re-downloaded if you have them already downloaded.
Co-authored-by: Raivis Dejus <[email protected]>
import requests | ||
import whisper | ||
import huggingface_hub | ||
import zipfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐛 Correctness Issue
Missing faster_whisper import causes runtime errors.
The removal of the faster_whisper import will cause runtime errors when the application tries to use the FASTER_WHISPER model type functionality.
Current Code (Diff):
+ import requests
+ import whisper
+ import huggingface_hub
+ import zipfile
+ import faster_whisper
📝 Committable suggestion
‼️ IMPORTANT
Trust, but verify! 🕵️ Please review this suggestion with the care of a code archaeologist - check that it perfectly replaces the highlighted code, preserves all lines, maintains proper indentation, and won't break anything in production. Your future self will thank you! 🚀
import requests | |
import whisper | |
import huggingface_hub | |
import zipfile | |
import requests | |
import whisper | |
import huggingface_hub | |
import zipfile | |
import faster_whisper |
🔄 Dependencies Affected
buzz/model_loader.py
Function: ModelLoader.is_deletable
Issue: References to ModelType.FASTER_WHISPER will fail without the faster_whisper module
Suggestion: Ensure the faster_whisper import is available for code that references ModelType.FASTER_WHISPER
buzz/model_loader.py
Function: ModelLoader.download_faster_whisper_model
Issue: This function likely depends on the faster_whisper module that was removed
Suggestion: Restore the faster_whisper import or refactor the function to not require it
if size == WhisperModelSize.CUSTOM and custom_repo_id == "": | ||
raise ValueError("Custom model id is not provided") | ||
|
||
if size == WhisperModelSize.CUSTOM: | ||
repo_id = custom_repo_id | ||
elif size == WhisperModelSize.LARGEV3: | ||
repo_id = "Systran/faster-whisper-large-v3" | ||
# Maybe switch to 'mobiuslabsgmbh/faster-whisper-large-v3-turbo', seems to be used in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐛 Correctness Issue
Unresolved Implementation Decision.
The comment indicates uncertainty about which model repository to use, suggesting incomplete implementation that could cause issues if deployed without resolution.
Current Code (Diff):
- # Maybe switch to 'mobiuslabsgmbh/faster-whisper-large-v3-turbo', seems to be used in
- # faster-whisper code https://github.com/SYSTRAN/faster-whisper/blob/master/faster_whisper/utils.py#L29
- # If so changes needed also in whisper_file_transcriber.py
+ # Using deepdml/faster-whisper-large-v3-turbo-ct2 repository for LARGEV3TURBO model
txt = whisper_cpp.whisper_full_get_segment_text(self.ctx, i) | ||
start = whisper_cpp.whisper_full_get_segment_t0(self.ctx, i) | ||
end = whisper_cpp.whisper_full_get_segment_t1(self.ctx, i) | ||
txt = self.instance.full_get_segment_text(self.ctx, i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐛 Correctness Issue
Runtime Exception: NotImplementedError in Base Class Methods.
The code calls methods on self.instance that are defined in the base class as raising NotImplementedError, which will cause runtime exceptions if not properly implemented in subclasses.
Current Code (Diff):
- txt = self.instance.full_get_segment_text(self.ctx, i)
+ # Ensure self.instance implements full_get_segment_text or handle NotImplementedError
+ txt = self.instance.full_get_segment_text(self.ctx, i)
Proposed Code:
txt = self.instance.full_get_segment_text(self.ctx, i)
🔄 Dependencies Affected
buzz/transcriber/whisper_cpp.py
Function: WhisperCpp.full_get_segment_text
Issue: Method raises NotImplementedError but is being called directly
Suggestion: Implement the method in the appropriate subclass or ensure self.instance is of a type that implements this method
buzz/transcriber/whisper_cpp.py
Function: WhisperCpp.full_get_segment_t0
Issue: Method raises NotImplementedError but is being called directly
Suggestion: Implement the method in the appropriate subclass or ensure self.instance is of a type that implements this method
buzz/transcriber/whisper_cpp.py
Function: WhisperCpp.full_get_segment_t1
Issue: Method raises NotImplementedError but is being called directly
Suggestion: Implement the method in the appropriate subclass or ensure self.instance is of a type that implements this method
txt = self.instance.full_get_segment_text(self.ctx, i) | ||
start = self.instance.full_get_segment_t0(self.ctx, i) | ||
end = self.instance.full_get_segment_t1(self.ctx, i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐛 Correctness Issue
Runtime Exception: Calling Unimplemented Methods.
The code is calling methods on self.instance that are defined but raise NotImplementedError, which will cause the application to crash at runtime.
Current Code (Diff):
- txt = self.instance.full_get_segment_text(self.ctx, i)
- start = self.instance.full_get_segment_t0(self.ctx, i)
- end = self.instance.full_get_segment_t1(self.ctx, i)
+ txt = whisper_cpp.whisper_full_get_segment_text(self.ctx, i)
+ start = whisper_cpp.whisper_full_get_segment_t0(self.ctx, i)
+ end = whisper_cpp.whisper_full_get_segment_t1(self.ctx, i)
🔄 Dependencies Affected
buzz/transcriber/whisper_cpp.py
Function: WhisperCpp.full_get_segment_text
Issue: Method raises NotImplementedError but is being called in the modified code
Suggestion: Implement the method to provide the same functionality as whisper_cpp.whisper_full_get_segment_text
Current Code (Diff):
- def full_get_segment_text(self, ctx, i):
- raise NotImplementedError
+ def full_get_segment_text(self, ctx, i):
+ return whisper_cpp.whisper_full_get_segment_text(ctx, i)
buzz/transcriber/whisper_cpp.py
Function: WhisperCpp.full_get_segment_t0
Issue: Method raises NotImplementedError but is being called in the modified code
Suggestion: Implement the method to provide the same functionality as whisper_cpp.whisper_full_get_segment_t0
Current Code (Diff):
- def full_get_segment_t0(self, ctx, i):
- raise NotImplementedError
+ def full_get_segment_t0(self, ctx, i):
+ return whisper_cpp.whisper_full_get_segment_t0(ctx, i)
buzz/transcriber/whisper_cpp.py
Function: WhisperCpp.full_get_segment_t1
Issue: Method raises NotImplementedError but is being called in the modified code
Suggestion: Implement the method to provide the same functionality as whisper_cpp.whisper_full_get_segment_t1
Current Code (Diff):
- def full_get_segment_t1(self, ctx, i):
- raise NotImplementedError
+ def full_get_segment_t1(self, ctx, i):
+ return whisper_cpp.whisper_full_get_segment_t1(ctx, i)
file_name = self.model().data(self.model().index(row_index, 3)) | ||
url = self.model().data(self.model().index(row_index, 14)) | ||
|
||
selected_text += f"{file_name}{url}\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐛 Correctness Issue
Missing separator between file_name and URL.
The code concatenates file_name and URL without a separator, making them indistinguishable in the output which will cause confusion for users.
Current Code (Diff):
- selected_text += f"{file_name}{url}\n"
+ selected_text += f"{file_name} - {url}\n"
📝 Committable suggestion
‼️ IMPORTANT
Trust, but verify! 🕵️ Please review this suggestion with the care of a code archaeologist - check that it perfectly replaces the highlighted code, preserves all lines, maintains proper indentation, and won't break anything in production. Your future self will thank you! 🚀
selected_text += f"{file_name}{url}\n" | |
selected_text += f"{file_name} - {url}\n" |
file_name = self.model().data(self.model().index(row_index, 3)) | ||
url = self.model().data(self.model().index(row_index, 14)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Maintainability Issue
Hard-coded column indices create brittle code.
Using hard-coded column indices (3 and 14) makes the code fragile as any change to column order will break this functionality.
Current Code (Diff):
- file_name = self.model().data(self.model().index(row_index, 3))
- url = self.model().data(self.model().index(row_index, 14))
+ file_name = self.model().data(self.model().index(row_index, Column.FILE_NAME.value))
+ url = self.model().data(self.model().index(row_index, Column.URL.value))
📝 Committable suggestion
‼️ IMPORTANT
Trust, but verify! 🕵️ Please review this suggestion with the care of a code archaeologist - check that it perfectly replaces the highlighted code, preserves all lines, maintains proper indentation, and won't break anything in production. Your future self will thank you! 🚀
file_name = self.model().data(self.model().index(row_index, 3)) | |
url = self.model().data(self.model().index(row_index, 14)) | |
file_name = self.model().data(self.model().index(row_index, Column.FILE_NAME.value)) | |
url = self.model().data(self.model().index(row_index, Column.URL.value)) |
if event.matches(QKeySequence.StandardKey.Copy): | ||
self.copy_selected_fields() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐛 Correctness Issue
No handling for empty selection.
The copy operation doesn't check if any rows are selected, which could lead to unexpected behavior when copying with no selection.
Current Code (Diff):
- if event.matches(QKeySequence.StandardKey.Copy):
- self.copy_selected_fields()
- return
+ if event.matches(QKeySequence.StandardKey.Copy):
+ if not self.selectionModel().hasSelection():
+ return
+ self.copy_selected_fields()
+ return
📝 Committable suggestion
‼️ IMPORTANT
Trust, but verify! 🕵️ Please review this suggestion with the care of a code archaeologist - check that it perfectly replaces the highlighted code, preserves all lines, maintains proper indentation, and won't break anything in production. Your future self will thank you! 🚀
if event.matches(QKeySequence.StandardKey.Copy): | |
self.copy_selected_fields() | |
if event.matches(QKeySequence.StandardKey.Copy): | |
if not self.selectionModel().hasSelection(): | |
return | |
self.copy_selected_fields() |
PR Summary
Enhanced Transcription Capabilities and Performance Optimizations
Overview
This PR adds CoreML support for Apple Silicon, improves locale handling, optimizes transcription performance, and fixes several bugs in the transcription pipeline. It also adds quality-of-life features like clipboard support and environment variable configurations.
Change Types
Affected Modules
locale.py
model_loader.py
transcriber/whisper_cpp.py
transcriber/file_transcriber.py
transcriber/recording_transcriber.py
transcriber/whisper_cpp_file_transcriber.py
transcriber/whisper_file_transcriber.py
widgets/transcription_tasks_table_widget.py
widgets/transcription_viewer/transcription_segments_editor_widget.py
Notes for Reviewers
Additional Context