Docs - Update curate-audio/process-data/quality-assessment/index.md #1244
Open: arhamm1 wants to merge 1 commit into main from arhamm1-patch-7
Signed-off-by: Arham Mehta <[email protected]>
Should the code sample be updated to the following?

Combined Quality Filtering

from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.audio.metrics.get_wer import GetPairwiseWerStage
from nemo_curator.stages.audio.common import GetAudioDurationStage, PreserveByValueStage

# Create multi-stage quality pipeline
quality_pipeline = Pipeline(name="audio_quality_assessment")

# Calculate all metrics
quality_pipeline.add_stage(GetPairwiseWerStage(
    text_key="text",
    pred_text_key="pred_text",
    wer_key="wer"
))
quality_pipeline.add_stage(GetAudioDurationStage(
    audio_filepath_key="audio_filepath",
    duration_key="duration"
))

# Apply filters in sequence
quality_pipeline.add_stage(PreserveByValueStage(
    input_value_key="wer",
    target_value=50.0,
    operator="le"  # WER <= 50%
))
quality_pipeline.add_stage(PreserveByValueStage(
    input_value_key="duration",
    target_value=1.0,
    operator="ge"  # Duration >= 1s
))
quality_pipeline.add_stage(PreserveByValueStage(
    input_value_key="duration",
    target_value=20.0,
    operator="le"  # Duration <= 20s
))
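To make the intended effect of the three filters concrete, here is a minimal pure-Python sketch of the same thresholds applied to an in-memory list of manifest entries. It does not use or reproduce NeMo Curator internals: `preserve_by_value`, `apply_quality_filters`, and the sample entries are hypothetical names for illustration, and the mapping of `"le"`/`"ge"` onto Python's `operator` module is an assumption about what those operator strings mean.

```python
import operator

# Assumption: the operator strings behave like the same-named functions
# in Python's standard `operator` module.
_OPS = {
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
    "ge": operator.ge,
    "gt": operator.gt,
}


def preserve_by_value(entries, key, target, op):
    """Hypothetical helper: keep entries where `entry[key] <op> target` holds."""
    compare = _OPS[op]
    return [entry for entry in entries if compare(entry[key], target)]


def apply_quality_filters(entries):
    """Apply the three thresholds from the sample above, in the same order."""
    kept = preserve_by_value(entries, "wer", 50.0, "le")    # WER <= 50%
    kept = preserve_by_value(kept, "duration", 1.0, "ge")   # Duration >= 1s
    kept = preserve_by_value(kept, "duration", 20.0, "le")  # Duration <= 20s
    return kept


# Made-up entries, for illustration only.
sample = [
    {"audio_filepath": "a.wav", "wer": 12.5, "duration": 4.2},
    {"audio_filepath": "b.wav", "wer": 80.0, "duration": 5.0},   # fails WER
    {"audio_filepath": "c.wav", "wer": 10.0, "duration": 0.4},   # too short
    {"audio_filepath": "d.wav", "wer": 50.0, "duration": 20.0},  # on both bounds
]

print([entry["audio_filepath"] for entry in apply_quality_filters(sample)])
```

Because both bounds are inclusive, an entry sitting exactly on a threshold (like `d.wav` here) is kept.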
Greptile Summary
Important Files Changed
Confidence score: 5/5
Sequence Diagram
sequenceDiagram
participant User
participant Pipeline
participant CreateInitialManifestStage
participant InferenceAsrNemoStage
participant GetPairwiseWerStage
participant GetAudioDurationStage
participant PreserveByValueStage
participant AudioToDocumentStage
participant JsonlWriter
participant XennaExecutor
User->>Pipeline: "Create audio quality assessment pipeline"
User->>Pipeline: "Add CreateInitialManifestFleursStage"
Pipeline->>CreateInitialManifestStage: "Load audio data from FLEURS dataset"
CreateInitialManifestStage-->>Pipeline: "Audio manifest data"
User->>Pipeline: "Add InferenceAsrNemoStage"
Pipeline->>InferenceAsrNemoStage: "Perform ASR inference"
InferenceAsrNemoStage-->>Pipeline: "Audio data with predictions"
User->>Pipeline: "Add GetPairwiseWerStage"
Pipeline->>GetPairwiseWerStage: "Calculate WER metrics"
GetPairwiseWerStage-->>Pipeline: "Audio data with WER scores"
User->>Pipeline: "Add GetAudioDurationStage"
Pipeline->>GetAudioDurationStage: "Calculate audio duration"
GetAudioDurationStage-->>Pipeline: "Audio data with duration"
User->>Pipeline: "Add WER filter stage"
Pipeline->>PreserveByValueStage: "Filter by WER <= 75%"
PreserveByValueStage-->>Pipeline: "Filtered by WER"
User->>Pipeline: "Add duration min filter"
Pipeline->>PreserveByValueStage: "Filter duration >= 1s"
PreserveByValueStage-->>Pipeline: "Filtered by min duration"
User->>Pipeline: "Add duration max filter"
Pipeline->>PreserveByValueStage: "Filter duration <= 30s"
PreserveByValueStage-->>Pipeline: "Filtered by max duration"
User->>Pipeline: "Add AudioToDocumentStage"
Pipeline->>AudioToDocumentStage: "Convert to document format"
AudioToDocumentStage-->>Pipeline: "Document format data"
User->>Pipeline: "Add JsonlWriter"
Pipeline->>JsonlWriter: "Write high quality results"
JsonlWriter-->>Pipeline: "Export complete"
User->>Pipeline: "Execute with XennaExecutor"
Pipeline->>XennaExecutor: "Run quality assessment pipeline"
XennaExecutor-->>User: "High-quality filtered audio dataset"
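For readers unfamiliar with the "Calculate WER metrics" step in the diagram, the following is a rough sketch of a word-level error rate expressed as a percentage, which is the scale the WER thresholds in this thread appear to use. It is a textbook edit-distance formulation, not the code behind `GetPairwiseWerStage`; the function name `word_error_rate` is hypothetical, and the real stage may normalize text or handle edge cases differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length, in percent."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr

    return 100.0 * prev[-1] / len(ref_words)


# One substitution (b -> x) and one deletion (d) over four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Under this definition the value can exceed 100 when the hypothesis contains many inserted words, so an upper threshold alone does not imply a bounded range.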
1 file reviewed, no comments
Description
docs: improve quality assessment index with complete examples
Fixes incomplete code examples and adds context for utility functions.
Usage
# Add snippet demonstrating usage
Checklist