@flexiondotorg
Contributor

Summary

Add speech candidate detection to identify representative speech regions for future adaptive filter tuning. This complements the existing silence detection by providing measurements of typical speech characteristics.

Changes

Speech Detection (commit 1)

  • Add SpeechRegion and SpeechCandidateMetrics data structures
  • Implement interval-based speech detection after elected silence
  • Score speech regions by amplitude, centroid, and entropy
  • Select longest qualifying candidate for speech profiling
  • Integrate detection into MeasureInput analysis pipeline
  • Add diagnostic output to processing reports
  • Add 21 unit tests for speech detection logic

Expanded Metrics (commit 2)

  • Add all spectral metrics (mean, variance, spread, skewness, crest, flux, slope, decrease, rolloff) to silence and speech candidates
  • Add loudness metrics (momentary/short-term LUFS, true/sample peak)
  • Update measurement functions to populate complete metric set
  • Enhance diagnostic report with organised metric groups

Design

  • Speech detection runs after silence detection, searching only after the elected silence region ends
  • Uses 30-second minimum duration with 2-second interruption tolerance for natural pauses
  • Scoring prioritises amplitude (50%), voice-range centroid (30%), and low entropy (20%)
  • Selection prefers the longest candidate whose score is above the quality threshold (unlike silence detection, which prefers the earliest)
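
The scoring weights and the longest-candidate rule described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the fields of `SpeechCandidateMetrics`, the 0..1 normalisation of amplitude and entropy, the 300 to 3400 Hz voice range, and the helper names `scoreSpeech` and `bestSpeechCandidate` are all hypothetical.

```go
package main

// SpeechCandidateMetrics reuses the name from the PR; the fields and
// their normalisation are assumptions for this sketch.
type SpeechCandidateMetrics struct {
	DurationSec float64
	Amplitude   float64 // assumed normalised to 0..1
	CentroidHz  float64 // spectral centroid in Hz
	Entropy     float64 // assumed normalised to 0..1
}

// Assumed voice range for the centroid check (placeholder values).
const (
	voiceLowHz  = 300.0
	voiceHighHz = 3400.0
)

// clamp01 limits x to the range [0, 1].
func clamp01(x float64) float64 {
	if x < 0 {
		return 0
	}
	if x > 1 {
		return 1
	}
	return x
}

// scoreSpeech combines amplitude (50%), voice-range centroid (30%),
// and low entropy (20%) into a score between 0 and 1.
func scoreSpeech(m SpeechCandidateMetrics) float64 {
	centroid := 0.0
	if m.CentroidHz >= voiceLowHz && m.CentroidHz <= voiceHighHz {
		centroid = 1.0
	}
	return 0.5*clamp01(m.Amplitude) + 0.3*centroid + 0.2*(1-clamp01(m.Entropy))
}

// bestSpeechCandidate returns the longest candidate whose score is at
// least minScore, or nil if none qualifies.
func bestSpeechCandidate(cs []SpeechCandidateMetrics, minScore float64) *SpeechCandidateMetrics {
	var best *SpeechCandidateMetrics
	for i := range cs {
		if scoreSpeech(cs[i]) < minScore {
			continue
		}
		if best == nil || cs[i].DurationSec > best.DurationSec {
			best = &cs[i]
		}
	}
	return best
}
```

Filtering by score first and then ranking by duration means a long, low-quality region cannot displace a shorter region that actually looks like speech.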

Testing

All existing tests pass. New tests cover:

  • TestSpeechScore — 6 test cases for speech scoring
  • TestFindSpeechCandidatesFromIntervals — 6 test cases for detection logic
  • TestMeasureSpeechCandidateFromIntervals — 2 test cases for metrics extraction
  • TestFindBestSpeechRegion — 3 test cases for candidate selection
  • TestScoreSpeechCandidate — 4 test cases for candidate scoring

Implements docs/PLAN-SpeechDetection.md

- Add SpeechRegion and SpeechCandidateMetrics data structures
- Implement interval-based speech detection after elected silence
- Score speech regions by amplitude, centroid, and entropy
- Select longest qualifying candidate for speech profiling
- Integrate detection into MeasureInput analysis pipeline
- Add diagnostic output to processing reports

Signed-off-by: Martin Wimpress <[email protected]>
…urements

- Add all spectral metrics (mean, variance, spread, skewness, crest,
  flux, slope, decrease, rolloff) to silence and speech candidates
- Add loudness metrics (momentary/short-term LUFS, true/sample peak)
- Update measurement functions to populate complete metric set
- Enhance diagnostic report with organised metric groups

Enables future adaptive filter tuning using full audio characteristics.

Signed-off-by: Martin Wimpress <[email protected]>
@flexiondotorg
Contributor Author

@cubic-dev-ai Review this pull request

@cubic-dev-ai
Contributor

cubic-dev-ai bot commented Jan 15, 2026

@cubic-dev-ai Review this pull request

@flexiondotorg I have started the AI code review. It will take a few minutes to complete.

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 3 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="internal/processor/analyzer.go">

<violation number="1" location="internal/processor/analyzer.go:2777">
P1: If no interval exists at or after `searchStart`, `startIdx` remains 0 and speech detection incorrectly searches from the beginning of the file instead of returning nil. This could detect speech within the silence region for short recordings or when silence is near the end.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
@flexiondotorg flexiondotorg merged commit 536273c into main Jan 15, 2026
5 checks passed
@flexiondotorg flexiondotorg deleted the speech-detection branch January 15, 2026 12:25