@kunci115

feat: Add comprehensive data integrity verification system with client rejection

  • Implement checksum-based verification for WebSocket audio transmission
  • Add server-side rejection policies for corrupted data clients
  • Create configurable corruption thresholds (0=strict, N=tolerant)
  • Add per-client corruption tracking with automatic cleanup

Server Features:

  • New CLI flags: --verify-data-integrity, --reject-corrupted-data, --corruption-threshold
  • Real-time verification of audio length and checksum
  • Structured rejection messages sent to clients before disconnection
  • Extended logging support for verification results
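
The server-side checks listed above could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's actual code: the SHA-256 checksum, the `data_length` and `checksum` metadata keys, and the `CorruptionTracker` helper are all hypothetical, as is the reading of the threshold as "number of corrupted chunks tolerated before the client is rejected".

```python
import hashlib
from collections import defaultdict


def verify_chunk(audio: bytes, metadata: dict) -> bool:
    """Return True when the length and checksum in the metadata match the audio."""
    # Assumed metadata keys: "data_length" and "checksum" (SHA-256 hex digest).
    if metadata.get("data_length") != len(audio):
        return False
    return metadata.get("checksum") == hashlib.sha256(audio).hexdigest()


class CorruptionTracker:
    """Counts corrupted chunks per client (hypothetical helper)."""

    def __init__(self, threshold: int = 0):
        # Assumed semantics: 0 = strict (reject on the first corrupted chunk),
        # N = tolerate N corrupted chunks before rejecting.
        self.threshold = threshold
        self._counts = defaultdict(int)

    def record_corruption(self, client_id: str) -> bool:
        """Record one corrupted chunk; return True if the client should be rejected."""
        self._counts[client_id] += 1
        return self._counts[client_id] > self.threshold

    def forget(self, client_id: str) -> None:
        """Drop the counter when the client disconnects (cleanup)."""
        self._counts.pop(client_id, None)
```

Keeping the counter per client and dropping it on disconnect means one noisy connection cannot affect the others, and the tracker does not grow without bound.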

@KoljaB
Owner

KoljaB commented Aug 23, 2025

That's a great PR. It probably needs some changes to stt_cli_client.py in the RealtimeSTT_server folder, and maybe to audio_input.py and audio_recorder_client.py in the RealtimeSTT folder, to adapt them to the new chunk verification logic. Since I'm currently doing some refactoring too, I think I'll merge it later and then add the needed changes.

@kunci115
Author

I can help on the weekend if I'm free and you haven't finished it by then. I also forgot to push the other test that I wrote.

kunci115 and others added 9 commits September 1, 2025 01:17
  1. RealtimeSTT/audio_recorder_client.py

  - Added calculate_checksum() method for data verification
  - Updated feed_audio() method to include verification data when server_sent_to_stt is present
  - Added enable_data_verification parameter to constructor
  - Updated record_and_send_audio() method to include verification metadata when enabled

  2. RealtimeSTT_server/stt_cli_client.py

  - Added --verify-data command line flag
  - Updated client instantiation to pass the verification flag
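
  As a rough sketch of how such a flag might be wired through (the real stt_cli_client.py may differ), the snippet below uses `argparse` and collects the constructor keyword arguments in a plain dict. The `build_parser` helper, the help text, and the `"en"` default are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the two flags shown in this PR."""
    parser = argparse.ArgumentParser(prog="stt-cli-client")
    # store_true keeps the flag off by default, so existing invocations
    # behave exactly as before.
    parser.add_argument("--verify-data", action="store_true",
                        help="attach checksum metadata to each audio chunk")
    parser.add_argument("--language", default="en")  # default is an assumption
    return parser


args = build_parser().parse_args(["--verify-data", "--language", "en"])

# argparse exposes --verify-data as args.verify_data; this dict stands in for
# the keyword arguments passed to the client constructor.
client_kwargs = {
    "enable_data_verification": args.verify_data,
    "language": args.language,
}
```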

  Key Features Added

  Optional Data Verification:
  - Backward compatible - verification is off by default
  - When enabled, adds checksum, data length, and timestamp to metadata
  - Follows the same pattern as the sample Python client
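
  A minimal sketch of what attaching those three fields could look like on the client side. The field names (`checksum`, `data_length`, `timestamp`), the `add_verification` helper, and the choice of SHA-256 are assumptions; the PR text does not specify them.

```python
import hashlib
import time


def calculate_checksum(data: bytes) -> str:
    # Assumption: SHA-256 hex digest; the PR text does not name the algorithm.
    return hashlib.sha256(data).hexdigest()


def add_verification(metadata: dict, chunk: bytes) -> dict:
    """Return a copy of metadata extended with hypothetical verification fields."""
    enriched = dict(metadata)  # copy, so the caller's dict is left untouched
    enriched["checksum"] = calculate_checksum(chunk)
    enriched["data_length"] = len(chunk)
    enriched["timestamp"] = time.time()
    return enriched
```

  Returning a copy rather than mutating the input keeps the non-verifying code path unchanged, which fits the off-by-default behaviour described above.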

  CLI Integration:
  - New --verify-data flag enables verification in the CLI client
  - Usage: stt-cli-client --verify-data

  Automatic Verification:
  - feed_audio() automatically enables verification when server_sent_to_stt is in metadata
  - record_and_send_audio() includes verification when enable_data_verification=True

  Usage Examples

  # CLI with verification enabled
  stt-cli-client --verify-data --language en

  # Python code with verification
  from RealtimeSTT import AudioToTextRecorderClient

  client = AudioToTextRecorderClient(
      enable_data_verification=True,
      language="en"
  )

  The implementation maintains full backward compatibility while adding robust data integrity verification capabilities that integrate seamlessly with the existing server verification system.

  1. tests/test_quick_performance.py

  Usage:
  # Test your current server
  python tests/test_quick_performance.py

  # Compare default vs optimized (manual server restart)
  python tests/test_quick_performance.py --compare

  2. tests/test_performance_benchmark.py - Comprehensive

  Usage:
  # Basic benchmark
  python tests/test_performance_benchmark.py

  # Test with concurrent requests
  python tests/test_performance_benchmark.py --concurrent 5

  # Auto-start optimized server and test
  python tests/test_performance_benchmark.py --config optimized --auto-server

3 participants