Fix join audio sample rate #162

Blaizzy · 2025-05-17T21:56:20Z

Summary

Save joined audio files using the model's sample rate, so the joined file is written at the same rate the audio was generated at.
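A minimal sketch of the idea, not the code from this PR: when generated segments are concatenated and written to one file, the sample rate stored in the file should be the rate the model produced the audio at, passed in by the caller rather than fixed inside the save routine. The function name `join_and_save` and its parameters are hypothetical, and the sketch uses only the Python standard library instead of the project's own audio utilities.

```python
import struct
import wave


def join_and_save(segments, sample_rate, path):
    """Concatenate float segments in [-1.0, 1.0] and write a 16-bit mono WAV.

    `sample_rate` is assumed to come from the model that generated the
    segments (hypothetical parameter, not the project's actual API).
    Returns the number of samples written.
    """
    # Flatten the list of segments into one continuous signal.
    joined = [sample for segment in segments for sample in segment]

    # Clip to [-1.0, 1.0] and convert to little-endian signed 16-bit PCM.
    pcm = struct.pack(
        f"<{len(joined)}h",
        *(int(max(-1.0, min(1.0, s)) * 32767) for s in joined),
    )

    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        # Use the caller-supplied rate; a mismatched rate here would make
        # the joined file play back at the wrong speed and pitch.
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)

    return len(joined)
```

Reading the file back with `wave.open(path, "rb")` and checking `getframerate()` is a quick way to confirm that the stored rate matches the one passed in.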

Testing

pytest -q (fails: command not found)

* base arch of server
* add tts and stt endpoints
* functioning server
* connect server and ui
* Add audio utilities, use them where possible (#161)
* Add audio utilities, use them where possible.
* Formatting.
* Fix tests.
* Fix tests.
* More test fixes.
* fix server
* Fix join audio sample rate (#162)
* update nextjs
* fix stt view
* working STT
* working text to speech
* remove voices
* remove home
* add custom model and delete file
* refactor model mapping
* add animation and use env vars for frontend config
* remove unused
* refactor model loading
* add tests
* mock generate
* fix tests
* remove old player
* update readme

---------

Co-authored-by: Lucas Newman <[email protected]>

* add ui v2 * Server v2 (#153) * base arch of server * add tts and stt endpoints * functioning server * connect server and ui * Add audio utilities, use them where possible (#161) * Add audio utilities, use them where possible. * Formatting. * Fix tests. * Fix tests. * More test fixes. * fix server * Fix join audio sample rate (#162) * update nextjs * fix stt view * working STT * working text to speech * remove voices * remove home * add custom model and delete file * refactor model mapping * add animation and use env vars for frontend config * remove unused * refactor model loading * add tests * mock generate * fix tests * remove old player * update readme --------- Co-authored-by: Lucas Newman <[email protected]> * add main and remove unused * add gitignore * update gitignore * fix: (outetts)loading model: Speaker file not found. (#189) Error loading model: Speaker file not found: mlx-audio/lib/python3.12/site-packages/mlx_audio/tts/models/outetts/default_speaker.json * Fix deprecated save in MLX-LM (#194) * fix deprecated save weights * update mlx-vlm * add pytest-asyncio * Implementation of Misaki G2P tokenizer (#193) * implementation of Misaki for on-device * KokoroTokenizer tests --------- Co-authored-by: Prince Canuma <[email protected]> * Add IndexTTS (#187) * correctly load model * got latent generation working * add ECAPA-TDNN and BigVGANConditioning ∙ * added sanitize method to oroginal bigvgan * renamed BigVGAN Activation1D.activation to .act * fixed various bugs * fit in existing model formats * add init test for IndexTTS * skip sanitize if already sanitized * masking logic + `tqdm.trange` + better default sampler * uses `WNConv1D` and `WNConvTranspose1d` * fix test & validate already converted in bigvgan * added normalizer * add normalizer dependencies * removed specifiers * fix tests * seems like wetext could just work fine on other platforms * removed wetext and added number normalizer for English --------- Co-authored-by: Prince Canuma <[email 
protected]> * Load both lexicon files us_gold and us_silver with words in us_gold taking precedence (#195) * implementation of Misaki for on-device * KokoroTokenizer tests * load both lexicon files with phonemes in us_gold taking precedence of us_silver --------- Co-authored-by: Prince Canuma <[email protected]> * Add S3 neural audio codec. (#204) * add lexicon files for British sounds, gb_gold and gb_silver (#197) * Fix Mimi codec. (#209) Co-authored-by: Prince Canuma <[email protected]> * Add ability to use a custom URL to load Kokoro safetensors (#185) * Add ability to use a custom URL to load Kokoro * Add ability to use a custom URL to load Orpheus * Handle error when loading kokoro weigths --------- Co-authored-by: Prince Canuma <[email protected]> * Handle transformers-style config for Sesame CSM models. (#211) * Add Xcode build troubleshooting documentation (#210) - Document Metal Toolchain error with Xcode Beta versions - Provide step-by-step solution for missing Metal Toolchain component - Include alternative build approaches and debug commands - Add comprehensive build command reference Fixes build failures when using Xcode Beta that lacks Metal Toolchain component required for mlx-swift Metal shader compilation. 
🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <[email protected]> Co-authored-by: Prince Canuma <[email protected]> * Multi model support (#213) * feat: Add all possible languages selection and mapping for Kokoro TTS * Add support for all TTS models in web interface - Added /models endpoint to list available TTS models with configurations - Extended model dropdown to include Kokoro, CSM/Sesame, Bark, OuteTTS, and Spark - Added model-specific UI elements (reference audio upload for voice cloning) - Updated JavaScript for dynamic UI changes based on selected model capabilities - Enhanced server TTS endpoint to handle model-specific parameters - Support for reference audio in CSM/Sesame models for voice cloning - Dynamic language/voice options based on model capabilities * feat: Enhance TTS endpoint and web interface for model-specific parameters - Added support for all kokoro languages and voices - Added support for CSM/Sesame - Updated TTS endpoint to handle speed, pitch, and gender parameters for Spark models. - Modified audio player UI to dynamically load model types, languages, and voices based on selected model capabilities. - Added support for Spark-specific controls including speech speed, pitch, and gender selection. - Improved JavaScript logic for fetching models and updating UI elements accordingly. --------- Co-authored-by: Prince Canuma <[email protected]> * Add voxtral (#214) * add voxtral * use lm modules * add mistral commons * refactor generate * remove lazy * make lazy imports * bump version * Use indeterminate progress for CSM models (#216) * Use indeterminite progress for CSM models. * Fix multiple paragraph generation. * Bump version to 0.2.5 (#219) * fix wav2vec (#222) * Fix RTF calculation in kokoro model Reset start_time after each segment to ensure accurate real-time factor calculations for multi-segment audio generation. 
* fix: avoid unnecessary audio transcription for the index tts model Use the inspect.signature method to detect the ref_text parameter to avoid unnecessary transcription operations during text-to-speech conversion. * Add Sesame TTS components and configurations * Add Sesame TTS attention and model argument implementations * Add SesameModel implementation for dual-transformer TTS system * Implement Sesame TTS tokenizer and model wrapper * Add SesameVoiceManager for voice prompt management * Refactor VectorQuantization to use MLXNN.Linear * Refactor Mimi and SesameModelWrapper * Add Sesame TTS components for example * Update SesameWeightLoader to load weights * Update Sesame TTS with configuration loading * Update Sesame TTS with improved input handling and masking logic * Refactor SesameAttention and SesameTokenizer * Get the weights loading and RVQ working * Update to match python implementation * Refactor SesameModel and Mimi codec * Update Sesame TTS Swift * Update Sesame TTS with improved Mimi codec * Refactor Sesame TTS * Update ContentView to use Marvis TTS as the default provider * Refactor MarvisTTS by removing memory monitoring and debug logging * Refactor ContentView to integrate Sesame TTS as the default provider * Update Sesame TTS playback management with improved buffering * Update ContentView iOS to support Sesame * Update ContentView iOS to manage speaker selection * Refactor SesameTTS for better Swift naming conventions and add async streaming * Use SesameSession * Add Swift integration documentation * Clean up project by removing unused files and adding Xcode-related entries to .gitignore * Update SesameSession with optional audio playback * Fix Swift TTS import issues by updating Xcode project configuration - Removed problematic mlx-swift-audio package dependency from Xcode project - Added proper MLX package dependencies (MLX, MLXNN, MLXRandom, MLXLMCommon, MLXLLM, Transformers) - Updated workspace configuration to use parent directory - Modified 
SesameSession.swift to support optional audio playback Note: Swift Package Manager builds successfully, but Xcode has issues resolving MLX dependencies * Clean up Xcode package dependencies - remove redundant entries - Removed explicit mlx-swift and swift-transformers dependencies from Xcode - Keep only mlx-swift-examples which brings in both as transitive dependencies - This matches the Package.swift structure and avoids duplication Note: Xcode still has MLX dependency resolution issues, but Swift Package Manager works correctly * Update Xcode project configuration * Fix Xcode project build - remove redundant package dependencies - Removed duplicate mlx-swift and swift-transformers dependencies - Keep only mlx-swift-examples which provides both as transitive dependencies - Updated package version to upToNextMajorVersion with minimumVersion 2.25.7 - Xcode project now builds successfully ✅ Before: mlx-swift + mlx-swift-examples + swift-transformers (redundant) After: mlx-swift-examples (includes mlx-swift + swift-transformers) * Add audio playback files to Xcode project * Use batched vocoding to reduce peak memory usage with Sesame arch models. (#236) * Cache RoPE by dtype for Sesame arch models for improved generation performance. (#232) * Install Metal toolchain for Swift tests. (#233) * Adopt changes interface changes from mlx-lm to fix Sesame-arch models. (#242) Co-authored-by: Prince Canuma <[email protected]> * Update package dependencies and formatting (#247) * Improve Swift TTS app UX (#248) * Refactor SesameSession initializers and update ContentView for improved TTS provider handling * Refactor ContentView and update TTS * Update ContentView * Integrate Marvis TTS model into Swift * Update macOS generate button with loading indicators and stop functionality - Add progress indicators when generating - Show 'Loading...' vs 'Generating...' 
states - Add Stop button with proper disable states - Use @StateObject for KokoroTTSModel to observe changes - Match iOS button styling and behavior - Update title to 'MLX Audio Eval' and remove mouth icon * Update package dependencies in Package.swift and Package.resolved * Refactor ContentView and introduce new inspector components for TTS * Add audio playback management * Add Marvis session status indicator into TTS views * Refactor VoicePickerSection and AudioPlayerView for improved layout and functionality * Update ContentView with inspector * Update README and ContentView * Swift_TTS to MLXAudio * Update Package.swift --------- Co-authored-by: Prince Canuma <[email protected]> * Add quality selection and streaming controls to Marvis with UI support for macOS & iOS (#249) * Add quality selection feature * Refactor QualityLevel enum * Add streaming audio generation option * Add streaming interval configuration * Refactor Marvis session management * Refactor unused variables (#250) Co-authored-by: Prince Canuma <[email protected]> * Refactor MarvisModel to handle optional backbone and decoder flavors (#251) * Refactor MarvisModel to handle optional backbone and decoder flavors * Add 6-bit model support and quantization handling - Updated default model to marvis-tts-100m-v0.2-MLX-6bit - Fixed quantization config to handle JSONValue types (supports mixed types like mode string and bits/group_size numbers) - Updated installWeights to properly extract quantization parameters from JSONValue enum * Refactor MarvisSession to improve quantization handling and update default model * Fix iOS 16 compatibility and ESpeakNG framework linking for iOS app (#252) * Refactor ContentView and ESpeakNGEngine * Update iOS platform version to v17 in Package.swift * Refactor UI components to use platform-specific colors * Update button styles for macOS and iOS in TextInputSection.swift * Refactor onChange handlers * Add testable reference and build action for MLXAudioTests in Xcode 
scheme * Add SpeakNG.xcframework in Embed Frameworks phase of MLXAudio target --------- Co-authored-by: Prince Canuma <[email protected]> * Add memory increase limit for iOS (#253) * Refactor ContentView and ESpeakNGEngine * Update iOS platform version to v17 in Package.swift * Refactor UI components to use platform-specific colors * Update button styles for macOS and iOS in TextInputSection.swift * Refactor onChange handlers * Add testable reference and build action for MLXAudioTests in Xcode scheme * Add SpeakNG.xcframework in Embed Frameworks phase of MLXAudio target * Add entitlements file for increased memory limit --------- Co-authored-by: Prince Canuma <[email protected]> * Update audio playback management in Marvis TTS (#254) * Bump version and add new copy files (#255) * bump version * add wav and text to copy * add text to copy pattern * fix tests * Server v2 (#153) * base arch of server * add tts and stt endpoints * functioning server * connect server and ui * Add audio utilities, use them where possible (#161) * Add audio utilities, use them where possible. * Formatting. * Fix tests. * Fix tests. * More test fixes. 
* fix server * Fix join audio sample rate (#162) * update nextjs * fix stt view * working STT * working text to speech * remove voices * remove home * add custom model and delete file * refactor model mapping * add animation and use env vars for frontend config * remove unused * refactor model loading * add tests * mock generate * fix tests * remove old player * update readme --------- Co-authored-by: Lucas Newman <[email protected]> * add main and remove unused * set marvis as default * format --------- Co-authored-by: Lucas Newman <[email protected]> Co-authored-by: sam <[email protected]> Co-authored-by: Sachin Desai <[email protected]> Co-authored-by: Senstella <[email protected]> Co-authored-by: Adrien Grondin <[email protected]> Co-authored-by: Kyle Kinkade <[email protected]> Co-authored-by: Claude <[email protected]> Co-authored-by: Ivan Fioravanti <[email protected]> Co-authored-by: Josh Bleecher Snyder <[email protected]> Co-authored-by: David Feng <[email protected]> Co-authored-by: bytefer <[email protected]> Co-authored-by: Rudrank Riyam <[email protected]> Co-authored-by: Liam Wittig <[email protected]>

Fix join audio sample rate

71ac288

Blaizzy added the codex label May 17, 2025 — with ChatGPT Codex Connector

Merge branch 'main' into codex/fix-sample-rate-bug-in-generate-file

1cccd3d

Blaizzy merged commit c45f399 into main May 17, 2025
2 checks passed

2 participants