Releases: Huanshere/VideoLingo
v3.0.0
Changelog - Version 3.0.0
This update focuses mainly on refactoring, which makes the codebase leaner and more readable. It also improves the prompts and fixes several stability issues.
🚀 New Features
- Significantly improved transcription quality: ASR now runs on the original audio, and the Demucs-denoised audio is then used for forced alignment. This greatly reduces missed sentences.
- Added support for the WhisperX 302 Cloud API, recommended for users without a local GPU or who want to avoid a complex installation. Also added preliminary support for the 11labs Scribe model, which is still in development and may be less stable than WhisperX, so use it with caution.
- Enhanced segmentation stability through longer chain-of-thought reasoning.
- Improved the translation prompt to avoid overly terse translations.
- Added a JSON-format support button to the LLM settings in the sidebar.
🐛 Bug Fixes
- Increased the word deletion threshold from 20 to 30, fixing the issue of incorrectly deleting valid words.
- Fixed errors when processing longer audio/text segments, resolving WhisperX cloud audio segmentation issues.
- Applied stricter validation of LLM response formats, fixing mismatched line counts in translations.
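The stricter response validation can be illustrated with a minimal Python sketch. This is not VideoLingo's actual code: the response schema (numbered keys that map to objects with "origin" and "translation" fields), the function names, and the retry count are all hypothetical. The idea is to reject any reply whose line count or echoed source text does not match the input, and then ask the model again.

```python
import json


def parse_translation_response(raw: str, source_lines: list[str]) -> list[str]:
    """Return the translations only if the reply maps 1:1 onto source_lines.

    Hypothetical schema: {"1": {"origin": "...", "translation": "..."}, "2": ...}
    Raises ValueError on any mismatch (json.JSONDecodeError is a subclass).
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    # Line-count check: the core of the alignment fix.
    if len(data) != len(source_lines):
        raise ValueError(f"expected {len(source_lines)} lines, got {len(data)}")
    translations = []
    for i, source in enumerate(source_lines, start=1):
        item = data.get(str(i))
        if not isinstance(item, dict) or not isinstance(item.get("translation"), str):
            raise ValueError(f"line {i} is missing or malformed")
        # Each entry must echo its source line, so shifted lines are caught too.
        if str(item.get("origin", "")).strip() != source.strip():
            raise ValueError(f"line {i} does not echo its source text")
        translations.append(item["translation"])
    return translations


def translate_with_retry(call_llm, prompt: str, source_lines: list[str],
                         max_attempts: int = 3) -> list[str]:
    """Call a (hypothetical) LLM function until it returns a valid reply."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_translation_response(call_llm(prompt), source_lines)
        except ValueError as err:  # covers malformed JSON and count mismatches
            last_error = err
    raise RuntimeError(f"no valid response after {max_attempts} attempts") from last_error
```

Treating every format problem as a `ValueError` gives the retry loop a single failure path.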
🔧 Improvements
- Refactored the project architecture, making the code more streamlined, clear, and easier to maintain.
- Removed automatic FFmpeg GPU acceleration detection, now requiring manual configuration to enhance reliability.
- Switched to pydub for more reliable audio segmentation.
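pydub's AudioSegment reports its length in milliseconds through len() and is sliced with millisecond offsets. That makes it possible to sketch fixed-length segmentation generically. The function below is a hypothetical illustration, not the project's own segmentation logic, and the chunk length is left to the caller.

```python
def split_into_chunks(audio, max_ms: int):
    """Split audio into consecutive pieces of at most max_ms milliseconds.

    Works with anything that supports len() and [start:end] slicing in the
    same unit, e.g. a pydub AudioSegment (milliseconds).
    Returns a list of (start_ms, chunk) tuples.
    """
    if max_ms <= 0:
        raise ValueError("max_ms must be positive")
    total_ms = len(audio)
    chunks = []
    for start in range(0, total_ms, max_ms):
        end = min(start + max_ms, total_ms)  # the last chunk may be shorter
        chunks.append((start, audio[start:end]))
    return chunks
```

With pydub installed, passing AudioSegment.from_file("input.wav") and 30_000 would yield 30-second AudioSegment pieces, each of which can be saved with chunk.export("part.wav", format="wav"). Keeping each chunk's start offset lets per-chunk ASR timestamps be shifted back onto the original timeline.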
v2.2.1
Release Notes
🚀 New Features:
- Integrated the SiliconFlow CosyVoice2.0 0.5B API for few-shot voice cloning, which delivers impressive (though occasionally unstable) results. The README demo has been updated to showcase this integration.
🐛 Bug Fixes:
- Fixed a minor issue with file saving in the whisperX 302 API.
VideoLingo with CosyVoice2.0 demo:
https://github.com/user-attachments/assets/ea8b1cb4-1666-46d4-9e0b-ca091e34888c
v2.2.0
Release Notes
🚀 New Features:
- Added support for the WhisperX 302.ai API, allowing VideoLingo to run without local compute resources.
- Added internationalization (i18n) support for the Streamlit UI and install.py, now available in Chinese, English, Traditional Chinese, Japanese, French, Russian, and Spanish.
🔧 Improvements:
- Enhanced the install.py process for a smoother user experience.
- Placed the AI Assistant in a more prominent position to provide better support for installation and usage issues.
- Updated the icons to be more visually appealing and slightly optimized the documentation structure.
🎉 Special Notes:
- Happy Year of the Snake! Wishing everyone a fantastic 2025; may we continue to evolve alongside AI!
- Exciting Updates on Reasoning Models: The latest reasoning models are performing exceptionally well! We're working on integrating them soon. Stay tuned!
v2.1.2
v2.1.1
Release Notes
🐛 Bug Fixes:
- Use empty audio as a fallback for failed TTS tasks, and handle empty lines during TTS task generation.
- Handle multiple spaces when merging words and allow multiple splits in second segmentation.
- Add support for Grok beta and resolve compatibility issues in the askgpt function.
- Pin ctranslate2 to version 4.5.0 to avoid errors.
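The empty-audio fallback from the first fix can be sketched with Python's standard wave module. If the text is blank or the TTS call raises, a silent WAV of the expected duration is written in its place, so later steps that stitch clips together still find a file. The tts callable, its (text, path) signature, and the 24 kHz default sample rate are assumptions for illustration, not VideoLingo's actual interface.

```python
import wave


def write_silence(path: str, duration_s: float, sample_rate: int = 24000) -> None:
    """Write a mono 16-bit PCM WAV file containing only silence."""
    n_frames = int(round(duration_s * sample_rate))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)  # zero amplitude = silence


def synthesize_or_silence(tts, text: str, path: str, duration_s: float) -> bool:
    """Run a (hypothetical) TTS callable; on empty text or failure, write silence.

    Returns True if speech was synthesized, False if the fallback was used.
    """
    if not text.strip():  # empty lines never reach the TTS engine
        write_silence(path, duration_s)
        return False
    try:
        tts(text, path)
        return True
    except Exception as err:  # broad on purpose: any TTS failure triggers the fallback
        print(f"TTS failed for {text!r}: {err}; writing silence instead")
        write_silence(path, duration_s)
        return False
```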
v2.1.0
Release Notes
🚀 New Features:
- Added support for custom terms.
- Added a custom TTS setting.
- Added support for Deepseekcoder.
- Added support for pure local operation using Ollama and Edge-TTS.
- Unified the TTS methods through the 302ai integration; a single 302ai key now unlocks the full functionality.
🐛 Bug Fixes:
- Fixed the NVIDIA GPU check.
- Added a check for [br] in the response splitting step.
- Avoided errors by no longer checking the source audio bit rate.
🔧 Improvements:
- Removed automatic FFmpeg installation, now requiring a system-level install.
v2.0.4
Release Notes
🚀 New Features:
- Added a Demucs option to the Streamlit sidebar (recommended for videos with loud background music) and boosted the vocal volume after Demucs processing.
- Added memory cleanup after Demucs and improved Demucs audio quality.
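Boosting the vocals after Demucs amounts to a gain change in decibels. The gain VideoLingo actually applies is not stated in these notes, so this rough sketch leaves the value to the caller. A dB gain converts to an amplitude multiplier of 10^(dB/20), and 16-bit samples must be clipped after scaling to avoid integer overflow. In pydub, the same operation is written as segment + gain_db.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to an amplitude multiplier: 10 ** (dB / 20)."""
    return 10 ** (gain_db / 20)


def apply_gain(samples: list[int], gain_db: float) -> list[int]:
    """Scale 16-bit PCM samples by gain_db and clip to the int16 range."""
    factor = db_to_linear(gain_db)
    return [max(INT16_MIN, min(INT16_MAX, round(s * factor))) for s in samples]
```

A +6 dB boost roughly doubles the amplitude, so loud passages can clip. That risk is why the clamp is part of the operation rather than an afterthought.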
🐛 Bug Fixes:
- Reduced the subtitle font size to keep subtitles from wrapping onto two lines.
- Fixed the "All same length" error. Note: this error may still occur with smaller models or reverse-engineered APIs.
- Added handling for dubbing that exceeds the length limit: the audio is now truncated instead of raising an error.
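One way to picture the truncation fallback is in two steps. First, speed the clip up to fit its subtitle slot, but only up to a cap. Second, cut whatever still overflows instead of raising. The speed-up step, the FitPlan structure, and the 1.5x default cap are assumptions added for illustration; the release note itself only states that over-long dubbing is now truncated.

```python
from dataclasses import dataclass


@dataclass
class FitPlan:
    speed: float         # playback-rate multiplier applied to the TTS clip
    output_ms: float     # clip length after the speed change and any truncation
    truncated_ms: float  # audio cut from the end because it still did not fit


def plan_fit(tts_ms: float, slot_ms: float, max_speed: float = 1.5) -> FitPlan:
    """Speed a dubbing clip up to fit its slot; truncate if that is not enough."""
    if tts_ms <= 0 or slot_ms <= 0:
        raise ValueError("durations must be positive")
    # Never slow a clip down, and never exceed the (hypothetical) speed cap.
    speed = min(max(tts_ms / slot_ms, 1.0), max_speed)
    sped_ms = tts_ms / speed
    if sped_ms <= slot_ms:
        return FitPlan(speed, sped_ms, 0.0)
    # Still too long at max speed: truncate rather than raise an error.
    return FitPlan(speed, slot_ms, sped_ms - slot_ms)
```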
🔧 Improvements:
- Optimized prompt splitting to handle repeated text.
- Improved the language style of the translation prompts and removed their conciseness requirement.
- Updated install.py to roll back to the v2.0 installation method, removing the local third-party installation to avoid errors.
- The installer now installs whisperX and Demucs directly from Git, avoiding dependency conflicts.
v2.0.3
Release Notes
December 2, 2024
🐛 Bug Fixes:
- Fixed an alignment error in the voice generation task.
- Resolved an audio reading issue during the whisper process.
- Loosened the audio speed change error tolerance to reduce errors.
- Applied stricter JSON constraints.
🔧 Improvements:
- Added memory cleanup in batch mode to improve performance.
v2.0.2-deprecated
Release Notes
🐛 Bug Fixes:
- Removed the one-click installation script due to instability on some computers.
- Removed the conda-based FFmpeg installation because it did not work properly on Windows.
- Fixed issues with gptsovits English support.
- Reverted the local installation of FFmpeg.
🔧 Improvements:
- Set the audio bitrate to 32000 to improve recognition accuracy.
v2.0.1-deprecated
🔧 Improvements:
- Simplified installation: Windows users no longer need to install CUDA manually, and the installation of FFmpeg and whisperX has been streamlined.
- Minor fixes: Addressed various small issues to improve performance and user experience.