
Commit d9b370c

Merge pull request #379 from Huanshere/i18n

I18n

2 parents 43033b9 + 74e6379


43 files changed: +2021 additions, -1045 deletions

README.md

Lines changed: 17 additions & 15 deletions
@@ -4,13 +4,11 @@

 # Connect the World, Frame by Frame

-[Website](https://videolingo.io) | [Documentation](https://docs.videolingo.io/docs/start)
-
-[**English**](/README.md)[**中文**](/i18n/README.zh.md)
+[**English**](/README.md)[**简体中文**](/translations/README.zh.md)[**繁體中文**](/translations/README.zh-TW.md)[**日本語**](/translations/README.ja.md)[**Español**](/translations/README.es.md)[**Русский**](/translations/README.ru.md)[**Français**](/translations/README.fr.md)

 </div>

-## 🌟 Overview ([Try VideoLingo For Free!](https://videolingo.io))
+## 🌟 Overview ([Try VL Now!](https://videolingo.io))

 VideoLingo is an all-in-one video translation, localization, and dubbing tool aimed at generating Netflix-quality subtitles. It eliminates stiff machine translations and multi-line subtitles while adding high-quality dubbing, enabling global knowledge sharing across language barriers.

@@ -31,6 +29,8 @@ Key features:

 - 🚀 One-click startup and processing in Streamlit

+- 🌍 Multi-language support in Streamlit UI
+
 - 📝 Detailed logging with progress resumption

 Difference from similar projects: **Single-line subtitles only, superior translation quality, seamless dubbing experience**
@@ -68,7 +68,9 @@ https://github.com/user-attachments/assets/47d965b2-b4ab-4a0b-9d08-b49a7bf3508c

 ## Installation

-> **Note:** To use NVIDIA GPU acceleration on Windows, please complete the following steps first:
+You don't have to read the whole documentation; [**here**](https://share.fastgpt.in/chat/share?shareId=066w11n3r9aq6879r4z0v9rh) is an online AI agent to help you.
+
+> **Note:** For Windows users with an NVIDIA GPU, follow these steps before installation:
 > 1. Install [CUDA Toolkit 12.6](https://developer.download.nvidia.com/compute/cuda/12.6.0/local_installers/cuda_12.6.0_560.76_windows.exe)
 > 2. Install [CUDNN 9.3.0](https://developer.download.nvidia.com/compute/cudnn/9.3.0/local_installers/cudnn_9.3.0_windows.exe)
 > 3. Add `C:\Program Files\NVIDIA\CUDNN\v9.3\bin\12.6` to your system PATH

@@ -77,7 +79,7 @@ https://github.com/user-attachments/assets/47d965b2-b4ab-4a0b-9d08-b49a7bf3508c
 > **Note:** FFmpeg is required. Please install it via package managers:
 > - Windows: ```choco install ffmpeg``` (via [Chocolatey](https://chocolatey.org/))
 > - macOS: ```brew install ffmpeg``` (via [Homebrew](https://brew.sh/))
-> - Linux: ```sudo apt install ffmpeg``` (Debian/Ubuntu) or ```sudo dnf install ffmpeg``` (Fedora)
+> - Linux: ```sudo apt install ffmpeg``` (Debian/Ubuntu)

 1. Clone the repository

@@ -108,12 +110,13 @@ docker build -t videolingo .
 docker run -d -p 8501:8501 --gpus all videolingo
 ```

-## API
-VideoLingo supports the OpenAI-like API format and various dubbing interfaces:
-- `claude-3-5-sonnet-20240620`, **`gemini-2.0-flash-exp`**, `gpt-4o`, `deepseek-coder`, ... (sorted by performance)
-- `azure-tts`, `openai-tts`, `siliconflow-fishtts`, **`fish-tts`**, `GPT-SoVITS`, `edge-tts`, `*custom-tts` (ask GPT to help you define it in custom_tts.py)
+## APIs
+VideoLingo supports the OpenAI-like API format and various TTS interfaces:
+- LLM: `claude-3-5-sonnet-20240620`, `deepseek-chat(v3)`, `gemini-2.0-flash-exp`, `gpt-4o`, ... (sorted by performance)
+- WhisperX: run whisperX locally or use the 302.ai API
+- TTS: `azure-tts`, `openai-tts`, `siliconflow-fishtts`, **`fish-tts`**, `GPT-SoVITS`, `edge-tts`, `*custom-tts` (you can add your own TTS in custom_tts.py!)

-> **Note:** VideoLingo is now integrated with [302.ai](https://gpt302.saaslink.net/C2oHR9), **one API KEY** for both LLM and TTS! Also supports fully local deployment using Ollama for LLM and Edge-TTS for dubbing, no cloud API required!
+> **Note:** VideoLingo works with **[302.ai](https://gpt302.saaslink.net/C2oHR9)** - one API key for all services (LLM, WhisperX, TTS). Or run locally with Ollama and Edge-TTS for free, no API needed!

 For detailed installation, API configuration, and batch mode instructions, please refer to the documentation: [English](/docs/pages/docs/start.en-US.md) | [中文](/docs/pages/docs/start.zh-CN.md)
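The `*custom-tts` entry above delegates to custom_tts.py, whose interface is not shown in this commit. A minimal sketch of the shape such a hook plausibly takes; the function name, signature, and engine call below are all assumptions, not the repo's actual API:

```python
# Hypothetical custom TTS hook for custom_tts.py. The real signature in the
# repo may differ; this only illustrates the expected contract: synthesize
# `text` and write a playable audio file to `save_path`.
from pathlib import Path

def custom_tts(text: str, save_path: str) -> None:
    Path(save_path).parent.mkdir(parents=True, exist_ok=True)
    # audio_bytes = my_engine.synthesize(text)  # your own engine here (assumption)
    # Path(save_path).write_bytes(audio_bytes)
    raise NotImplementedError("plug in your own TTS engine here")
```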

@@ -135,11 +138,10 @@ This project is licensed under the Apache 2.0 License. Special thanks to the fol

 [whisperX](https://github.com/m-bain/whisperX), [yt-dlp](https://github.com/yt-dlp/yt-dlp), [json_repair](https://github.com/mangiucugna/json_repair), [BELLE](https://github.com/LianjiaTech/BELLE)

-## 📬 Contact Us
+## 📬 Contact Me

-- Join our Discord: https://discord.gg/9F2G92CWPp
 - Submit [Issues](https://github.com/Huanshere/VideoLingo/issues) or [Pull Requests](https://github.com/Huanshere/VideoLingo/pulls) on GitHub
-- Follow me on Twitter: [@Huanshere](https://twitter.com/Huanshere)
+- DM me on Twitter: [@Huanshere](https://twitter.com/Huanshere)
 - Email me at: [email protected]

 ## ⭐ Star History

@@ -148,4 +150,4 @@ This project is licensed under the Apache 2.0 License. Special thanks to the fol

 ---

-<p align="center">If you find VideoLingo helpful, please give us a ⭐️!</p>
+<p align="center">If you find VideoLingo helpful, please give me a ⭐️!</p>

config.yaml

Lines changed: 13 additions & 6 deletions
@@ -1,27 +1,33 @@
 # * Settings marked with * are advanced settings that won't appear in the Streamlit page and can only be modified manually in config.py
-version: "2.1.2"
+version: "2.2.0"
 ## ======================== Basic Settings ======================== ##
+display_language: "zh-CN"
+
 # API settings
 api:
   key: 'YOUR_API_KEY'
   base_url: 'https://api.302.ai'
-  model: 'gemini-2.0-flash-exp'
+  model: 'deepseek-chat'

 # Language settings, written into the prompt, can be described in natural language
 target_language: '简体中文'

 # Whether to use Demucs for vocal separation before transcription
-demucs: false
+demucs: true

 whisper:
   # ["medium", "large-v3", "large-v3-turbo"]. Note: for zh, the model is forced to Belle/large-v3
   model: 'large-v3'
   # Whisper specified recognition language [en, zh, ...]
   language: 'en'
   detected_language: 'en'
+  # Whisper running mode ["local", "cloud"]. Specifies where to run; cloud uses the 302.ai API
+  runtime: 'cloud'
+  # 302.ai API key
+  whisperX_302_api_key: 'YOUR_302_API_KEY'

-# Video resolution [0x0, 640x360, 1920x1080]; 0x0 will generate a 0-second black video placeholder
-resolution: '1920x1080'
+# Whether to burn subtitles into the video
+burn_subtitles: true

 ## ======================== Advanced Settings ======================== ##
 # *Default resolution for downloading YouTube videos [360, 1080, best]
@@ -33,7 +39,7 @@ subtitle:
 # *Translated subtitles are slightly larger than source subtitles, affecting the reference length for subtitle splitting
 target_multiplier: 1.2

-# * Summary length, set low to 2k if using local LLM
+# *Summary length, set low to 2k if using local LLM
 summary_length: 8000

 # *Number of LLM multi-threaded accesses, set to 1 if using local LLM
@@ -135,6 +141,7 @@ llm_support_json:
 - 'gpt-4o-mini'
 - 'gemini-2.0-flash-exp'
 - 'deepseek-coder'
+- 'deepseek-chat'

 # have problems
 # - 'Qwen/Qwen2.5-72B-Instruct'
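The new `whisper.runtime` key switches transcription between a local whisperX run and the 302.ai cloud endpoint. A minimal sketch of how that switch could be consumed, assuming the repo's `load_key` helper and the two transcribe functions added later in this commit; the module paths are guesses, since the new files' names are not visible in this extract:

```python
# Sketch only: the whisperX_302 / whisperX_local module names are assumptions.
from core.config_utils import load_key

def transcribe(audio_path: str, start: float, end: float) -> dict:
    if load_key("whisper.runtime") == "cloud":
        # Hosted whisperX via 302.ai; requires whisper.whisperX_302_api_key
        from core.all_whisper_methods.whisperX_302 import transcribe_audio_302
        return transcribe_audio_302(audio_path, start, end)
    # Local whisperX; benefits from a CUDA GPU
    from core.all_whisper_methods.whisperX_local import transcribe_audio
    return transcribe_audio(audio_path, start, end)
```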

core/all_tts_functions/siliconflow_fish_tts.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 sys.path.append(os.path.join(os.path.dirname(__file__), "..", ".."))
 from core.config_utils import load_key, update_key
 from core.step1_ytdlp import find_video_files
-from core.all_whisper_methods.whisperX_utils import get_audio_duration
+from core.all_whisper_methods.audio_preprocess import get_audio_duration
 import hashlib
 from rich import print as rprint
 from pydub import AudioSegment

core/all_tts_functions/tts_main.py

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@

 sys.path.append(os.path.join(os.path.dirname(__file__), "..", ".."))
 from core.config_utils import load_key
-from core.all_whisper_methods.whisperX_utils import get_audio_duration
+from core.all_whisper_methods.audio_preprocess import get_audio_duration
 from core.all_tts_functions.gpt_sovits_tts import gpt_sovits_tts_for_videolingo
 from core.all_tts_functions.siliconflow_fish_tts import siliconflow_fish_tts_for_videolingo
 from core.all_tts_functions.openai_tts import openai_tts
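Several files in this commit repoint `get_audio_duration` from `whisperX_utils` to the new `audio_preprocess` module; its body is not part of the diff. A plausible pydub-based sketch, as an assumption only (the real implementation may use ffprobe or librosa instead):

```python
# Assumed implementation of audio_preprocess.get_audio_duration; not shown
# in this commit. pydub is already imported by the files that use it.
from pydub import AudioSegment

def get_audio_duration(audio_file: str) -> float:
    """Return the duration of an audio file in seconds."""
    return len(AudioSegment.from_file(audio_file)) / 1000.0
```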
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+import requests
+import sys, os
+sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
+from core.config_utils import load_key
+from rich import print as rprint
+import time
+import json
+import tempfile
+import subprocess
+
+OUTPUT_LOG_DIR = "output/log"
+def transcribe_audio_302(audio_path: str, start: float = None, end: float = None):
+    os.makedirs(OUTPUT_LOG_DIR, exist_ok=True)
+    LOG_FILE = f"{OUTPUT_LOG_DIR}/whisperx302.json"
+    if os.path.exists(LOG_FILE):
+        with open(LOG_FILE, "r", encoding="utf-8") as f:
+            return json.load(f)
+
+    WHISPER_LANGUAGE = load_key("whisper.language")
+    url = "https://api.302.ai/302/whisperx"
+
+    # If start and end times are specified, create a temporary audio segment
+    if start is not None and end is not None:
+        with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as temp_audio:
+            temp_audio_path = temp_audio.name
+
+        # Cut the audio segment with ffmpeg
+        ffmpeg_cmd = f'ffmpeg -y -i "{audio_path}" -ss {start} -t {end-start} -vn -ar 32000 -ac 1 "{temp_audio_path}"'
+        subprocess.run(ffmpeg_cmd, shell=True, check=True, capture_output=True)
+        audio_path = temp_audio_path
+
+    payload = {
+        "processing_type": "align",
+        "language": WHISPER_LANGUAGE,
+        "output": "raw"
+    }
+
+    start_time = time.time()
+    rprint(f"[cyan]🎤 Transcribing audio with language: <{WHISPER_LANGUAGE}> ...[/cyan]")
+    files = [
+        ('audio_input', (
+            os.path.basename(audio_path),
+            open(audio_path, 'rb'),
+            'application/octet-stream'
+        ))
+    ]
+
+    headers = {
+        'Authorization': f'Bearer {load_key("whisper.whisperX_302_api_key")}'
+    }
+
+    response = requests.request("POST", url, headers=headers, data=payload, files=files)
+
+    # Clean up the temporary file
+    if start is not None and end is not None:
+        if os.path.exists(temp_audio_path):
+            os.unlink(temp_audio_path)
+
+    with open(LOG_FILE, "w", encoding="utf-8") as f:
+        json.dump(response.json(), f, indent=4, ensure_ascii=False)
+
+    # Shift timestamps back to full-audio time
+    if start is not None:
+        result = response.json()
+        for segment in result['segments']:
+            segment['start'] += start
+            segment['end'] += start
+            for word in segment.get('words', []):
+                if 'start' in word:
+                    word['start'] += start
+                if 'end' in word:
+                    word['end'] += start
+        response._content = json.dumps(result).encode()  # overwrite the body so response.json() returns shifted times
+
+    elapsed_time = time.time() - start_time
+    rprint(f"[green]✓ Transcription completed in {elapsed_time:.2f} seconds[/green]")
+    return response.json()
+
+if __name__ == "__main__":
+    # Usage example:
+    result = transcribe_audio_302("output/audio/raw.mp3")
+    rprint(result)
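A hedged usage note: results are cached in output/log/whisperx302.json and returned verbatim on re-runs, so a fresh segment-level call would look roughly like this (illustrative only; the `text` field assumes whisperX's usual segment schema):

```python
import os

# Remove the cache so the API is actually called again (the function
# short-circuits to the saved JSON otherwise).
if os.path.exists("output/log/whisperx302.json"):
    os.remove("output/log/whisperx302.json")

# Transcribe 30s-90s; returned timestamps are shifted to full-audio time.
result = transcribe_audio_302("output/audio/raw.mp3", start=30.0, end=90.0)
for seg in result["segments"]:
    print(f"{seg['start']:.2f}-{seg['end']:.2f}: {seg.get('text', '')}")
```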
Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
+import os, sys
+sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+import warnings
+warnings.filterwarnings("ignore")
+
+import whisperx
+import torch
+import time
+import subprocess
+from typing import Dict
+from rich import print as rprint
+import librosa
+import tempfile
+from core.config_utils import load_key
+from core.all_whisper_methods.audio_preprocess import save_language
+
+MODEL_DIR = load_key("model_dir")
+
+def check_hf_mirror() -> str:
+    """Check and return the fastest HF mirror"""
+    mirrors = {
+        'Official': 'huggingface.co',
+        'Mirror': 'hf-mirror.com'
+    }
+    fastest_url = f"https://{mirrors['Official']}"
+    best_time = float('inf')
+    rprint("[cyan]🔍 Checking HuggingFace mirrors...[/cyan]")
+    for name, domain in mirrors.items():
+        try:
+            if os.name == 'nt':
+                cmd = ['ping', '-n', '1', '-w', '3000', domain]
+            else:
+                cmd = ['ping', '-c', '1', '-W', '3', domain]
+            start = time.time()
+            result = subprocess.run(cmd, capture_output=True, text=True)
+            response_time = time.time() - start
+            if result.returncode == 0:
+                if response_time < best_time:
+                    best_time = response_time
+                    fastest_url = f"https://{domain}"
+                rprint(f"[green]✓ {name}:[/green] {response_time:.2f}s")
+        except:
+            rprint(f"[red]✗ {name}:[/red] Failed to connect")
+    if best_time == float('inf'):
+        rprint("[yellow]⚠️ All mirrors failed, using default[/yellow]")
+    rprint(f"[cyan]🚀 Selected mirror:[/cyan] {fastest_url} ({best_time:.2f}s)")
+    return fastest_url
+
+def transcribe_audio(audio_file: str, start: float, end: float) -> Dict:
+    os.environ['HF_ENDPOINT'] = check_hf_mirror()  #? don't know if it's working...
+    WHISPER_LANGUAGE = load_key("whisper.language")
+    device = "cuda" if torch.cuda.is_available() else "cpu"
+    rprint(f"🚀 Starting WhisperX using device: {device} ...")
+
+    if device == "cuda":
+        gpu_mem = torch.cuda.get_device_properties(0).total_memory / (1024**3)
+        batch_size = 16 if gpu_mem > 8 else 2
+        compute_type = "float16" if torch.cuda.is_bf16_supported() else "int8"
+        rprint(f"[cyan]🎮 GPU memory:[/cyan] {gpu_mem:.2f} GB, [cyan]📦 Batch size:[/cyan] {batch_size}, [cyan]⚙️ Compute type:[/cyan] {compute_type}")
+    else:
+        batch_size = 1
+        compute_type = "int8"
+        rprint(f"[cyan]📦 Batch size:[/cyan] {batch_size}, [cyan]⚙️ Compute type:[/cyan] {compute_type}")
+    rprint(f"[green]▶️ Starting WhisperX for segment {start:.2f}s to {end:.2f}s...[/green]")
+
+    try:
+        if WHISPER_LANGUAGE == 'zh':
+            model_name = "Huan69/Belle-whisper-large-v3-zh-punct-fasterwhisper"
+            local_model = os.path.join(MODEL_DIR, "Belle-whisper-large-v3-zh-punct-fasterwhisper")
+        else:
+            model_name = load_key("whisper.model")
+            local_model = os.path.join(MODEL_DIR, model_name)
+
+        if os.path.exists(local_model):
+            rprint(f"[green]📥 Loading local WHISPER model:[/green] {local_model} ...")
+            model_name = local_model
+        else:
+            rprint(f"[green]📥 Using WHISPER model from HuggingFace:[/green] {model_name} ...")
+
+        vad_options = {"vad_onset": 0.500, "vad_offset": 0.363}
+        asr_options = {"temperatures": [0], "initial_prompt": ""}
+        whisper_language = None if 'auto' in WHISPER_LANGUAGE else WHISPER_LANGUAGE
+        rprint("[bold yellow]**You can ignore the warning `Model was trained with torch 1.10.0+cu102, yours is 2.0.0+cu118...`**[/bold yellow]")
+        model = whisperx.load_model(model_name, device, compute_type=compute_type, language=whisper_language, vad_options=vad_options, asr_options=asr_options, download_root=MODEL_DIR)
+
+        # Create a temp file in wav format for better compatibility
+        with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as temp_audio:
+            temp_audio_path = temp_audio.name
+
+        # Extract the audio segment using ffmpeg
+        ffmpeg_cmd = f'ffmpeg -y -i "{audio_file}" -ss {start} -t {end-start} -vn -ar 32000 -ac 1 "{temp_audio_path}"'
+        subprocess.run(ffmpeg_cmd, shell=True, check=True, capture_output=True)
+
+        try:
+            # Load the audio segment with librosa
+            audio_segment, sample_rate = librosa.load(temp_audio_path, sr=16000)
+        finally:
+            # Clean up the temp file
+            if os.path.exists(temp_audio_path):
+                os.unlink(temp_audio_path)
+
+        rprint("[bold green]Note: you will see a progress bar if this is working correctly[/bold green]")
+        result = model.transcribe(audio_segment, batch_size=batch_size, print_progress=True)
+
+        # Free GPU resources
+        del model
+        torch.cuda.empty_cache()
+
+        # Save the detected language
+        save_language(result['language'])
+        if result['language'] == 'zh' and WHISPER_LANGUAGE != 'zh':
+            raise ValueError("Please specify the transcription language as zh and try again!")
+
+        # Align whisper output
+        model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
+        result = whisperx.align(result["segments"], model_a, metadata, audio_segment, device, return_char_alignments=False)
+
+        # Free GPU resources again
+        torch.cuda.empty_cache()
+        del model_a
+
+        # Adjust timestamps back to full-audio time
+        for segment in result['segments']:
+            segment['start'] += start
+            segment['end'] += start
+            for word in segment['words']:
+                if 'start' in word:
+                    word['start'] += start
+                if 'end' in word:
+                    word['end'] += start
+        return result
+    except Exception as e:
+        rprint(f"[red]WhisperX processing error:[/red] {e}")
+        raise
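For reference, a hypothetical driver for this local path; the word-level fields follow whisperX's usual aligned-output schema, and the audio path is illustrative:

```python
# Illustrative call: transcribe the first two minutes locally and print
# word-level timings from the aligned result.
result = transcribe_audio("output/audio/raw.mp3", start=0.0, end=120.0)
for seg in result["segments"]:
    for word in seg["words"]:
        if "start" in word and "end" in word:
            print(f"{word['start']:7.2f} {word['end']:7.2f}  {word['word']}")
```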

core/step10_gen_audio.py

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@

 sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 from core.config_utils import load_key
-from core.all_whisper_methods.whisperX_utils import get_audio_duration
+from core.all_whisper_methods.audio_preprocess import get_audio_duration
 from core.all_tts_functions.tts_main import tts_main

 console = Console()
