[demo] speech web demo #2039
Commits
c7dd207 fixs CORS Error (iftaken)
450cd98 add PP-TTS,PP-ASR,PP-VPR (iftaken)
2938d3e Merge branch 'develop' into dev-hym (iftaken)
474373b Merge branch 'develop' into dev-hym (iftaken)
e68f1ce add speech web demo (iftaken)
cb50999 Merge branch 'develop' of github.com:PaddlePaddle/PaddleSpeech into d… (iftaken)
729fe6a rename speech_web (iftaken)
2b6fab3 rm TTS.vue (iftaken)
30d4304 update demo show png (iftaken)
357b177 rename readme and fixed conflict (iftaken)
80adf54 remove error url for paddlepaddle (iftaken)
63ad046 del dead link (iftaken)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
*/.vscode/* | ||
*.wav | ||
*/resource/* | ||
.Ds* | ||
*.pyc | ||
*.pcm | ||
*.npy | ||
*.diff | ||
*.sqlite | ||
*/static/* | ||
*.pdparams | ||
*.pdiparams* | ||
*.pdmodel | ||
*/source/* | ||
*/PaddleSpeech/* | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,168 @@ | ||
# Paddle Speech Demo | ||
|
||
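PaddleSpeechDemo is a demo project built around PaddleSpeech's voice interaction features. It is meant to help you get started with PaddleSpeech and build your own applications on top of it. | ||
|
||
The voice interaction part uses PaddleSpeech, the dialogue and information extraction parts use PaddleNLP, and the web front end is built with Vue3. | ||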
|
||
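Main features: | ||
|
||
+ Voice chat: PaddleSpeech speech recognition plus speech synthesis, with the dialogue part based on PaddleNLP's chit-chat capability | ||
+ Speaker verification: a showcase of PaddleSpeech's voiceprint recognition | ||
+ Speech recognition: supports three modes, [real-time speech recognition], [end-to-end recognition], and [audio file recognition] | ||
+ Speech synthesis: supports two modes, [streaming synthesis] and [end-to-end synthesis] | ||
+ Voice commands: combines PaddleSpeech speech recognition with PaddleNLP information extraction to automate transportation expense reimbursement | ||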
|
||
Demo screenshot: | ||
|
||
 | ||
|
||
|
||
## Installation | ||
|
||
### Back-end environment | ||
|
||
``` | ||
# Install the dependencies | ||
cd speech_server | ||
pip install -r requirements.txt | ||
``` | ||
|
||
|
||
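### Front-end environment | ||
|
||
The front end depends on node.js, which must be installed beforehand; make sure npm is available. It was tested with npm 8.3.1. We recommend downloading the stable release of node.js from the [official website](https://nodejs.org/en/). | ||
|
||
``` | ||
# Enter the front-end directory | ||
cd web_client | ||
|
||
# Install yarn (skip this step if it is already installed) | ||
npm install -g yarn | ||
|
||
# Install the front-end dependencies with yarn | ||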
yarn install | ||
``` | ||
|
||
|
||
## Starting the services | ||
|
||
### Start the back-end service | ||
|
||
``` | ||
cd speech_server | ||
# Port 8010 by default | ||
python main.py --port 8010 | ||
``` | ||
|
||
### Start the front-end service | ||
|
||
``` | ||
cd web_client | ||
yarn dev --port 8011 | ||
``` | ||
|
||
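With the default configuration, the back-end address configured in the front end is localhost, so the back-end server and the browser that opens the page must be on the same machine. If they are on different machines, see the FAQ entry below: [How do I change the configuration if the back end is deployed on another machine or port?] | ||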
|
||
## Running with Docker | ||
|
||
### Back-end Docker | ||
The back end uses the [official PaddlePaddle Docker image](https://www.paddlepaddle.org.cn/); the CPU version is shown here. | ||
``` | ||
# Clone the PaddleSpeech project | ||
cd PaddleSpeechServer | ||
git clone https://github.com/PaddlePaddle/PaddleSpeech.git | ||
|
||
# Pull the image | ||
docker pull registry.baidubce.com/paddlepaddle/paddle:2.3.0 | ||
|
||
# Start the container | ||
docker run --name paddle -it -p 8010:8010 -v $PWD:/paddle registry.baidubce.com/paddlepaddle/paddle:2.3.0 /bin/bash | ||
|
||
# Inside the container, go to the mounted directory | ||
cd /paddle | ||
|
||
# Install the dependencies | ||
pip install -r requirements.txt | ||
|
||
# Start the service | ||
python main.py --port 8010 | ||
|
||
``` | ||
|
||
### Front-end Docker | ||
|
||
The front end can use the [official node Docker image](https://hub.docker.com/_/node) directly. | ||
|
||
```shell | ||
docker pull node | ||
``` | ||
|
||
Install the dependencies inside the container: | ||
|
||
```shell | ||
cd PaddleSpeechWebClient | ||
# Map port 8011 to the host | ||
docker run -it -p 8011:8011 -v $PWD:/paddle node:latest /bin/bash | ||
# Inside the container, go to the mounted directory | ||
cd /paddle | ||
# Install the dependencies | ||
yarn install | ||
# Start the front end | ||
yarn dev --port 8011 | ||
``` | ||
|
||
|
||
|
||
|
||
|
||
## FAQ | ||
|
||
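#### Q: How do I install node.js? | ||
|
||
A: See the [Runoob tutorial (菜鸟教程)](https://www.runoob.com/nodejs/nodejs-install-setup.html) for installing node.js, and make sure npm is available. | ||
|
||
#### Q: How do I change the configuration if the back end is deployed on another machine or port? | ||
|
||
A: The back-end address is configured in two files. | ||
|
||
First, edit `PaddleSpeechWebClient/vite.config.js`: | ||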
|
||
```javascript | ||
server: { | ||
  host: "0.0.0.0", | ||
  proxy: { | ||
    "/api": { | ||
      target: "http://localhost:8010", // change this to the back-end address | ||
      changeOrigin: true, | ||
      rewrite: (path) => path.replace(/^\/api/, ""), | ||
    }, | ||
  }, | ||
} | ||
``` | ||
|
||
Then edit `PaddleSpeechWebClient/src/api/API.js` (the WebSocket proxy configuration did not work, so the addresses have to be changed in this file): | ||
|
||
```javascript | ||
// websocket (change these to the back-end address) | ||
CHAT_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/offlineStream', // ChatBot websocket endpoint | ||
ASR_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/onlineStream', // Stream ASR endpoint | ||
TTS_SOCKET_RECORD: 'ws://localhost:8010/ws/tts/online', // Stream TTS endpoint | ||
``` | ||
|
||
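#### Q: The front end cannot record audio when the back end is accessed by IP address | ||
|
||
A: This is mainly a restriction of the browser's security policy: browsers only allow microphone access (getUserMedia) on secure origins such as HTTPS or localhost. You need to change the browser configuration and then restart the browser. For the browser settings, see [Browser reports getUserMedia is not supported when using js-audio-recorder](https://blog.csdn.net/YRY_LIKE_YOU/article/details/113745273) | ||
|
||
Chrome setting: chrome://flags/#unsafely-treat-insecure-origin-as-secure | ||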
|
||
|
||
|
||
|
||
## References | ||
|
||
Reference for recording audio with Vue: https://blog.csdn.net/qq_41619796/article/details/107865602#t1 | ||
|
||
Reference repositories for streaming audio playback in the front end: | ||
|
||
https://github.com/AnthumChris/fetch-stream-audio | ||
|
||
https://bm.enthuses.me/buffered.php?bref=6677 |
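Reviewer sketch (not part of the PR): the README's FAQ says the back-end address lives in two places, the Vite proxy target and the three WebSocket URLs in `API.js`. One way to keep them in sync is to derive all four from a single host and port. The WebSocket paths and constant names below are copied from the README; the helper names `backend_endpoints` and `strip_api_prefix` and the `PROXY_TARGET` key are hypothetical and exist only for this illustration.

```python
import re

# WebSocket paths copied from the README's API.js snippet.
WS_PATHS = {
    "CHAT_SOCKET_RECORD": "/ws/asr/offlineStream",
    "ASR_SOCKET_RECORD": "/ws/asr/onlineStream",
    "TTS_SOCKET_RECORD": "/ws/tts/online",
}


def backend_endpoints(host="localhost", port=8010):
    # Hypothetical helper: build the proxy target and the WebSocket URLs
    # from one host/port pair so the two config files cannot drift apart.
    endpoints = {"PROXY_TARGET": f"http://{host}:{port}"}
    for name, path in WS_PATHS.items():
        endpoints[name] = f"ws://{host}:{port}{path}"
    return endpoints


def strip_api_prefix(path):
    # Python equivalent of the proxy rewrite rule in vite.config.js:
    # remove a leading "/api" and leave every other path untouched.
    return re.sub(r"^/api", "", path, count=1)


if __name__ == "__main__":
    for name, url in backend_endpoints("192.168.1.5", 8010).items():
        print(f"{name}: {url}")
```

The printed values are what would be pasted into `vite.config.js` (`target`) and `API.js` when the back end runs on another machine; the IP address in the usage line is only a placeholder.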
103 changes: 103 additions & 0 deletions
demos/speech_web/speech_server/conf/tts_online_application.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
# This is the parameter configuration file for streaming tts server. | ||
|
||
################################################################################# | ||
# SERVER SETTING # | ||
################################################################################# | ||
host: 0.0.0.0 | ||
port: 8092 | ||
|
||
# The task format in the engine_list is: <speech task>_<engine type> | ||
# engine_list choices = ['tts_online', 'tts_online-onnx'], the inference speed of tts_online-onnx is faster than tts_online. | ||
# protocol choices = ['websocket', 'http'] | ||
protocol: 'http' | ||
engine_list: ['tts_online-onnx'] | ||
|
||
|
||
################################################################################# | ||
# ENGINE CONFIG # | ||
################################################################################# | ||
|
||
################################### TTS ######################################### | ||
################### speech task: tts; engine_type: online ####################### | ||
tts_online: | ||
    # am (acoustic model) choices=['fastspeech2_csmsc', 'fastspeech2_cnndecoder_csmsc'] | ||
    # fastspeech2_cnndecoder_csmsc supports streaming am inference. | ||
    am: 'fastspeech2_csmsc' | ||
    am_config: | ||
    am_ckpt: | ||
    am_stat: | ||
    phones_dict: | ||
    tones_dict: | ||
    speaker_dict: | ||
    spk_id: 0 | ||
|
||
    # voc (vocoder) choices=['mb_melgan_csmsc', 'hifigan_csmsc'] | ||
    # Both mb_melgan_csmsc and hifigan_csmsc support streaming voc inference | ||
    voc: 'mb_melgan_csmsc' | ||
    voc_config: | ||
    voc_ckpt: | ||
    voc_stat: | ||
|
||
    # others | ||
    lang: 'zh' | ||
    device: 'cpu' # set 'gpu:id' or 'cpu' | ||
    # am_block and am_pad are only used by the fastspeech2_cnndecoder_onnx model for streaming am inference; | ||
    # when am_pad is set to 12, streaming synthesized audio is the same as non-streaming synthesized audio | ||
    am_block: 72 | ||
    am_pad: 12 | ||
    # voc_pad and voc_block are used by the voc model for streaming voc inference; | ||
    # when the voc model is mb_melgan_csmsc and voc_pad is set to 14, streaming synthesized audio is the same as non-streaming synthesized audio; the pad can be lowered to a minimum of 7, at which streaming synthesized audio still sounds normal | ||
    # when the voc model is hifigan_csmsc and voc_pad is set to 19, streaming synthesized audio is the same as non-streaming synthesized audio; with voc_pad set to 14, streaming synthesized audio sounds normal | ||
    voc_block: 36 | ||
    voc_pad: 14 | ||
|
||
|
||
|
||
################################################################################# | ||
# ENGINE CONFIG # | ||
################################################################################# | ||
|
||
################################### TTS ######################################### | ||
################### speech task: tts; engine_type: online-onnx ####################### | ||
tts_online-onnx: | ||
    # am (acoustic model) choices=['fastspeech2_csmsc_onnx', 'fastspeech2_cnndecoder_csmsc_onnx'] | ||
    # fastspeech2_cnndecoder_csmsc_onnx supports streaming am inference. | ||
    am: 'fastspeech2_cnndecoder_csmsc_onnx' | ||
    # am_ckpt is a list: if am is fastspeech2_cnndecoder_csmsc_onnx, am_ckpt = [encoder model, decoder model, postnet model]; | ||
    # if am is fastspeech2_csmsc_onnx, am_ckpt = [ckpt model]; | ||
    am_ckpt: # list | ||
    am_stat: | ||
    phones_dict: | ||
    tones_dict: | ||
    speaker_dict: | ||
    spk_id: 0 | ||
    am_sample_rate: 24000 | ||
    am_sess_conf: | ||
        device: "cpu" # set 'gpu:id' or 'cpu' | ||
        use_trt: False | ||
        cpu_threads: 4 | ||
|
||
    # voc (vocoder) choices=['mb_melgan_csmsc_onnx', 'hifigan_csmsc_onnx'] | ||
    # Both mb_melgan_csmsc_onnx and hifigan_csmsc_onnx support streaming voc inference | ||
    voc: 'hifigan_csmsc_onnx' | ||
    voc_ckpt: | ||
    voc_sample_rate: 24000 | ||
    voc_sess_conf: | ||
        device: "cpu" # set 'gpu:id' or 'cpu' | ||
        use_trt: False | ||
        cpu_threads: 4 | ||
|
||
    # others | ||
    lang: 'zh' | ||
    # am_block and am_pad are only used by the fastspeech2_cnndecoder_onnx model for streaming am inference; | ||
    # when am_pad is set to 12, streaming synthesized audio is the same as non-streaming synthesized audio | ||
    am_block: 72 | ||
    am_pad: 12 | ||
    # voc_pad and voc_block are used by the voc model for streaming voc inference; | ||
    # when the voc model is mb_melgan_csmsc_onnx and voc_pad is set to 14, streaming synthesized audio is the same as non-streaming synthesized audio; the pad can be lowered to a minimum of 7, at which streaming synthesized audio still sounds normal | ||
    # when the voc model is hifigan_csmsc_onnx and voc_pad is set to 19, streaming synthesized audio is the same as non-streaming synthesized audio; with voc_pad set to 14, streaming synthesized audio sounds normal | ||
    voc_block: 36 | ||
    voc_pad: 14 | ||
    # voc_upsample should be the same as n_shift in the voc config. | ||
    voc_upsample: 300 | ||
|
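Reviewer sketch (not part of the PR): the `*_block` and `*_pad` comments above describe block-wise streaming inference with context padding. The following is a minimal illustration of that general technique, not PaddleSpeech's actual implementation; `chunk_ranges` and `chunk_seconds` are hypothetical names, and the assumption that each mel frame expands to `voc_upsample` audio samples is inferred from the `n_shift` comment.

```python
def chunk_ranges(n_frames, block, pad):
    """Split n_frames into blocks of `block` frames, each with up to `pad`
    frames of extra context on both sides.

    Returns (ctx_start, ctx_end, keep_start, keep_end) tuples: the model is
    run on frames [ctx_start, ctx_end), but only the output that corresponds
    to [keep_start, keep_end) is kept, so consecutive chunks join cleanly.
    """
    ranges = []
    for keep_start in range(0, n_frames, block):
        keep_end = min(keep_start + block, n_frames)
        ctx_start = max(keep_start - pad, 0)       # clamp at the first frame
        ctx_end = min(keep_end + pad, n_frames)    # clamp at the last frame
        ranges.append((ctx_start, ctx_end, keep_start, keep_end))
    return ranges


def chunk_seconds(block, upsample, sample_rate):
    # Duration of the audio produced by one vocoder block, assuming each
    # mel frame expands to `upsample` samples (an assumption, see lead-in).
    return block * upsample / sample_rate


if __name__ == "__main__":
    # Values taken from the config: voc_block=36, voc_pad=14,
    # voc_upsample=300, voc_sample_rate=24000. The 100-frame length is arbitrary.
    for r in chunk_ranges(100, block=36, pad=14):
        print(r)
    print(chunk_seconds(36, 300, 24000), "seconds of audio per vocoder block")
```

Under these assumptions a larger pad gives each block more context (closer to non-streaming output) at the cost of recomputing more frames per block, which matches the trade-off the config comments describe.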
48 changes: 48 additions & 0 deletions
demos/speech_web/speech_server/conf/ws_conformer_wenetspeech_application_faster.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
# This is the parameter configuration file for PaddleSpeech Serving. | ||
|
||
################################################################################# | ||
# SERVER SETTING # | ||
################################################################################# | ||
host: 0.0.0.0 | ||
port: 8090 | ||
|
||
# The task format in the engine_list is: <speech task>_<engine type> | ||
# task choices = ['asr_online'] | ||
# protocol = ['websocket'] (only one can be selected). | ||
# websocket only supports the online engine type. | ||
protocol: 'websocket' | ||
engine_list: ['asr_online'] | ||
|
||
|
||
################################################################################# | ||
# ENGINE CONFIG # | ||
################################################################################# | ||
|
||
################################### ASR ######################################### | ||
################### speech task: asr; engine_type: online ####################### | ||
asr_online: | ||
    model_type: 'conformer_online_wenetspeech' | ||
    am_model: # the pdmodel file of am static model [optional] | ||
    am_params: # the pdiparams file of am static model [optional] | ||
    lang: 'zh' | ||
    sample_rate: 16000 | ||
    cfg_path: | ||
    decode_method: | ||
    force_yes: True | ||
    device: 'cpu' # cpu or gpu:id | ||
    decode_method: "attention_rescoring" | ||
    continuous_decoding: True # enable continuous decoding when an endpoint is detected | ||
    num_decoding_left_chunks: 16 | ||
    am_predictor_conf: | ||
        device: # set 'gpu:id' or 'cpu' | ||
        switch_ir_optim: True | ||
        glog_info: False # True -> print glog | ||
        summary: True # False -> do not show predictor config | ||
|
||
    chunk_buffer_conf: | ||
        window_n: 7 # frame | ||
        shift_n: 4 # frame | ||
        window_ms: 25 # ms | ||
        shift_ms: 10 # ms | ||
        sample_rate: 16000 | ||
        sample_width: 2 |
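Reviewer sketch (not part of the PR): the `chunk_buffer_conf` values can be turned into concrete buffer sizes. The formula below is one plausible reading of the parameters, assumed rather than confirmed by this diff: a chunk covers `window_n` frames of `window_ms` each, spaced `shift_ms` apart, and the buffer advances by `shift_n` frame shifts. The function name `chunk_buffer_sizes` is hypothetical.

```python
def chunk_buffer_sizes(window_n=7, shift_n=4, window_ms=25, shift_ms=10,
                       sample_rate=16000, sample_width=2):
    # Assumed reading: window_n frames span (window_n - 1) hops of shift_ms
    # plus one full frame of window_ms; the buffer advances shift_n hops.
    chunk_ms = (window_n - 1) * shift_ms + window_ms
    hop_ms = shift_n * shift_ms
    # Integer arithmetic avoids floating-point rounding in the sample counts.
    chunk_samples = chunk_ms * sample_rate // 1000
    hop_samples = hop_ms * sample_rate // 1000
    return {
        "chunk_ms": chunk_ms,
        "hop_ms": hop_ms,
        "chunk_bytes": chunk_samples * sample_width,  # sample_width bytes per sample
        "hop_bytes": hop_samples * sample_width,
    }


if __name__ == "__main__":
    print(chunk_buffer_sizes())
```

With the defaults from this config and under the stated assumption, each chunk covers 85 ms of 16 kHz, 16-bit audio (2720 bytes) and successive chunks start 40 ms (1280 bytes) apart.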