1 change: 1 addition & 0 deletions README_cn.md
@@ -159,6 +159,7 @@


### Recent Updates

- 👑 2022.05.13: PaddleSpeech released [PP-ASR](./docs/source/asr/PPASR_cn.md), a streaming speech recognition system, [PP-TTS](./docs/source/tts/PPTTS_cn.md), a streaming text-to-speech system, and [PP-VPR](docs/source/vpr/PPVPR_cn.md), a full-pipeline voiceprint recognition system
- 👏🏻 2022.05.06: PaddleSpeech Streaming Server is live! It covers speech recognition (with punctuation restoration and timestamps) and speech synthesis.
- 👏🏻 2022.05.06: PaddleSpeech Server is live! It covers audio classification, speech recognition, speech synthesis, voiceprint recognition, and punctuation restoration.
16 changes: 16 additions & 0 deletions demos/speech_web/.gitignore
@@ -0,0 +1,16 @@
*/.vscode/*
*.wav
*/resource/*
.Ds*
*.pyc
*.pcm
*.npy
*.diff
*.sqlite
*/static/*
*.pdparams
*.pdiparams*
*.pdmodel
*/source/*
*/PaddleSpeech/*

168 changes: 168 additions & 0 deletions demos/speech_web/README_cn.md
@@ -0,0 +1,168 @@
# Paddle Speech Demo

PaddleSpeechDemo is a demo project built around the voice-interaction capabilities of PaddleSpeech. It is meant to help you get started with PaddleSpeech and to build your own applications on top of it.

The speech-interaction part is powered by PaddleSpeech, dialogue and information extraction by PaddleNLP, and the web front end is developed with Vue3.

Main features:

+ Voice chat: PaddleSpeech's speech recognition and speech synthesis, with the dialogue part based on PaddleNLP's chit-chat capability
+ Voiceprint recognition: a showcase of PaddleSpeech's speaker verification capability
+ Speech recognition: supports three modes: real-time (streaming) recognition, end-to-end recognition, and audio-file recognition
+ Speech synthesis: supports both streaming synthesis and end-to-end synthesis
+ Voice commands: combines PaddleSpeech's speech recognition with PaddleNLP's information extraction to implement smart reimbursement of travel expenses

Demo screenshot:

![Demo](docs/效果展示.png)

## Installation

### Back-end setup

```
# install the dependencies
cd speech_server
pip install -r requirements.txt
```
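If you prefer to keep the demo's Python dependencies isolated, the install can be run inside a virtual environment first. This is optional; a minimal sketch using the standard `venv` module:

```shell
# optional: create and activate an isolated environment before installing
python -m venv venv
source venv/bin/activate
```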


### Front-end setup

The front end depends on Node.js, which must be installed in advance; make sure npm is available (tested with npm 8.3.1). We recommend installing a stable Node.js release from the [official site](https://nodejs.org/en/).
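Before installing, a quick check confirms the toolchain is in place (the demo was tested against npm 8.3.1):

```shell
# verify that Node.js and npm are available
node -v
npm -v
```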

```
# enter the front-end directory
cd web_client

# install yarn (skip this if it is already installed)
npm install -g yarn

# install the front-end dependencies with yarn
yarn install
```


## Starting the Services

### Start the back-end service

```
cd speech_server
# the default port is 8010
python main.py --port 8010
```
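Once the back end reports that it is running, a quick request from another terminal can confirm it is reachable. This is only a sketch: it assumes the back end is a FastAPI/uvicorn app (so an interactive docs page is served at `/docs`); adjust the path if your build differs.

```shell
# hedged sanity check: assumes a FastAPI back end exposing /docs on port 8010
curl -s http://localhost:8010/docs | head -n 5
```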

### Start the front-end service

```
cd web_client
yarn dev --port 8011
```

With the default configuration, the back-end address configured in the front end is localhost, so make sure the back-end server and the browser that opens the page are on the same machine. If they are not, see the FAQ below: "How do I change the configuration if the back end is deployed on another machine or on a different port?"

## Running with Docker

### Back-end Docker
The back-end Docker setup uses the [official PaddlePaddle Docker image](https://www.paddlepaddle.org.cn); the CPU version is demonstrated here.
```
# clone the PaddleSpeech repository
cd PaddleSpeechServer
git clone https://github.com/PaddlePaddle/PaddleSpeech.git

# pull the image
docker pull registry.baidubce.com/paddlepaddle/paddle:2.3.0

# start the container
docker run --name paddle -it -p 8010:8010 -v $PWD:/paddle registry.baidubce.com/paddlepaddle/paddle:2.3.0 /bin/bash

# inside the container, switch to the mounted directory
cd /paddle

# install the dependencies
pip install -r requirements.txt

# start the service
python main.py --port 8010

```
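Because the container was started with `--name paddle`, you can re-attach to it later with standard Docker commands if you exit the shell:

```shell
# restart and re-enter the previously created container
docker start paddle
docker exec -it paddle /bin/bash
```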

### Front-end Docker

The front end can directly use the [official Node Docker image](https://hub.docker.com/_/node).

```shell
docker pull node
```

Install the dependencies inside the image:

```shell
cd PaddleSpeechWebClient
# map external port 8011
docker run -it -p 8011:8011 -v $PWD:/paddle node:latest /bin/bash
# inside the container
cd /paddle
# install the dependencies
yarn install
# start the front end
yarn dev --port 8011
```





## FAQ

#### Q: How do I install Node.js?

A: See this [tutorial (runoob)](https://www.runoob.com/nodejs/nodejs-install-setup.html) for installing Node.js, and make sure npm is available afterwards.

#### Q: How do I change the configuration if the back end is deployed on another machine or on a different port?

A: The back-end address is spread across two files.

First, edit `PaddleSpeechWebClient/vite.config.js`:

```javascript
server: {
host: "0.0.0.0",
proxy: {
"/api": {
target: "http://localhost:8010", // 这里改成后端所在接口
changeOrigin: true,
rewrite: (path) => path.replace(/^\/api/, ""),
},
},
}
```

Then edit `PaddleSpeechWebClient/src/api/API.js` (the WebSocket proxy configuration does not take effect, so the WebSocket addresses must be changed in this file directly):

```javascript
// websocket (change these to the address where the back end runs)
CHAT_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/offlineStream', // ChatBot websocket endpoint
ASR_SOCKET_RECORD: 'ws://localhost:8010/ws/asr/onlineStream', // streaming ASR endpoint
TTS_SOCKET_RECORD: 'ws://localhost:8010/ws/tts/online', // streaming TTS endpoint
```
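After changing these addresses, a WebSocket client can be used to confirm that the new endpoints are reachable. The sketch below assumes the third-party `websocat` CLI is installed (any WebSocket client works) and only checks connectivity, not the message protocol:

```shell
# hedged connectivity check: open a WebSocket to the streaming ASR endpoint
# (replace localhost:8010 with the back end's actual address)
websocat ws://localhost:8010/ws/asr/onlineStream
```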

#### Q: The front end cannot record audio when the back end is accessed by IP address

A: This is caused by the browser's security policy; you need to change the browser settings and then restart the browser. For details, see [js-audio-recorder reports that the browser does not support getUserMedia](https://blog.csdn.net/YRY_LIKE_YOU/article/details/113745273).

Chrome settings page: chrome://flags/#unsafely-treat-insecure-origin-as-secure




## References

Recording audio in Vue: https://blog.csdn.net/qq_41619796/article/details/107865602#t1

Repositories referenced for streaming audio playback in the front end:

https://github.com/AnthumChris/fetch-stream-audio

https://bm.enthuses.me/buffered.php?bref=6677
Binary file added demos/speech_web/docs/效果展示.png
103 changes: 103 additions & 0 deletions demos/speech_web/speech_server/conf/tts_online_application.yaml
@@ -0,0 +1,103 @@
# This is the parameter configuration file for the streaming TTS server.

#################################################################################
# SERVER SETTING #
#################################################################################
host: 0.0.0.0
port: 8092

# The task format in the engine_list is: <speech task>_<engine type>
# engine_list choices = ['tts_online', 'tts_online-onnx']; the inference speed of tts_online-onnx is faster than tts_online.
# protocol choices = ['websocket', 'http']
protocol: 'http'
engine_list: ['tts_online-onnx']


#################################################################################
# ENGINE CONFIG #
#################################################################################

################################### TTS #########################################
################### speech task: tts; engine_type: online #######################
tts_online:
# am (acoustic model) choices=['fastspeech2_csmsc', 'fastspeech2_cnndecoder_csmsc']
# fastspeech2_cnndecoder_csmsc support streaming am infer.
am: 'fastspeech2_csmsc'
am_config:
am_ckpt:
am_stat:
phones_dict:
tones_dict:
speaker_dict:
spk_id: 0

# voc (vocoder) choices=['mb_melgan_csmsc, hifigan_csmsc']
# Both mb_melgan_csmsc and hifigan_csmsc support streaming voc inference
voc: 'mb_melgan_csmsc'
voc_config:
voc_ckpt:
voc_stat:

# others
lang: 'zh'
device: 'cpu' # set 'gpu:id' or 'cpu'
# am_block and am_pad are only used for streaming am inference with the fastspeech2_cnndecoder_csmsc model;
# when am_pad is set to 12, the streaming synthetic audio is identical to the non-streaming synthetic audio
am_block: 72
am_pad: 12
# voc_block and voc_pad are used for streaming voc inference;
# when the voc model is mb_melgan_csmsc, setting voc_pad to 14 makes the streaming synthetic audio identical to the non-streaming audio; the minimum pad is 7, with which the streaming audio still sounds normal
# when the voc model is hifigan_csmsc, setting voc_pad to 19 makes the streaming synthetic audio identical to the non-streaming audio; with voc_pad set to 14 the streaming audio still sounds normal
voc_block: 36
voc_pad: 14



#################################################################################
# ENGINE CONFIG #
#################################################################################

################################### TTS #########################################
################### speech task: tts; engine_type: online-onnx #######################
tts_online-onnx:
# am (acoustic model) choices=['fastspeech2_csmsc_onnx', 'fastspeech2_cnndecoder_csmsc_onnx']
# fastspeech2_cnndecoder_csmsc_onnx support streaming am infer.
am: 'fastspeech2_cnndecoder_csmsc_onnx'
# am_ckpt is a list, if am is fastspeech2_cnndecoder_csmsc_onnx, am_ckpt = [encoder model, decoder model, postnet model];
# if am is fastspeech2_csmsc_onnx, am_ckpt = [ckpt model];
am_ckpt: # list
am_stat:
phones_dict:
tones_dict:
speaker_dict:
spk_id: 0
am_sample_rate: 24000
am_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
cpu_threads: 4

# voc (vocoder) choices=['mb_melgan_csmsc_onnx, hifigan_csmsc_onnx']
# Both mb_melgan_csmsc_onnx and hifigan_csmsc_onnx support streaming voc inference
voc: 'hifigan_csmsc_onnx'
voc_ckpt:
voc_sample_rate: 24000
voc_sess_conf:
device: "cpu" # set 'gpu:id' or 'cpu'
use_trt: False
cpu_threads: 4

# others
lang: 'zh'
# am_block and am_pad are only used for streaming am inference with the fastspeech2_cnndecoder_onnx model;
# when am_pad is set to 12, the streaming synthetic audio is identical to the non-streaming synthetic audio
am_block: 72
am_pad: 12
# voc_block and voc_pad are used for streaming voc inference;
# when the voc model is mb_melgan_csmsc_onnx, setting voc_pad to 14 makes the streaming synthetic audio identical to the non-streaming audio; the minimum pad is 7, with which the streaming audio still sounds normal
# when the voc model is hifigan_csmsc_onnx, setting voc_pad to 19 makes the streaming synthetic audio identical to the non-streaming audio; with voc_pad set to 14 the streaming audio still sounds normal
voc_block: 36
voc_pad: 14
# voc_upsample should be the same as n_shift in the voc config.
voc_upsample: 300
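
For reference, a configuration file in this format is normally passed to the PaddleSpeech streaming server when it starts. A minimal sketch, assuming the `paddlespeech_server` CLI from the PaddleSpeech serving package is installed and the config is launched standalone:

```shell
# hedged example: start a streaming TTS server from this config
paddlespeech_server start --config_file conf/tts_online_application.yaml
```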

@@ -0,0 +1,48 @@
# This is the parameter configuration file for PaddleSpeech Serving.

#################################################################################
# SERVER SETTING #
#################################################################################
host: 0.0.0.0
port: 8090

# The task format in the engine_list is: <speech task>_<engine type>
# task choices = ['asr_online']
# protocol = ['websocket'] (only one can be selected).
# websocket only supports the online engine type.
protocol: 'websocket'
engine_list: ['asr_online']


#################################################################################
# ENGINE CONFIG #
#################################################################################

################################### ASR #########################################
################### speech task: asr; engine_type: online #######################
asr_online:
model_type: 'conformer_online_wenetspeech'
am_model: # the pdmodel file of am static model [optional]
am_params: # the pdiparams file of am static model [optional]
lang: 'zh'
sample_rate: 16000
cfg_path:
force_yes: True
device: 'cpu' # cpu or gpu:id
decode_method: "attention_rescoring"
continuous_decoding: True # enable continue decoding when endpoint detected
num_decoding_left_chunks: 16
am_predictor_conf:
device: # set 'gpu:id' or 'cpu'
switch_ir_optim: True
glog_info: False # True -> print glog
summary: True # False -> do not show predictor config

chunk_buffer_conf:
window_n: 7 # frame
shift_n: 4 # frame
window_ms: 25 # ms
shift_ms: 10 # ms
sample_rate: 16000
sample_width: 2
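
As with the TTS config above, a streaming ASR configuration of this form can in principle be launched standalone. A minimal sketch, assuming the `paddlespeech_server` CLI is installed; the file path below is a placeholder, since the config's file name is not shown here:

```shell
# hedged example: point ASR_CONFIG at wherever this YAML is saved, then start the server
ASR_CONFIG=conf/asr_online_application.yaml   # placeholder path
paddlespeech_server start --config_file "$ASR_CONFIG"
```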