[ASR] add asr code-switch cli and demo, test='asr' #2816

Merged 3 commits on Jan 10, 2023
2 changes: 2 additions & 0 deletions README.md
@@ -157,6 +157,8 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
- 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).

### Recent Update
- 🔥 2023.01.10: Add [code-switch ASR CLI and Demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_recognition).
- 👑 2023.01.06: Add [code-switch ASR tal_cs recipe](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/tal_cs/asr1/).
- 🎉 2022.12.02: Add [end-to-end Prosody Prediction pipeline](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3_rhy) (including using prosody labels in Acoustic Model).
- 🎉 2022.11.30: Add [TTS Android Demo](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/TTSAndroid).
- 🤗 2022.11.28: PP-TTS and PP-ASR demos are available in [AIStudio](https://aistudio.baidu.com/aistudio/modelsoverview) and [official website
2 changes: 2 additions & 0 deletions README_cn.md
@@ -164,6 +164,8 @@


### Recent Updates
- 🔥 2023.01.10: Add [Chinese-English code-switch ASR CLI and Demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_recognition).
- 👑 2023.01.06: Add [Chinese-English code-switch ASR tal_cs training and inference recipe](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/tal_cs/asr1/).
- 🎉 2022.12.02: Add [end-to-end prosody prediction pipeline](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3_rhy) (including using prosody labels in the acoustic model).
- 🎉 2022.11.30: Add [TTS Android deployment demo](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/TTSAndroid).
- 🤗 2022.11.28: PP-TTS and PP-ASR demos are available on [AIStudio](https://aistudio.baidu.com/aistudio/modelsoverview) and the [PaddlePaddle official website](https://www.paddlepaddle.org.cn/models)!
28 changes: 16 additions & 12 deletions demos/speech_recognition/README.md
@@ -17,7 +17,7 @@ The input of this demo should be a WAV file(`.wav`), and the sample rate must be

Here are sample files for this demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
```

### 3. Usage
@@ -27,6 +27,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
paddlespeech asr --input ./zh.wav -v
# English
paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav -v
# Code-Switch
paddlespeech asr --model conformer_talcs --lang zh_en --codeswitch True --input ./ch_zh_mix.wav -v
# Chinese ASR + Punctuation Restoration
paddlespeech asr --input ./zh.wav -v | paddlespeech text --task punc -v
```
@@ -40,6 +42,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- `input`(required): Audio file to recognize.
- `model`: Model type of asr task. Default: `conformer_wenetspeech`.
- `lang`: Model language. Default: `zh`.
- `codeswitch`: Whether to use the code-switch model. Default: `False`.
- `sample_rate`: Sample rate of the model. Default: `16000`.
- `config`: Config of asr task. Use pretrained model when it is None. Default: `None`.
- `ckpt_path`: Model checkpoint. Use pretrained model when it is None. Default: `None`.
@@ -83,14 +86,15 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee

Here is a list of pretrained models released by PaddleSpeech that can be used by command and python API:

| Model | Language | Sample Rate
| :--- | :---: | :---: |
| conformer_wenetspeech | zh | 16k
| conformer_online_multicn | zh | 16k
| conformer_aishell | zh | 16k
| conformer_online_aishell | zh | 16k
| transformer_librispeech | en | 16k
| deepspeech2online_wenetspeech | zh | 16k
| deepspeech2offline_aishell| zh| 16k
| deepspeech2online_aishell | zh | 16k
| deepspeech2offline_librispeech | en | 16k
| Model | Code Switch | Language | Sample Rate
| :--- | :---: | :---: | :---: |
| conformer_wenetspeech | False | zh | 16k
| conformer_online_multicn | False | zh | 16k
| conformer_aishell | False | zh | 16k
| conformer_online_aishell | False | zh | 16k
| transformer_librispeech | False | en | 16k
| deepspeech2online_wenetspeech | False | zh | 16k
| deepspeech2offline_aishell | False | zh| 16k
| deepspeech2online_aishell | False | zh | 16k
| deepspeech2offline_librispeech | False | en | 16k
| conformer_talcs | True | zh_en | 16k
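
The table notes that these models can also be used from the Python API. A minimal sketch of the code-switch call, based on the `ASRExecutor.__call__` signature shown in this PR. The names `CODE_SWITCH_ARGS` and `transcribe_code_switch` are hypothetical, and the return value is assumed to be the recognized text:

```python
from typing import Any, Dict

# Keyword arguments mirroring the CLI flags of the code-switch example.
CODE_SWITCH_ARGS: Dict[str, Any] = {
    "model": "conformer_talcs",
    "lang": "zh_en",
    "codeswitch": True,
    "sample_rate": 16000,
}


def transcribe_code_switch(audio_file: str):
    """Run code-switch ASR on one 16 kHz WAV file (hypothetical helper)."""
    # Imported lazily so this module loads even without PaddleSpeech installed.
    from paddlespeech.cli.asr.infer import ASRExecutor

    asr = ASRExecutor()
    # Assumption: the executor returns the recognized text.
    return asr(audio_file=audio_file, **CODE_SWITCH_ARGS)
```

Calling `transcribe_code_switch("./ch_zh_mix.wav")` would then correspond to the `# Code-Switch` command line above; the first call downloads the pretrained model.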
29 changes: 17 additions & 12 deletions demos/speech_recognition/README_cn.md
@@ -1,4 +1,5 @@
(Simplified Chinese|[English](./README.md))
(Simplified Chinese|[English](./README.md))

# Speech Recognition
## Introduction
@@ -16,7 +17,7 @@

Sample audio files for this demo can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
```
### 3. Usage
- Command Line (Recommended)
@@ -25,6 +26,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
paddlespeech asr --input ./zh.wav -v
# English
paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav -v
# Code-Switch
paddlespeech asr --model conformer_talcs --lang zh_en --codeswitch True --input ./ch_zh_mix.wav -v
# Chinese ASR + Punctuation Restoration
paddlespeech asr --input ./zh.wav -v | paddlespeech text --task punc -v
```
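@@ -38,6 +41,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- `input`(required): Audio file to recognize.
- `model`: Model type of the ASR task. Default: `conformer_wenetspeech`.
- `lang`: Model language. Default: `zh`.
- `codeswitch`: Whether to use the code-switch model. Default: `False`.
- `sample_rate`: Audio sample rate. Default: `16000`.
- `config`: Config file of the ASR task. If not set, the default config of the pretrained model is used. Default: `None`.
- `ckpt_path`: Model checkpoint file. If not set, the pretrained model is downloaded and used. Default: `None`.
@@ -80,14 +84,15 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
### 4. Pretrained Models
Here is a list of pretrained models released by PaddleSpeech that can be used via the command line and the Python API: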

| Model | Language | Sample Rate
| :--- | :---: | :---: |
| conformer_wenetspeech | zh | 16k
| conformer_online_multicn | zh | 16k
| conformer_aishell | zh | 16k
| conformer_online_aishell | zh | 16k
| transformer_librispeech | en | 16k
| deepspeech2online_wenetspeech | zh | 16k
| deepspeech2offline_aishell| zh| 16k
| deepspeech2online_aishell | zh | 16k
| deepspeech2offline_librispeech | en | 16k
| Model | Code Switch | Language | Sample Rate
| :--- | :---: | :---: | :---: |
| conformer_wenetspeech | False | zh | 16k
| conformer_online_multicn | False | zh | 16k
| conformer_aishell | False | zh | 16k
| conformer_online_aishell | False | zh | 16k
| transformer_librispeech | False | en | 16k
| deepspeech2online_wenetspeech | False | zh | 16k
| deepspeech2offline_aishell | False | zh| 16k
| deepspeech2online_aishell | False | zh | 16k
| deepspeech2offline_librispeech | False | en | 16k
| conformer_talcs | True | zh_en | 16k
6 changes: 6 additions & 0 deletions demos/speech_recognition/run.sh
@@ -2,6 +2,7 @@

wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav

# asr
paddlespeech asr --input ./zh.wav
@@ -18,6 +19,11 @@ paddlespeech asr --help
# english asr
paddlespeech asr --lang en --model transformer_librispeech --input ./en.wav


# code-switch asr
paddlespeech asr --lang zh_en --codeswitch True --model conformer_talcs --input ./ch_zh_mix.wav


# model stats
paddlespeech stats --task asr

28 changes: 21 additions & 7 deletions paddlespeech/cli/asr/infer.py
@@ -25,6 +25,9 @@
import numpy as np
import paddle
import soundfile
from paddlespeech.audio.transform.transformation import Transformation
from paddlespeech.s2t.frontend.featurizer.text_featurizer import TextFeaturizer
from paddlespeech.s2t.utils.utility import UpdateConfig
from yacs.config import CfgNode

from ...utils.env import MODEL_HOME
@@ -34,9 +37,6 @@
from ..utils import CLI_TIMER
from ..utils import stats_wrapper
from ..utils import timer_register
from paddlespeech.audio.transform.transformation import Transformation
from paddlespeech.s2t.frontend.featurizer.text_featurizer import TextFeaturizer
from paddlespeech.s2t.utils.utility import UpdateConfig

__all__ = ['ASRExecutor']

@@ -62,8 +62,13 @@ def __init__(self):
'--lang',
type=str,
default='zh',
help='Choose model language. zh or en, zh:[conformer_wenetspeech-zh-16k], en:[transformer_librispeech-en-16k]'
help='Choose model language. [zh, en, zh_en], zh:[conformer_wenetspeech-zh-16k], en:[transformer_librispeech-en-16k], zh_en:[conformer_talcs-codeswitch_zh_en-16k]'
)
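self.parser.add_argument(
'--codeswitch',
# argparse's type=bool treats any non-empty string (including "False") as True, so parse the text explicitly.
type=lambda s: s.lower() in ('true', '1', 'yes'),
default=False,
help='Whether to use the code-switch model. True or False.')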
self.parser.add_argument(
"--sample_rate",
type=int,
@@ -127,6 +132,7 @@ def __init__(self):
def _init_from_path(self,
model_type: str='wenetspeech',
lang: str='zh',
codeswitch: bool=False,
sample_rate: int=16000,
cfg_path: Optional[os.PathLike]=None,
decode_method: str='attention_rescoring',
@@ -144,7 +150,12 @@ def _init_from_path(self,

if cfg_path is None or ckpt_path is None:
sample_rate_str = '16k' if sample_rate == 16000 else '8k'
tag = model_type + '-' + lang + '-' + sample_rate_str
if lang == "zh_en" and codeswitch is True:
tag = model_type + '-' + 'codeswitch_' + lang + '-' + sample_rate_str
elif lang == "zh_en" or codeswitch is True:
raise Exception("--codeswitch True and --lang zh_en must be used together")
else:
tag = model_type + '-' + lang + '-' + sample_rate_str
self.task_resource.set_task_model(tag, version=None)
self.res_path = self.task_resource.res_dir
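
The tag selection in this hunk can be read in isolation. The following is a standalone sketch of the same branching, not the project's code: `build_asr_tag` is a hypothetical helper name, and `ValueError` is swapped in for the generic `Exception` used in the diff.

```python
def build_asr_tag(model_type: str, lang: str, codeswitch: bool,
                  sample_rate: int) -> str:
    """Build the pretrained-model tag used to look up ASR resources."""
    sample_rate_str = "16k" if sample_rate == 16000 else "8k"
    if lang == "zh_en" and codeswitch:
        # Code-switch models carry a "codeswitch_" prefix before the language.
        return f"{model_type}-codeswitch_{lang}-{sample_rate_str}"
    if lang == "zh_en" or codeswitch:
        # Exactly one of the two conditions holds: reject the combination.
        raise ValueError(
            "--codeswitch True and --lang zh_en must be used together")
    return f"{model_type}-{lang}-{sample_rate_str}"


print(build_asr_tag("conformer_talcs", "zh_en", True, 16000))
print(build_asr_tag("conformer_wenetspeech", "zh", False, 16000))
```

The first call produces the key registered in `pretrained_models.py` by this PR, `conformer_talcs-codeswitch_zh_en-16k`.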

@@ -423,6 +434,7 @@ def execute(self, argv: List[str]) -> bool:

model = parser_args.model
lang = parser_args.lang
codeswitch = parser_args.codeswitch
sample_rate = parser_args.sample_rate
config = parser_args.config
ckpt_path = parser_args.ckpt_path
@@ -444,6 +456,7 @@ def execute(self, argv: List[str]) -> bool:
audio_file=input_,
model=model,
lang=lang,
codeswitch=codeswitch,
sample_rate=sample_rate,
config=config,
ckpt_path=ckpt_path,
@@ -472,6 +485,7 @@ def __call__(self,
audio_file: os.PathLike,
model: str='conformer_u2pp_online_wenetspeech',
lang: str='zh',
codeswitch: bool=False,
sample_rate: int=16000,
config: os.PathLike=None,
ckpt_path: os.PathLike=None,
@@ -485,8 +499,8 @@ def __call__(self,
"""
audio_file = os.path.abspath(audio_file)
paddle.set_device(device)
self._init_from_path(model, lang, sample_rate, config, decode_method,
num_decoding_left_chunks, ckpt_path)
self._init_from_path(model, lang, codeswitch, sample_rate, config,
decode_method, num_decoding_left_chunks, ckpt_path)
if not self._check(audio_file, sample_rate, force_yes):
sys.exit(-1)
if rtf:
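
Boolean-valued flags such as `--codeswitch` need care with `argparse`: passing `type=bool` converts the raw string with `bool()`, so any non-empty value, including `"False"`, becomes `True`. A minimal sketch of the difference, where `str2bool` is a hypothetical converter that is not part of PaddleSpeech:

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert common textual booleans to bool for argparse."""
    text = value.strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


# Parser using the built-in bool as the converter.
naive = argparse.ArgumentParser()
naive.add_argument("--codeswitch", type=bool, default=False)

# Parser using the explicit converter.
strict = argparse.ArgumentParser()
strict.add_argument("--codeswitch", type=str2bool, default=False)

print(naive.parse_args(["--codeswitch", "False"]).codeswitch)   # True
print(strict.parse_args(["--codeswitch", "False"]).codeswitch)  # False
```

An alternative design is `action="store_true"`, which drops the value entirely, but that would change the `--codeswitch True` command-line form used in the demos.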
19 changes: 17 additions & 2 deletions paddlespeech/cli/base_commands.py
@@ -14,6 +14,7 @@
import argparse
from typing import List

import numpy
from prettytable import PrettyTable

from ..resource import CommonTaskResource
@@ -78,7 +79,7 @@ def execute(self, argv: List[str]) -> bool:


model_name_format = {
'asr': 'Model-Language-Sample Rate',
'asr': 'Model-Size-Code Switch-Multilingual-Language-Sample Rate',
'cls': 'Model-Sample Rate',
'st': 'Model-Source language-Target language',
'text': 'Model-Task-Language',
@@ -111,7 +112,21 @@ def show_support_models(self, pretrained_models: dict):
fields = model_name_format[self.task].split("-")
table = PrettyTable(fields)
for key in pretrained_models:
table.add_row(key.split("-"))
line = key.split("-")
if self.task == "asr" and len(line) < len(fields):
for i in range(len(line), len(fields)):
line.append("-")
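if "codeswitch" in key:
# Join the language parts back into one string (e.g. "zh_en"); a nested list would make the row ragged for numpy.array() below.
line[3], line[1] = line[1].split("_")[0], "_".join(
line[1].split("_")[1:])
elif "multilingual" in key:
line[4], line[1] = line[1].split("_")[0], "_".join(
line[1].split("_")[1:])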
tmp = numpy.array(line)
idx = [0, 5, 3, 4, 1, 2]
line = tmp[idx]
table.add_row(line)

print(table)

def execute(self, argv: List[str]) -> bool:
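
The row handling in `show_support_models` maps a three-part ASR tag onto the six columns of the new header. A dependency-free sketch of that mapping follows. `asr_tag_to_row` is a hypothetical helper that assumes tags of the form `model-language-rate`, and it keeps the language as a single string such as `zh_en`:

```python
from typing import List

FIELDS = [
    "Model", "Size", "Code Switch", "Multilingual", "Language", "Sample Rate"
]


def asr_tag_to_row(tag: str) -> List[str]:
    """Map 'model-[codeswitch_|multilingual_]lang-rate' to a table row."""
    model, language, sample_rate = tag.split("-")
    code_switch = multilingual = "-"
    prefix, _, rest = language.partition("_")
    if prefix == "codeswitch":
        code_switch, language = prefix, rest
    elif prefix == "multilingual":
        multilingual, language = prefix, rest
    # Size is not encoded in three-part tags, so it stays a placeholder.
    return [model, "-", code_switch, multilingual, language, sample_rate]


for tag in ("conformer_wenetspeech-zh-16k",
            "conformer_talcs-codeswitch_zh_en-16k"):
    print(dict(zip(FIELDS, asr_tag_to_row(tag))))
```

Each returned row has one entry per header field, so it can be passed directly to `PrettyTable.add_row`.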
13 changes: 13 additions & 0 deletions paddlespeech/resource/pretrained_models.py
@@ -30,6 +30,7 @@
]

# The tags for pretrained_models should be "{model_name}[_{dataset}][-{lang}][-...]".
# Code-switch and multilingual models use the tag "{model_name}[_{dataset}]-[codeswitch/multilingual][_{lang}][-...]".
# e.g. "conformer_wenetspeech-zh-16k" and "panns_cnn6-32k".
# Command line and python api use "{model_name}[_{dataset}]" as --model, usage:
# "paddlespeech asr --model conformer_wenetspeech --lang zh --sample_rate 16000 --input ./input.wav"
@@ -322,6 +323,18 @@
'099a601759d467cd0a8523ff939819c5'
},
},
"conformer_talcs-codeswitch_zh_en-16k": {
'1.4': {
'url':
'https://paddlespeech.bj.bcebos.com/s2t/tal_cs/asr1/asr1_conformer_talcs_ckpt_1.4.0.model.tar.gz',
'md5':
'01962c5d0a70878fe41cacd4f61e14d1',
'cfg_path':
'model.yaml',
'ckpt_path':
'exp/conformer/checkpoints/avg_10'
},
},
}
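
Each registry entry pairs a download `url` with an `md5` checksum. As an illustration of how such an entry could be checked after download, here is a sketch using only the standard library. `md5_of_file` and `matches_registry` are hypothetical helpers, not PaddleSpeech functions, and the project's own downloader may work differently:

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_registry(path: str, entry: dict) -> bool:
    """Check a downloaded archive against the 'md5' field of an entry."""
    return md5_of_file(path) == entry["md5"]
```

For the entry added here, `entry["md5"]` would be `'01962c5d0a70878fe41cacd4f61e14d1'` and `path` the local copy of the downloaded `.tar.gz` archive.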

asr_static_pretrained_models = {
30 changes: 22 additions & 8 deletions paddlespeech/server/bin/paddlespeech_server.py
@@ -16,21 +16,22 @@
import warnings
from typing import List

import numpy
import uvicorn
from fastapi import FastAPI
from prettytable import PrettyTable
from starlette.middleware.cors import CORSMiddleware

from ..executor import BaseExecutor
from ..util import cli_server_register
from ..util import stats_wrapper
from paddlespeech.cli.log import logger
from paddlespeech.resource import CommonTaskResource
from paddlespeech.server.engine.engine_pool import init_engine_pool
from paddlespeech.server.engine.engine_warmup import warm_up
from paddlespeech.server.restful.api import setup_router as setup_http_router
from paddlespeech.server.utils.config import get_config
from paddlespeech.server.ws.api import setup_router as setup_ws_router
from prettytable import PrettyTable
from starlette.middleware.cors import CORSMiddleware

from ..executor import BaseExecutor
from ..util import cli_server_register
from ..util import stats_wrapper
warnings.filterwarnings("ignore")

__all__ = ['ServerExecutor', 'ServerStatsExecutor']
@@ -134,7 +135,7 @@ def __init__(self):
required=True)
self.task_choices = ['asr', 'tts', 'cls', 'text', 'vector']
self.model_name_format = {
'asr': 'Model-Language-Sample Rate',
'asr': 'Model-Size-Code Switch-Multilingual-Language-Sample Rate',
'tts': 'Model-Language',
'cls': 'Model-Sample Rate',
'text': 'Model-Task-Language',
@@ -145,7 +146,20 @@ def show_support_models(self, pretrained_models: dict):
fields = self.model_name_format[self.task].split("-")
table = PrettyTable(fields)
for key in pretrained_models:
table.add_row(key.split("-"))
line = key.split("-")
if self.task == "asr" and len(line) < len(fields):
for i in range(len(line), len(fields)):
line.append("-")
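if "codeswitch" in key:
# Join the language parts back into one string (e.g. "zh_en"); a nested list would make the row ragged for numpy.array() below.
line[3], line[1] = line[1].split("_")[0], "_".join(
line[1].split("_")[1:])
elif "multilingual" in key:
line[4], line[1] = line[1].split("_")[0], "_".join(
line[1].split("_")[1:])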
tmp = numpy.array(line)
idx = [0, 5, 3, 4, 1, 2]
line = tmp[idx]
table.add_row(line)
print(table)

def execute(self, argv: List[str]) -> bool:
3 changes: 2 additions & 1 deletion tests/unit/cli/test_cli.sh
@@ -14,7 +14,7 @@ paddlespeech ssl --task asr --lang en --input ./en.wav
paddlespeech ssl --task vector --lang en --input ./en.wav

# Speech_recognition
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/ch_zh_mix.wav
paddlespeech asr --input ./zh.wav
paddlespeech asr --model conformer_aishell --input ./zh.wav
paddlespeech asr --model conformer_online_aishell --input ./zh.wav
@@ -26,6 +26,7 @@ paddlespeech asr --model deepspeech2offline_aishell --input ./zh.wav
paddlespeech asr --model deepspeech2online_wenetspeech --input ./zh.wav
paddlespeech asr --model deepspeech2online_aishell --input ./zh.wav
paddlespeech asr --model deepspeech2offline_librispeech --lang en --input ./en.wav
paddlespeech asr --model conformer_talcs --lang zh_en --codeswitch True --input ./ch_zh_mix.wav

# Support editing num_decoding_left_chunks
paddlespeech asr --model conformer_online_wenetspeech --num_decoding_left_chunks 3 --input ./zh.wav