24 changes: 24 additions & 0 deletions README.md
@@ -391,6 +391,30 @@ PaddleSpeech supports a series of the most popular models. They are summarized in [r
</tbody>
</table>

**Punctuation Restoration**

<table style="width:100%">
<thead>
<tr>
<th> Task </th>
<th> Dataset </th>
<th> Model Type </th>
<th> Link </th>
</tr>
</thead>
<tbody>

<tr>
<td>Punctuation Restoration</td>
      <td>IWSLT2012_zh</td>
<td>Ernie Linear</td>
<td>
<a href = "./examples/iwslt2012/punc0">iwslt2012-punc0</a>
</td>
</tr>
</tbody>
</table>

## Documents

Normally, [Speech SoTA](https://paperswithcode.com/area/speech), [Audio SoTA](https://paperswithcode.com/area/audio) and [Music SoTA](https://paperswithcode.com/area/music) give you an overview of the hot academic topics in the related areas. To focus on the tasks in PaddleSpeech, you will find the following guidelines helpful for grasping the core ideas.
24 changes: 24 additions & 0 deletions README_cn.md
@@ -386,6 +386,30 @@ PaddleSpeech's **speech synthesis** mainly consists of three modules: text frontend, acoustic
</tbody>
</table>

**Punctuation Restoration**

<table style="width:100%">
<thead>
<tr>
      <th> Task </th>
      <th> Dataset </th>
      <th> Model Type </th>
      <th> Link </th>
</tr>
</thead>
<tbody>

<tr>
      <td>Punctuation Restoration</td>
      <td>IWSLT2012_zh</td>
<td>Ernie Linear</td>
<td>
<a href = "./examples/iwslt2012/punc0">iwslt2012-punc0</a>
</td>
</tr>
</tbody>
</table>

## Tutorial Documents

For the tasks that PaddleSpeech focuses on, the following guides help developers get started quickly and grasp the core ideas of speech processing.
31 changes: 17 additions & 14 deletions docs/source/released_model.md
@@ -1,11 +1,10 @@

# Released Models

## Speech-to-Text Models

### Speech Recognition Model
Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | Hours of speech | Example Link
:-------------:| :------------:| :-----: | -----: | :----------------- |:--------- | :---------- | :--------- | :-----------
:-------------:| :------------:| :-----: | -----: | :-----: |:-----:| :-----: | :-----: | :-----:
[Ds2 Online Aishell ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/aishell_ds2_online_cer8.00_release.tar.gz) | Aishell Dataset | Char-based | 345 MB | 2 Conv + 5 LSTM layers with only forward direction | 0.080 |-| 151 h | [Ds2 Online Aishell ASR0](../../examples/aishell/asr0)
[Ds2 Offline Aishell ASR0 Model](https://paddlespeech.bj.bcebos.com/s2t/aishell/asr0/ds2.model.tar.gz)| Aishell Dataset | Char-based | 306 MB | 2 Conv + 3 bidirectional GRU layers| 0.064 |-| 151 h | [Ds2 Offline Aishell ASR0](../../examples/aishell/asr0)
[Conformer Online Aishell ASR1 Model](https://deepspeech.bj.bcebos.com/release2.1/aishell/s1/aishell.chunk.release.tar.gz) | Aishell Dataset | Char-based | 283 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0594 |-| 151 h | [Conformer Online Aishell ASR1](../../examples/aishell/asr1)
@@ -17,22 +16,21 @@ Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER |

### Language Model based on NGram
Language Model | Training Data | Token-based | Size | Descriptions
:-------------:| :------------:| :-----: | -----: | :-----------------
:------------:| :------------:|:------------: | :------------: | :------------:
[English LM](https://deepspeech.bj.bcebos.com/en_lm/common_crawl_00.prune01111.trie.klm) | [CommonCrawl(en.00)](http://web-language-models.s3-website-us-east-1.amazonaws.com/ngrams/en/deduped/en.00.deduped.xz) | Word-based | 8.3 GB | Pruned with 0 1 1 1 1; <br/> About 1.85 billion n-grams; <br/> 'trie' binary with '-a 22 -q 8 -b 8'
[Mandarin LM Small](https://deepspeech.bj.bcebos.com/zh_lm/zh_giga.no_cna_cmn.prune01244.klm) | Baidu Internal Corpus | Char-based | 2.8 GB | Pruned with 0 1 2 4 4; <br/> About 0.13 billion n-grams; <br/> 'probing' binary with default settings
[Mandarin LM Large](https://deepspeech.bj.bcebos.com/zh_lm/zhidao_giga.klm) | Baidu Internal Corpus | Char-based | 70.4 GB | No Pruning; <br/> About 3.7 billion n-grams; <br/> 'probing' binary with default settings
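
The pruning and binary-format settings listed above correspond to KenLM's `lmplz` and `build_binary` tools. A hedged sketch for the English LM row follows; the corpus path is a placeholder, and the 5-gram order is only inferred from the five pruning thresholds:

```bash
# Train a 5-gram LM pruned with "0 1 1 1 1", then pack it as a quantized 'trie' binary.
lmplz -o 5 --prune 0 1 1 1 1 --text en.00.deduped.txt --arpa common_crawl_00.arpa
build_binary -a 22 -q 8 -b 8 trie common_crawl_00.arpa common_crawl_00.prune01111.trie.klm
```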

### Speech Translation Models

| Model | Training Data | Token-based | Size | Descriptions | BLEU | Example Link |
| ------------------------------------------------------------ | ------------- | ----------- | ---- | ------------------------------------------------------------ | ----- | ------------------------------------------------------------ |
| [Transformer FAT-ST MTL En-Zh](https://paddlespeech.bj.bcebos.com/s2t/ted_en_zh/st1/fat_st_ted-en-zh.tar.gz) | Ted-En-Zh | Spm | | Encoder:Transformer, Decoder:Transformer, <br />Decoding method: Attention | 20.80 | [Transformer Ted-En-Zh ST1](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/ted_en_zh/st1) |

| Model | Training Data | Token-based | Size | Descriptions | BLEU | Example Link |
| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
| [Transformer FAT-ST MTL En-Zh](https://paddlespeech.bj.bcebos.com/s2t/ted_en_zh/st1/fat_st_ted-en-zh.tar.gz) | Ted-En-Zh| Spm| | Encoder:Transformer, Decoder:Transformer, <br />Decoding method: Attention | 20.80 | [Transformer Ted-En-Zh ST1](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/examples/ted_en_zh/st1) |

## Text-to-Speech Models

### Acoustic Models
Model Type | Dataset| Example Link | Pretrained Models|Static Models|Siize(static)
Model Type | Dataset| Example Link | Pretrained Models|Static Models|Size (static)
:-------------:| :------------:| :-----: | :-----:| :-----:| :-----:
Tacotron2|LJSpeech|[tacotron2-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts0)|[tacotron2_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_ljspeech_ckpt_0.3.zip)|||
TransformerTTS| LJSpeech| [transformer-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts1)|[transformer_tts_ljspeech_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/transformer_tts/transformer_tts_ljspeech_ckpt_0.4.zip)|||
@@ -44,8 +42,8 @@ FastSpeech2| LJSpeech |[fastspeech2-ljspeech](https://github.com/PaddlePaddle/Pa
FastSpeech2| VCTK |[fastspeech2-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/tts3)|[fastspeech2_nosil_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_vctk_ckpt_0.5.zip)|||

### Vocoders
Model Type | Dataset| Example Link | Pretrained Models| Static Models|Size(static)
:-------------:| :------------:| :-----: | :-----:| :-----:| :-----:
Model Type | Dataset| Example Link | Pretrained Models| Static Models|Size (static)
:-----:| :-----:| :-----: | :-----:| :-----:| :-----:
WaveFlow| LJSpeech |[waveflow-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/voc0)|[waveflow_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/waveflow/waveflow_ljspeech_ckpt_0.3.zip)|||
Parallel WaveGAN| CSMSC |[PWGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc1)|[pwg_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_ckpt_0.4.zip)|[pwg_baker_static_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_static_0.4.zip)|5.1MB|
Parallel WaveGAN| LJSpeech |[PWGAN-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/voc1)|[pwg_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_ljspeech_ckpt_0.5.zip)|||
@@ -69,10 +67,15 @@ Model Type | Dataset| Example Link | Pretrained Models
PANN | Audioset| [audioset_tagging_cnn](https://github.com/qiuqiangkong/audioset_tagging_cnn) | [panns_cnn6.pdparams](https://bj.bcebos.com/paddleaudio/models/panns_cnn6.pdparams),[panns_cnn10.pdparams](https://bj.bcebos.com/paddleaudio/models/panns_cnn10.pdparams),[panns_cnn14.pdparams](https://bj.bcebos.com/paddleaudio/models/panns_cnn14.pdparams)
PANN | ESC-50 |[pann-esc50](./examples/esc50/cls0)|[panns_cnn6.tar.gz](https://paddlespeech.bj.bcebos.com/cls/panns_cnn6.tar.gz), [panns_cnn10.tar.gz](https://paddlespeech.bj.bcebos.com/cls/panns_cnn10.tar.gz), [panns_cnn14.tar.gz](https://paddlespeech.bj.bcebos.com/cls/panns_cnn14.tar.gz)

## Punctuation Restoration Models
Model Type | Dataset| Example Link | Pretrained Models
:-------------:| :------------:| :-----: | :-----:
Ernie Linear | IWSLT2012_zh |[iwslt2012_punc0](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/iwslt2012/punc0)|[ernie_linear_p3_iwslt2012_zh_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/text/ernie_linear_p3_iwslt2012_zh_ckpt_0.1.1.zip)

## Speech Recognition Model from Paddle 1.8

| Acoustic Model |Training Data| Token-based | Size | Descriptions | CER | WER | Hours of speech |
| :--------------: | :--------------: | :--------------: | :--------------: | :--------------: | :--------------: | :--------------: | :--------------: |
| Acoustic Model |Training Data| Token-based | Size | Descriptions | CER | WER | Hours of speech |
| :-----:| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
| [Ds2 Offline Aishell model](https://deepspeech.bj.bcebos.com/mandarin_models/aishell_model_v1.8_to_v2.x.tar.gz) | Aishell Dataset | Char-based | 234 MB | 2 Conv + 3 bidirectional GRU layers | 0.0804 | — | 151 h |
| [Ds2 Offline Librispeech model](https://deepspeech.bj.bcebos.com/eng_models/librispeech_v1.8_to_v2.x.tar.gz) | Librispeech Dataset | Word-based | 307 MB | 2 Conv + 3 bidirectional sharing weight RNN layers | — | 0.0685 | 960 h |
| [Ds2 Offline Baidu en8k model](https://deepspeech.bj.bcebos.com/eng_models/baidu_en8k_v1.8_to_v2.x.tar.gz) | Baidu Internal English Dataset | Word-based | 273 MB | 2 Conv + 3 bidirectional GRU layers |— | 0.0541 | 8628 h |
| [Ds2 Offline Librispeech model](https://deepspeech.bj.bcebos.com/eng_models/librispeech_v1.8_to_v2.x.tar.gz) | Librispeech Dataset | Word-based | 307 MB | 2 Conv + 3 bidirectional sharing weight RNN layers | — | 0.0685 | 960 h |
| [Ds2 Offline Baidu en8k model](https://deepspeech.bj.bcebos.com/eng_models/baidu_en8k_v1.8_to_v2.x.tar.gz) | Baidu Internal English Dataset | Word-based | 273 MB | 2 Conv + 3 bidirectional GRU layers |— | 0.0541 | 8628 h|
43 changes: 27 additions & 16 deletions examples/iwslt2012/punc0/README.md
@@ -1,17 +1,28 @@
# Chinese Experiment Recipe
## Test Data:
- IWSLT2012 Chinese: test2012
# Punctuation Restoration with IWSLT2012
## Get Started
### Data Preprocessing
```bash
./run.sh --stage 0 --stop-stage 0
```
### Model Training
```bash
./run.sh --stage 1 --stop-stage 1
```
### Testing
```bash
./run.sh --stage 2 --stop-stage 2
```
### Punctuation Restoration
```bash
./run.sh --stage 3 --stop-stage 3
```
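
The stages can also be chained into a single command; a minimal sketch, assuming `run.sh` accepts a stage range exactly as shown above:

```bash
# Run data preprocessing, training, testing, and punctuation restoration in one go.
./run.sh --stage 0 --stop-stage 3
```
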
## Pretrained Model
The pretrained model can be downloaded here [ernie_linear_p3_iwslt2012_zh_ckpt_0.1.1.zip](https://paddlespeech.bj.bcebos.com/text/ernie_linear_p3_iwslt2012_zh_ckpt_0.1.1.zip).
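
A quick way to fetch and unpack it, assuming `wget` and `unzip` are available:

```bash
# Download and extract the pretrained Ernie Linear checkpoint (URL from above).
wget https://paddlespeech.bj.bcebos.com/text/ernie_linear_p3_iwslt2012_zh_ckpt_0.1.1.zip
unzip ernie_linear_p3_iwslt2012_zh_ckpt_0.1.1.zip
```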

## Run the Code
- Run `./run.sh 0 0 conf/ernie_linear.yaml 1`

## Experiment Results:
- ErnieLinear
- Experiment config: conf/ernie_linear.yaml
- Test results

| | COMMA | PERIOD | QUESTION | OVERALL |
|-----------|-----------|-----------|-----------|--------- |
|Precision | 0.471831 | 0.497679 | 0.830189 | 0.599899 |
|Recall | 0.583172 | 0.641148 | 0.846154 | 0.690158 |
|F1 | 0.521626 | 0.560376 | 0.838095 | 0.640033 |
### Test Result
- Ernie Linear

| |COMMA | PERIOD | QUESTION | OVERALL|
|:-----:|:-----:|:-----:|:-----:|:-----:|
|Precision |0.510955 |0.526462 |0.820755 |0.619391|
|Recall |0.517433 |0.564179 |0.861386 |0.647666|
|F1 |0.514173 |0.544669 |0.840580 |0.633141|
44 changes: 44 additions & 0 deletions examples/iwslt2012/punc0/conf/default.yaml
@@ -0,0 +1,44 @@
###########################################################
# DATA SETTING #
###########################################################
dataset_type: Ernie
train_path: data/iwslt2012_zh/train.txt
dev_path: data/iwslt2012_zh/dev.txt
test_path: data/iwslt2012_zh/test.txt
batch_size: 64
num_workers: 2
data_params:
pretrained_token: ernie-1.0
punc_path: data/iwslt2012_zh/punc_vocab
seq_len: 100


###########################################################
# MODEL SETTING #
###########################################################
model_type: ErnieLinear
model:
pretrained_token: ernie-1.0
num_classes: 4

###########################################################
# OPTIMIZER SETTING #
###########################################################
optimizer_params:
weight_decay: 1.0e-6 # weight decay coefficient.

scheduler_params:
learning_rate: 1.0e-5 # learning rate.
gamma: 1.0 # scheduler gamma.

###########################################################
# TRAINING SETTING #
###########################################################
max_epoch: 20
num_snapshots: 5

###########################################################
# OTHER SETTING #
###########################################################
num_snapshots: 10 # max number of snapshots to keep while training
seed: 42 # random seed for paddle, random, and np.random
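
To try different hyperparameters without touching the defaults, one option is to copy this file and edit the copy; the keys mirror the ones defined above, and the variant file name and new value are only illustrative:

```bash
# Create a variant config with a larger batch size (illustrative value).
cp conf/default.yaml conf/my_experiment.yaml
sed -i 's/batch_size: 64/batch_size: 128/' conf/my_experiment.yaml
```
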
36 changes: 0 additions & 36 deletions examples/iwslt2012/punc0/conf/ernie_linear.yaml

This file was deleted.

23 changes: 0 additions & 23 deletions examples/iwslt2012/punc0/local/avg.sh

This file was deleted.

Empty file modified examples/iwslt2012/punc0/local/data.sh
100644 → 100755
Empty file.
12 changes: 12 additions & 0 deletions examples/iwslt2012/punc0/local/punc_restore.sh
@@ -0,0 +1,12 @@
#!/bin/bash

config_path=$1
train_output_path=$2
ckpt_name=$3
text=$4
ckpt_prefix=${ckpt_name%.*}

python3 ${BIN_DIR}/punc_restore.py \
--config=${config_path} \
--checkpoint=${train_output_path}/checkpoints/${ckpt_name} \
--text=${text}
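
The script takes a config, a training output directory, a checkpoint file name, and the raw text, in that order. A usage sketch (the output directory, checkpoint name, and sample sentence are placeholders, not values shipped with the recipe):

```bash
source path.sh
# Restore punctuation for one sentence using a chosen checkpoint.
./local/punc_restore.sh conf/default.yaml exp/default snapshot_iter_12840.pdz "今天天气怎么样你下午有空吗"
```
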
27 changes: 6 additions & 21 deletions examples/iwslt2012/punc0/local/test.sh
100644 → 100755
@@ -1,26 +1,11 @@

#!/bin/bash

if [ $# != 2 ];then
echo "usage: ${0} config_path ckpt_path_prefix"
exit -1
fi

ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
echo "using $ngpu gpus..."

config_path=$1
ckpt_prefix=$2

python3 -u ${BIN_DIR}/test.py \
--ngpu 1 \
--config ${config_path} \
--result_file ${ckpt_prefix}.rsl \
--checkpoint_path ${ckpt_prefix}
train_output_path=$2
ckpt_name=$3

if [ $? -ne 0 ]; then
echo "Failed in evaluation!"
exit 1
fi
ckpt_prefix=${ckpt_name%.*}

exit 0
python3 ${BIN_DIR}/test.py \
--config=${config_path} \
--checkpoint=${train_output_path}/checkpoints/${ckpt_name}
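
As with the other local scripts, the arguments are positional: config, training output directory, checkpoint file name. A hedged example invocation (the directory and checkpoint name are placeholders):

```bash
source path.sh
# Evaluate a trained checkpoint on the test set.
./local/test.sh conf/default.yaml exp/default snapshot_iter_12840.pdz
```
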
29 changes: 5 additions & 24 deletions examples/iwslt2012/punc0/local/train.sh
100644 → 100755
@@ -1,28 +1,9 @@
#!/bin/bash

if [ $# != 3 ];then
echo "usage: CUDA_VISIBLE_DEVICES=0 ${0} config_path ckpt_name log_dir"
exit -1
fi

ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
echo "using $ngpu gpus..."

config_path=$1
ckpt_name=$2
log_dir=$3

mkdir -p exp

python3 -u ${BIN_DIR}/train.py \
--ngpu ${ngpu} \
--config ${config_path} \
--output_dir exp/${ckpt_name} \
--log_dir ${log_dir}

if [ $? -ne 0 ]; then
echo "Failed in training!"
exit 1
fi
train_output_path=$2

exit 0
python3 ${BIN_DIR}/train.py \
--config=${config_path} \
--output-dir=${train_output_path} \
--ngpu=1
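
A sketch of how the rewritten script might be invoked (the output directory name is a placeholder; the other local scripts expect checkpoints under `<train_output_path>/checkpoints/`):

```bash
source path.sh
# Train with the default config on a single GPU; outputs go to exp/default.
CUDA_VISIBLE_DEVICES=0 ./local/train.sh conf/default.yaml exp/default
```
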
2 changes: 1 addition & 1 deletion examples/iwslt2012/punc0/path.sh
100644 → 100755
@@ -10,5 +10,5 @@ export PYTHONPATH=${MAIN_ROOT}:${PYTHONPATH}

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib/

MODEL=$1
MODEL=ernie_linear
export BIN_DIR=${MAIN_ROOT}/paddlespeech/text/exps/${MODEL}
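
With `MODEL` now hard-coded to `ernie_linear`, `BIN_DIR` resolves to the text experiments directory under the repository root; a quick sanity check:

```bash
source path.sh
# Should print ${MAIN_ROOT}/paddlespeech/text/exps/ernie_linear
echo "${BIN_DIR}"
```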