Commit c5c3a8a

Merge branch 'develop' into fix-opencpop-svs1
2 parents: 8515d64 + 67ae7c8

File tree: 32 files changed (+311, -273 lines)


demos/TTSArmLinux/src/TTSCppFrontend (1 addition, 1 deletion)

@@ -1 +1 @@
-../../TTSCppFrontend/
+../../TTSCppFrontend/

examples/aishell/asr0/utils (1 addition, 1 deletion)

@@ -1 +1 @@
-../../../utils/
+../../../utils/

examples/csmsc/jets/README.md (12 additions, 1 deletion)

@@ -3,7 +3,18 @@ This example contains code used to train a [JETS](https://arxiv.org/abs/2203.168
 
 ## Dataset
 ### Download and Extract
-Download CSMSC from it's [Official Website](https://test.data-baker.com/data/index/source).
+Download CSMSC from it's [official website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.
+
+The structure of the folder is listed below.
+
+```text
+└─ Wave
+    └─ .wav files (audio speech)
+└─ PhoneLabeling
+    └─ .interval files (alignment between phoneme and duration)
+└─ ProsodyLabeling
+    └─ 000001-010000.txt (text with prosodic by pinyin)
+```
 
 ### Get MFA Result and Extract
 We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get phonemes and durations for JETS.
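The tree added above can be sanity-checked with a short shell snippet. This is a minimal sketch, not from the repo: it rebuilds the `BZNSYP` layout the README describes in a temporary directory so the check is self-contained (the real dataset lives under `~/datasets/BZNSYP` after extraction).

```shell
# Sketch: recreate the BZNSYP layout from the README and verify each entry.
# mktemp is used only to keep the check self-contained; the real dataset
# is under ~/datasets/BZNSYP after extraction.
root="$(mktemp -d)/BZNSYP"
mkdir -p "$root/Wave" "$root/PhoneLabeling" "$root/ProsodyLabeling"
touch "$root/ProsodyLabeling/000001-010000.txt"

missing=0
for d in Wave PhoneLabeling ProsodyLabeling; do
    [ -d "$root/$d" ] || { echo "missing: $d"; missing=1; }
done
echo "missing=$missing"
```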

examples/csmsc/tts2/README.md (11 additions, 0 deletions)

@@ -5,6 +5,17 @@ This example contains code used to train a [SpeedySpeech](http://arxiv.org/abs/2
 ### Download and Extract
 Download CSMSC from it's [Official Website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.
 
+The structure of the folder is listed below.
+
+```text
+└─ Wave
+    └─ .wav files (audio speech)
+└─ PhoneLabeling
+    └─ .interval files (alignment between phoneme and duration)
+└─ ProsodyLabeling
+    └─ 000001-010000.txt (text with prosodic by pinyin)
+```
+
 ### Get MFA Result and Extract
 We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for SPEEDYSPEECH.
 You can download from here [baker_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/BZNSYP/with_tone/baker_alignment_tone.tar.gz), or train your MFA model reference to [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) of our repo.

examples/csmsc/voc5/README.md (11 additions, 0 deletions)

@@ -4,6 +4,17 @@ This example contains code used to train a [HiFiGAN](https://arxiv.org/abs/2010.
 ### Download and Extract
 Download CSMSC from it's [official website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.
 
+The structure of the folder is listed below.
+
+```text
+└─ Wave
+    └─ .wav files (audio speech)
+└─ PhoneLabeling
+    └─ .interval files (alignment between phoneme and duration)
+└─ ProsodyLabeling
+    └─ 000001-010000.txt (text with prosodic by pinyin)
+```
+
 ### Get MFA Result and Extract
 We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut silence at the edge of audio.
 You can download from here [baker_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/BZNSYP/with_tone/baker_alignment_tone.tar.gz), or train your MFA model reference to [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) of our repo.

examples/csmsc/voc5/iSTFTNet.md (11 additions, 0 deletions)

@@ -6,6 +6,17 @@ This example contains code used to train a [iSTFTNet](https://arxiv.org/abs/2203
 ### Download and Extract
 Download CSMSC from it's [official website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.
 
+The structure of the folder is listed below.
+
+```text
+└─ Wave
+    └─ .wav files (audio speech)
+└─ PhoneLabeling
+    └─ .interval files (alignment between phoneme and duration)
+└─ ProsodyLabeling
+    └─ 000001-010000.txt (text with prosodic by pinyin)
+```
+
 ### Get MFA Result and Extract
 We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut silence at the edge of audio.
 You can download from here [baker_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/BZNSYP/with_tone/baker_alignment_tone.tar.gz), or train your MFA model reference to [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) of our repo.

examples/librispeech/asr0/README.md (2 additions, 2 deletions)

@@ -144,7 +144,7 @@ source path.sh
 bash ./local/data.sh
 CUDA_VISIBLE_DEVICES= ./local/train.sh conf/deepspeech2.yaml deepspeech2
 avg.sh best exp/deepspeech2/checkpoints 1
-CUDA_VISIBLE_DEVICES= ./local/test.sh conf/deepspeech2.yaml exp/deepspeech2/checkpoints/avg_1
+CUDA_VISIBLE_DEVICES= ./local/test.sh conf/deepspeech2.yaml conf/tuning/decode.yaml exp/deepspeech2/checkpoints/avg_1
 ```
 ## Stage 4: Static graph model Export
 This stage is to transform dygraph to static graph.
@@ -185,5 +185,5 @@ wget -nc https://paddlespeech.bj.bcebos.com/datasets/single_wav/en/demo_002_en.w
 ```
 You can train a model by yourself, then you need to prepare an audio file or use the audio demo above, please confirm the sample rate of the audio is 16K. You can get the result of the audio demo by running the script below.
 ```bash
-CUDA_VISIBLE_DEVICES= ./local/test_wav.sh conf/deepspeech2.yaml exp/deepspeech2/checkpoints/avg_1 data/demo_002_en.wav
+CUDA_VISIBLE_DEVICES= ./local/test_wav.sh conf/deepspeech2.yaml conf/tuning/decode.yaml exp/deepspeech2/checkpoints/avg_1 data/demo_002_en.wav
 ```
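The only change in both hunks is a new second positional argument, `conf/tuning/decode.yaml`, inserted between the train config and the checkpoint prefix. A hypothetical sketch of the resulting argument order (variable names below are assumptions for illustration, not copied from the real `local/test.sh`):

```shell
# Hypothetical sketch of the new argument order (names are illustrative,
# not taken from the real local/test.sh in the repo).
set -- conf/deepspeech2.yaml conf/tuning/decode.yaml exp/deepspeech2/checkpoints/avg_1
config_path="$1"         # train/model config
decode_config_path="$2"  # decode config -- the newly inserted argument
ckpt_prefix="$3"         # averaged-checkpoint prefix
echo "train config : $config_path"
echo "decode config: $decode_config_path"
echo "checkpoint   : $ckpt_prefix"
```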

examples/librispeech/asr1/README.md (5 additions, 5 deletions)

@@ -148,7 +148,7 @@ or you can run these scripts in the command line (only use CPU).
 bash ./local/data.sh
 CUDA_VISIBLE_DEVICES= ./local/train.sh conf/conformer.yaml conformer
 avg.sh best exp/conformer/checkpoints 20
-CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20
+CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml conf/tuning/decode.yaml exp/conformer/checkpoints/avg_20
 ```
 ## Pretrained Model
 You can get the pretrained transformer or conformer from [this](../../../docs/source/released_model.md).
@@ -163,7 +163,7 @@ source path.sh
 # If you have process the data and get the manifest file, you can skip the following 2 steps
 bash local/data.sh --stage -1 --stop_stage -1
 bash local/data.sh --stage 2 --stop_stage 2
-CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20
+CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml conf/tuning/decode.yaml exp/conformer/checkpoints/avg_20
 ```
 The performance of the released models are shown in [here](./RESULTS.md).
 
@@ -192,8 +192,8 @@ bash ./local/data.sh
 CUDA_VISIBLE_DEVICES= ./local/train.sh conf/conformer.yaml conformer
 avg.sh best exp/conformer/checkpoints 20
 # test stage is optional
-CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20
-CUDA_VISIBLE_DEVICES= ./local/align.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20
+CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml conf/tuning/decode.yaml exp/conformer/checkpoints/avg_20
+CUDA_VISIBLE_DEVICES= ./local/align.sh conf/conformer.yaml conf/tuning/decode.yaml exp/conformer/checkpoints/avg_20
 ```
 ## Stage 5: Single Audio File Inference
 In some situations, you want to use the trained model to do the inference for the single audio file. You can use stage 5. The code is shown below
@@ -214,5 +214,5 @@ wget -nc https://paddlespeech.bj.bcebos.com/datasets/single_wav/en/demo_002_en.w
 ```
 You need to prepare an audio file or use the audio demo above, please confirm the sample rate of the audio is 16K. You can get the result of the audio demo by running the script below.
 ```bash
-CUDA_VISIBLE_DEVICES= ./local/test_wav.sh conf/conformer.yaml exp/conformer/checkpoints/avg_20 data/demo_002_en.wav
+CUDA_VISIBLE_DEVICES= ./local/test_wav.sh conf/conformer.yaml conf/tuning/decode.yaml exp/conformer/checkpoints/avg_20 data/demo_002_en.wav
 ```
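Each recipe above runs `avg.sh best exp/conformer/checkpoints 20` before testing, i.e. it averages the 20 best checkpoints. As a rough, self-contained mock of the "best N" selection step only (the real `avg.sh` also averages the selected checkpoints' weights; the file names and loss values here are invented):

```shell
# Mock of "best N" checkpoint selection (illustrative only; the real avg.sh
# also averages the selected checkpoints' weights).
ckpt_dir="$(mktemp -d)"
printf '0.52\n' > "$ckpt_dir/epoch_10"   # invented validation losses
printf '0.48\n' > "$ckpt_dir/epoch_11"
printf '0.55\n' > "$ckpt_dir/epoch_12"
n=2
# sort checkpoints by recorded loss (ascending) and keep the n lowest
best="$(for f in "$ckpt_dir"/*; do echo "$(cat "$f") $(basename "$f")"; done \
        | sort -n | head -n "$n" | awk '{print $2}')"
echo "$best"
```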

examples/librispeech/asr2/steps (1 addition, 1 deletion)

@@ -1 +1 @@
-../../../tools/kaldi/egs/wsj/s5/steps/
+../../../tools/kaldi/egs/wsj/s5/steps/

examples/tal_cs/asr1/README.md (3 additions, 8 deletions)

@@ -27,7 +27,6 @@ The document below will describe the scripts in `run.sh` in detail.
 The path.sh contains the environment variables.
 ```bash
 . ./path.sh
-. ./cmd.sh
 ```
 This script needs to be run first. And another script is also needed:
 ```bash
@@ -67,7 +66,6 @@ bash run.sh --stage 0 --stop_stage 0
 You can also just run these scripts in your command line.
 ```bash
 . ./path.sh
-. ./cmd.sh
 bash ./local/data.sh
 ```
 After processing the data, the `data` directory will look like this:
@@ -103,7 +101,6 @@ bash run.sh --stage 0 --stop_stage 1
 or you can run these scripts in the command line (only use CPU).
 ```bash
 . ./path.sh
-. ./cmd.sh
 bash ./local/data.sh
 CUDA_VISIBLE_DEVICES= ./local/train.sh conf/conformer.yaml conformer
 ```
@@ -124,7 +121,6 @@ or you can run these scripts in the command line (only use CPU).
 
 ```bash
 . ./path.sh
-. ./cmd.sh
 bash ./local/data.sh
 CUDA_VISIBLE_DEVICES= ./local/train.sh conf/conformer.yaml conformer
 avg.sh best exp/conformer/checkpoints 10
@@ -144,11 +140,10 @@ bash run.sh --stage 0 --stop_stage 3
 or you can run these scripts in the command line (only use CPU).
 ```bash
 . ./path.sh
-. ./cmd.sh
 bash ./local/data.sh
 CUDA_VISIBLE_DEVICES= ./local/train.sh conf/conformer.yaml conformer
 avg.sh best exp/conformer/checkpoints 10
-CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoints/avg_10
+CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml conf/tuning/decode.yaml exp/conformer/checkpoints/avg_10
 ```
 ## Pretrained Model
 You can get the pretrained transformer or conformer from [this](../../../docs/source/released_model.md).
@@ -163,7 +158,7 @@ source path.sh
 # If you have process the data and get the manifest file, you can skip the following 2 steps
 bash local/data.sh --stage -1 --stop_stage -1
 bash local/data.sh --stage 2 --stop_stage 2
-CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml exp/conformer/checkpoints/avg_10
+CUDA_VISIBLE_DEVICES= ./local/test.sh conf/conformer.yaml conf/tuning/decode.yaml exp/conformer/checkpoints/avg_10
 ```
 The performance of the released models are shown in [here](./RESULTS.md).
 
@@ -186,5 +181,5 @@ wget -nc https://paddlespeech.bj.bcebos.com/datasets/single_wav/zh/demo_01_03.wa
 ```
 You need to prepare an audio file or use the audio demo above, please confirm the sample rate of the audio is 16K. You can get the result of the audio demo by running the script below.
 ```bash
-CUDA_VISIBLE_DEVICES= ./local/test_wav.sh conf/conformer.yaml exp/conformer/checkpoints/avg_10 data/demo_01_03.wav
+CUDA_VISIBLE_DEVICES= ./local/test_wav.sh conf/conformer.yaml conf/tuning/decode.yaml exp/conformer/checkpoints/avg_10 data/demo_01_03.wav
 ```
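With `. ./cmd.sh` removed throughout, `path.sh` is the only setup script each snippet sources. A rough sketch of the kind of environment a recipe-level `path.sh` typically exports (the variable names and paths below are assumptions, not copied from the repo):

```shell
# Illustrative path.sh-style setup (names and paths are assumptions,
# not taken from the actual examples/tal_cs/asr1/path.sh).
export MAIN_ROOT="${PWD}/../../.."       # repo root, relative to the recipe dir
export PATH="${MAIN_ROOT}/utils:${PATH}" # make shared helper scripts callable
export LC_ALL=C                          # stable sorting/locale for the tools
echo "MAIN_ROOT=${MAIN_ROOT}"
```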
