Commit 71bda24: [TTS]Fix canton (#2924)

* Update run.sh
* Update README.md

1 parent: 9db75af

2 files changed: 1 addition, 74 deletions


examples/canton/tts3/README.md (1 addition, 41 deletions)

````diff
@@ -74,44 +74,4 @@ Also, there is a `metadata.jsonl` in each subfolder. It is a table-like file tha
 
 ### Training details can refer to the script of [examples/aishell3/tts3](../../aishell3/tts3).
 
-## Pretrained Model(Waiting========)
-Pretrained FastSpeech2 model with no silence in the edge of audios:
-- [fastspeech2_aishell3_ckpt_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_ckpt_1.1.0.zip)
-- [fastspeech2_conformer_aishell3_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_conformer_aishell3_ckpt_0.2.0.zip) (Thanks for [@awmmmm](https://github.com/awmmmm)'s contribution)
-
-
-FastSpeech2 checkpoint contains files listed below.
-
-```text
-fastspeech2_aishell3_ckpt_1.1.0
-├── default.yaml            # default config used to train fastspeech2
-├── energy_stats.npy        # statistics used to normalize energy when training fastspeech2
-├── phone_id_map.txt        # phone vocabulary file when training fastspeech2
-├── pitch_stats.npy         # statistics used to normalize pitch when training fastspeech2
-├── snapshot_iter_96400.pdz # model parameters and optimizer states
-├── speaker_id_map.txt      # speaker id map file when training a multi-speaker fastspeech2
-└── speech_stats.npy        # statistics used to normalize spectrogram when training fastspeech2
-```
-You can use the following scripts to synthesize for `${BIN_DIR}/../sentences.txt` using pretrained fastspeech2 and parallel wavegan models.
-```bash
-source path.sh
-
-FLAGS_allocator_strategy=naive_best_fit \
-FLAGS_fraction_of_gpu_memory_to_use=0.01 \
-python3 ${BIN_DIR}/../synthesize_e2e.py \
-  --am=fastspeech2_aishell3 \
-  --am_config=fastspeech2_aishell3_ckpt_1.1.0/default.yaml \
-  --am_ckpt=fastspeech2_aishell3_ckpt_1.1.0/snapshot_iter_96400.pdz \
-  --am_stat=fastspeech2_aishell3_ckpt_1.1.0/speech_stats.npy \
-  --voc=pwgan_aishell3 \
-  --voc_config=pwg_aishell3_ckpt_0.5/default.yaml \
-  --voc_ckpt=pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz \
-  --voc_stat=pwg_aishell3_ckpt_0.5/feats_stats.npy \
-  --lang=zh \
-  --text=${BIN_DIR}/../sentences.txt \
-  --output_dir=exp/default/test_e2e \
-  --phones_dict=fastspeech2_aishell3_ckpt_1.1.0/phone_id_map.txt \
-  --speaker_dict=fastspeech2_aishell3_ckpt_1.1.0/speaker_id_map.txt \
-  --spk_id=0 \
-  --inference_dir=exp/default/inference
-```
+## Pretrained Model
````
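The deleted README section listed the files inside an unpacked FastSpeech2 checkpoint. As an aside, a small helper like the one below (hypothetical; `check_ckpt` is not part of the repo) can verify that an extracted checkpoint directory matches that listing:

```shell
# Hypothetical helper (not part of PaddleSpeech): check that an unpacked
# FastSpeech2 checkpoint directory contains the files the README listed.
check_ckpt() {
    dir=$1
    for f in default.yaml energy_stats.npy phone_id_map.txt \
             pitch_stats.npy speaker_id_map.txt speech_stats.npy; do
        [ -f "${dir}/${f}" ] || { echo "missing ${f}"; return 1; }
    done
    # the snapshot file name encodes the training iteration, so glob for it
    ls "${dir}"/snapshot_iter_*.pdz >/dev/null 2>&1 || { echo "missing snapshot"; return 1; }
    echo "ok"
}
```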

examples/canton/tts3/run.sh (0 additions, 33 deletions)

```diff
@@ -35,36 +35,3 @@ if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
     # synthesize_e2e, vocoder is pwgan by default
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
-
-if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
-    # inference with static model, vocoder is pwgan by default
-    CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1
-fi
-
-if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
-    # install paddle2onnx
-    version=$(echo `pip list |grep "paddle2onnx"` |awk -F" " '{print $2}')
-    if [[ -z "$version" || ${version} != '1.0.0' ]]; then
-        pip install paddle2onnx==1.0.0
-    fi
-    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx fastspeech2_aishell3
-    # considering the balance between speed and quality, we recommend that you use hifigan as vocoder
-    ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx pwgan_aishell3
-    # ./local/paddle2onnx.sh ${train_output_path} inference inference_onnx hifigan_aishell3
-
-fi
-
-# inference with onnxruntime, use fastspeech2 + pwgan by default
-if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
-    ./local/ort_predict.sh ${train_output_path}
-fi
-
-if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 7 ]; then
-    ./local/export2lite.sh ${train_output_path} inference pdlite fastspeech2_aishell3 x86
-    ./local/export2lite.sh ${train_output_path} inference pdlite pwgan_aishell3 x86
-    # ./local/export2lite.sh ${train_output_path} inference pdlite hifigan_aishell3 x86
-fi
-
-if [ ${stage} -le 8 ] && [ ${stop_stage} -ge 8 ]; then
-    CUDA_VISIBLE_DEVICES=${gpus} ./local/lite_predict.sh ${train_output_path} || exit -1
-fi
```
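The run.sh stages above are all gated with the `[ ${stage} -le N ] && [ ${stop_stage} -ge N ]` pattern, so one script can execute any contiguous range of steps. A minimal, self-contained sketch of that idiom (the `run_stage` helper and the echoed messages are illustrative, not from the repo):

```shell
#!/usr/bin/env sh
# Sketch of the stage-gating idiom from run.sh: a step numbered N runs only
# when stage <= N <= stop_stage, so callers select a contiguous range.
stage=4
stop_stage=6

run_stage() {
    n=$1; shift
    if [ "${stage}" -le "${n}" ] && [ "${stop_stage}" -ge "${n}" ]; then
        echo "running stage ${n}: $*"
    fi
}

run_stage 3 "synthesize_e2e"      # skipped: 3 < stage
run_stage 5 "paddle2onnx export"  # runs: 4 <= 5 <= 6
run_stage 8 "lite_predict"        # skipped: 8 > stop_stage
```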
