[TTS] Add support for finetuning speedyspeech #1302

jerryuhoo · 2022-01-11T07:03:22Z

PR types

New features and Bug fixes

PR changes

Others

Describe

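Fixed a bug where finetune.sh could not find link_wav.py.
Added gen_gta_mel.py for speedyspeech, which can be used to finetune the vocoder.
Added two exception cases to link_wav.py: one skips soft links that already exist, and the other deletes the file in dump_finetune when the corresponding file cannot be found in dump. (What causes this problem? preprocess skipped some problematic audio; is that because the phonemes in the TextGrid are bad, or because of the audio length? Is it reasonable to delete the files outright?)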

If a soft link already exists, skip it; if the entry cannot be found in dump, delete the file in dump_finetune.
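
The two cases described above can be sketched as follows. This is a minimal illustration and not the actual link_wav.py: the function name `link_or_prune`, the `_wave.npy` / `_feats.npy` file naming, and the flat directory layout are all assumptions made for the example.

```python
import os
from pathlib import Path


def link_or_prune(dump_dir, finetune_dir, utt_ids):
    """Link per-utterance files from dump_dir into finetune_dir.

    Hypothetical helper: the file names and layout are assumptions.
    Returns three lists of utterance ids: (linked, skipped, removed).
    """
    dump_dir, finetune_dir = Path(dump_dir), Path(finetune_dir)
    linked, skipped, removed = [], [], []
    for utt_id in utt_ids:
        src = dump_dir / f"{utt_id}_wave.npy"
        dst = finetune_dir / f"{utt_id}_wave.npy"
        # Case 1: the soft link was created by an earlier run, so skip it.
        if dst.is_symlink():
            skipped.append(utt_id)
            continue
        # Case 2: preprocess never produced this utterance in dump,
        # so drop the orphaned file from dump_finetune.
        if not src.exists():
            stale = finetune_dir / f"{utt_id}_feats.npy"
            if stale.exists():
                stale.unlink()
            removed.append(utt_id)
            continue
        os.symlink(src.resolve(), dst)
        linked.append(utt_id)
    return linked, skipped, removed
```

Returning the `removed` ids rather than deleting silently makes it possible to log which utterances were dropped, which helps when investigating why preprocess skipped them.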

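yt605155624 · 2022-01-11T07:11:35Z

"preprocess skipped some problematic audio": we did not run into this in the csmsc experiments, but with more complex datasets MFA may output fewer files than the input audio (for example, 10,000 utterances go in but only 9,000 *.TextGrid files come back; the gap is probably not that large, but it can happen). You can check whether the MFA results for your own data contain fewer items than the input. Then, in preprocess, there is a check: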

PaddleSpeech/paddlespeech/t2s/exps/speedyspeech/preprocess.py

Line 46 in 52a8b2f

if utt_id in sentences:

And sentences is generated by reading durations.txt, which means we only process data that definitely has an MFA result:

PaddleSpeech/paddlespeech/t2s/exps/speedyspeech/preprocess.py

Line 227 in 52a8b2f

sentences, speaker_set = get_phn_dur(dur_file)
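
The effect of that check can be sketched in isolation. This is a simplified illustration, not the code in preprocess.py: the `utt_id|phones|durations` line format and the helper names `load_aligned` and `split_by_alignment` are assumptions.

```python
def load_aligned(dur_lines):
    """Parse duration lines into {utt_id: (phones, durations)}.

    Assumed line format: 'utt_id|p1 p2 ...|d1 d2 ...'.
    """
    sentences = {}
    for line in dur_lines:
        line = line.strip()
        if not line:
            continue
        utt_id, phones, durs = line.split("|")
        sentences[utt_id] = (phones.split(), [int(d) for d in durs.split()])
    return sentences


def split_by_alignment(utt_ids, sentences):
    """Separate utterances that have an alignment from those that do not."""
    kept = [u for u in utt_ids if u in sentences]
    dropped = [u for u in utt_ids if u not in sentences]
    return kept, dropped


# Three input utterances, but alignments exist for only two of them.
sentences = load_aligned(["000001|sil n i3|5 7 9", "000003|sil h ao3|4 6 8"])
kept, dropped = split_by_alignment(["000001", "000002", "000003"], sentences)
```

Inspecting `dropped` is one way to find out which utterances MFA failed to align, and therefore which files will be missing from dump.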

yt605155624 · 2022-01-11T07:16:59Z

paddlespeech/t2s/exps/speedyspeech/gen_gta_mel.py

+        sub_output_dir.mkdir(parents=True, exist_ok=True)
+
+        with paddle.no_grad():
+            mel = speedyspeech_inference(phone_ids, tone_ids, spk_id=speaker_id)


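The ground-truth durations need to be passed to speedyspeech_inference (otherwise the generated mel does not match the length of the real audio). speedyspeech does not support this argument by default; fastspeech2's gen_gta_mel wrote an extra StyleFastSpeech2Inference class for this.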

Then could I trouble you, Zilong, to make the change? I don't know whether to add a new inference class or modify the existing one.

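If you are in a hurry you can change this yourself; I have no plans to work on it in the near term. You can add a new class following the fastspeech2 approach.
If you don't have time to change it for now, you can delete this file first and I'll merge the other files.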

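Even a new class still has to call speedyspeech's inference function, so that function may need an added check. As long as the original training and prediction don't break (CI will guarantee part of this), change it however you like.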

OK, I'll try changing it myself first.

Sorry, I forgot to add test=tts again when committing :joy:
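
The length mismatch raised in this thread can be illustrated without any model code. The sketch below is framework-free and hypothetical: `choose_durations`, `mel_length`, and the toy predictor stand in for the real inference path and only show why ground-truth-aligned (GTA) mel generation has to use the real durations.

```python
def choose_durations(num_phones, durations=None, predictor=None):
    """Return one frame count per phone.

    Uses the ground-truth durations when they are given and falls back
    to the predictor otherwise. Hypothetical helper for illustration.
    """
    if durations is None:
        if predictor is None:
            raise ValueError("need either durations or a predictor")
        durations = predictor(num_phones)
    durations = list(durations)
    if len(durations) != num_phones:
        raise ValueError("one duration per phone is required")
    return durations


def mel_length(durations):
    """The generated mel has one frame per unit of duration."""
    return sum(durations)


def toy_predictor(num_phones):
    """Stand-in for a duration predictor: always predicts 6 frames."""
    return [6] * num_phones


true_durs = [5, 7, 9]  # e.g. taken from forced alignment
gta_len = mel_length(choose_durations(3, durations=true_durs))
free_len = mel_length(choose_durations(3, predictor=toy_predictor))
```

With the ground-truth durations the mel has as many frames as the aligned audio; with predicted durations the frame count generally differs, so the mel can no longer be paired with the recorded waveform for vocoder finetuning.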

yt605155624 · 2022-01-12T05:09:06Z

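After writing the code, you can use our pre-commit to check the code format. Note that you only need to fix the formatting of the code you changed; other code may also get reformatted, in which case just don't add it.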

pip install pre-commit
pre-commit run --files <the files you modified>

jerryuhoo · 2022-01-12T05:50:23Z

OK, I'll push the changes from running pre-commit in a bit.

yt605155624 · 2022-01-12T06:12:21Z

paddlespeech/t2s/models/speedyspeech/speedyspeech.py

-
-        encodings = paddle.matmul(M, encodings)
+        if type(durations) == type(None):
+            pred_durations = self.duration_predictor(encodings)  # (1, T)


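If you want to change this chunk, I think it could also be simplified with the expand function (this is indeed something I didn't do well before).
Also, np.sum, .zeros, and .max inside the expand function could be replaced with paddle.xxx, following what is done here, so that the final M = paddle.to_tensor(M, dtype=encodings.dtype) no longer needs to_tensor. to_tensor may fail during dynamic-to-static conversion, so if you replace this chunk with expand here but don't swap out the numpy functions, dynamic-to-static conversion may fail. It's fine if you don't want to change it; once this is merged I'll change it (your usage here was a reminder for me) and @ you when it's done.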

Then let's merge first; thanks for taking care of the change~
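
To make concrete what the duration-based expansion computes, here is a plain-Python sketch. It is not the repository's expand function, it ignores the batch dimension, and it deliberately uses no numpy or paddle calls; `alignment_matrix` and `apply_alignment` are illustrative names. Each row of M corresponds to one output frame and has a single 1 in the column of the phone that frame belongs to, so multiplying M by the encodings repeats each phone encoding for its duration.

```python
def alignment_matrix(durations):
    """Build a (sum(durations) x len(durations)) 0/1 matrix as nested lists."""
    n_frames, n_phones = sum(durations), len(durations)
    M = [[0] * n_phones for _ in range(n_frames)]
    frame = 0
    for phone, dur in enumerate(durations):
        for _ in range(dur):
            M[frame][phone] = 1
            frame += 1
    return M


def apply_alignment(M, encodings):
    """Compute the matrix product M @ encodings on nested lists."""
    n_phones, dim = len(encodings), len(encodings[0])
    return [
        [sum(row[p] * encodings[p][d] for p in range(n_phones)) for d in range(dim)]
        for row in M
    ]


# Three phones lasting 2, 1, and 3 frames, each with a 2-dim encoding.
M = alignment_matrix([2, 1, 3])
frames = apply_alignment(M, [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
```

In the model itself the same matrix would presumably be built from tensor operations end to end, which is the point of the review comment: avoiding the numpy round trip removes the to_tensor call that may break dynamic-to-static conversion.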

yt605155624

LGTM

jerryuhoo added 3 commits January 11, 2022 14:43

fix link_wav.py path, test=tts

75c2bd5

[tts] add gen_gta_mel.py for finetuning speedypeech, test=tts

fcc34e3

deal with exceptions of link_wav.py

61b68ed

If a soft link already exists, skip it; if the entry cannot be found in dump, delete the file in dump_finetune.

mergify bot added T2S Example labels Jan 11, 2022

zh794390558 requested a review from yt605155624 January 11, 2022 07:05

zh794390558 added this to the r0.2.0 milestone Jan 11, 2022

yt605155624 reviewed Jan 11, 2022

zh794390558 assigned yt605155624 Jan 11, 2022

jerryuhoo added 2 commits January 11, 2022 16:32

Add durations to gen_gta_mel.py inference

be99807

Update link_wav.py, test=tts

1e710ef

Fix the code format, test=tts

111a452

yt605155624 reviewed Jan 12, 2022

yt605155624 modified the milestones: r0.2.0, r0.1.1 Jan 12, 2022

yt605155624 approved these changes Jan 12, 2022

yt605155624 merged commit 8f507ba into PaddlePaddle:develop Jan 12, 2022

yt605155624 mentioned this pull request Jan 12, 2022

speedyspeech inference function optimization #1318

Closed

yt605155624 added the contributor label Aug 25, 2022
