[TTS] Add support for finetuning speedyspeech #1302

Merged
yt605155624 merged 6 commits into PaddlePaddle:develop
Jan 12, 2022

Conversation

jerryuhoo
Contributor

PR types

New features and Bug fixes

PR changes

Others

Describe

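  1. Fixed a bug where finetune.sh could not find link_wav.py.
  2. Added gen_gta_mel.py for speedyspeech, which can be used to finetune the vocoder.
  3. Added two exception cases to link_wav.py: one skips symlinks that already exist, and the other deletes the file in dump_finetune when the corresponding file cannot be found in dump. (What causes this? preprocess skipped some problematic audio; is that because the phonemes in the TextGrid are bad, or because of the audio length? Is deleting the file outright reasonable?)

If a symlink already exists it is skipped; if the entry cannot be found in dump, the file in dump_finetune is deleted.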
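
The two cases described for link_wav.py can be sketched roughly as follows. This is not the code from the PR: the function name `link_or_prune`, its arguments, and the flat file layout are all assumptions made for illustration, and only standard-library calls are used.

```python
import os
from pathlib import Path
from typing import Dict, Iterable, List


def link_or_prune(dump_dir: Path, finetune_dir: Path,
                  names: Iterable[str]) -> Dict[str, List[str]]:
    """Symlink files from dump_dir into finetune_dir (hypothetical helper).

    - source missing in dump_dir: remove any stale file in finetune_dir
    - target already present: skip it
    - otherwise: create the symlink
    """
    result = {"linked": [], "skipped": [], "pruned": []}
    for name in names:
        src = dump_dir / name
        dst = finetune_dir / name
        if not src.exists():
            # Nothing to link to: drop the leftover entry, if there is one.
            if dst.exists() or dst.is_symlink():
                dst.unlink()
            result["pruned"].append(name)
            continue
        if dst.is_symlink() or dst.exists():
            # Already linked on an earlier run: leave it alone.
            result["skipped"].append(name)
            continue
        os.symlink(src.resolve(), dst)
        result["linked"].append(name)
    return result
```

Returning the three lists makes it easy to log how many utterances were pruned, which speaks to the question of whether silent deletion is acceptable.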
@zh794390558 zh794390558 added this to the r0.2.0 milestone Jan 11, 2022
@yt605155624
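Collaborator

Regarding "preprocess skipped some problematic audio": I did not run into this in the csmsc experiments, but on more complex datasets MFA may output fewer files than the input audio (for example, I feed in 10,000 audio files but only get back 9,000 *.TextGrid files; the gap is probably not that large, but it can happen). You can check whether the MFA results for your own data contain fewer files than the input. Then, in preprocess, there is a check: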


sentences, in turn, is generated by reading durations.txt, so we only process data that definitely has an MFA result:
sentences, speaker_set = get_phn_dur(dur_file)
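
As a rough illustration of "only process data that has an MFA result", the sketch below filters a list of audio paths down to the utterances listed in a durations file. It is not the check from preprocess and not the real `get_phn_dur`: the helper names and the assumption that each line starts with `utt_id|` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Set


def read_aligned_ids(lines: Iterable[str]) -> Set[str]:
    """Collect utterance IDs, assuming each line looks like 'utt_id|...'."""
    ids = set()
    for line in lines:
        line = line.strip()
        if line:
            ids.add(line.split("|")[0])
    return ids


def keep_aligned(wav_paths: Iterable[str], aligned_ids: Set[str]) -> List[str]:
    """Keep only the audio files whose stem has an alignment entry."""
    return [p for p in wav_paths if Path(p).stem in aligned_ids]
```

Comparing `len(wav_paths)` with the length of the filtered list is a quick way to see how many utterances MFA dropped.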

sub_output_dir.mkdir(parents=True, exist_ok=True)

with paddle.no_grad():
    mel = speedyspeech_inference(phone_ids, tone_ids, spk_id=speaker_id)
Collaborator

@yt605155624 yt605155624 Jan 11, 2022

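The real durations need to be passed to speedyspeech_inference (otherwise the generated mel will not match the length of the real audio). speedyspeech does not support this parameter by default; fastspeech2's gen_gta_mel has an extra StyleFastSpeech2Inference class written for this.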

Contributor Author

In that case, could I trouble you, Zilong, to make the change? I am not sure whether to add a new inference class or modify the existing one.

Collaborator

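If you are in a hurry you can change this yourself; I have no plans to change it in the near term. You can add a new class the way fastspeech2 does.
If you do not have time to change it for now, you can delete this file first and I will merge the other files.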

Collaborator

@yt605155624 yt605155624 Jan 11, 2022

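Even with a new class, speedyspeech's inference function still has to be called, so that function may need a check added. As long as the existing training and inference do not break (CI will guarantee part of this), change it however you like.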
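
The kind of check being discussed, use ground-truth durations when the caller supplies them and fall back to the model's prediction otherwise, might look like the sketch below. It is plain Python rather than Paddle code, and `resolve_durations` and its `predictor` callback are hypothetical names, not part of speedyspeech.

```python
from typing import Callable, List, Optional, Sequence


def resolve_durations(
    num_phones: int,
    durations: Optional[Sequence[int]] = None,
    predictor: Optional[Callable[[int], Sequence[int]]] = None,
) -> List[int]:
    """Return per-phone frame counts, preferring ground-truth durations."""
    if durations is not None:
        # GTA case: one real duration per phone, so the mel length
        # (sum of durations) matches the real audio.
        if len(durations) != num_phones:
            raise ValueError("need exactly one duration per phone")
        return [int(d) for d in durations]
    if predictor is None:
        raise ValueError("no durations given and no predictor available")
    # Normal inference: let the (stand-in) duration predictor decide.
    return [int(d) for d in predictor(num_phones)]
```

Because `durations` defaults to `None`, existing callers that never pass it keep the old behaviour, which is the constraint stated above. Testing with `is None` is also the usual Python idiom for this kind of optional argument.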

Contributor Author

OK, I will try changing it myself first.

Contributor Author

Sorry, I forgot to add test=tts again when committing :joy:

@yt605155624
Collaborator

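After writing the code, you can check its formatting with our pre-commit setup. Only fix the formatting of the code you changed yourself; other code may get reformatted as well, so just do not add those files.

pip install pre-commit
pre-commit run --files <files you modified>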

@jerryuhoo
Contributor Author

OK, I will push the changes again after running pre-commit.


encodings = paddle.matmul(M, encodings)
if type(durations) == type(None):
    pred_durations = self.duration_predictor(encodings) # (1, T)
Collaborator

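If you want to change this chunk, I think it could also be simplified with the expand function (this part really is something I did poorly before).
Also, the np.sum, .zeros and .max calls inside expand could be replaced with paddle.xxx as done here, so that the final M = paddle.to_tensor(M, dtype=encodings.dtype) no longer needs to_tensor. to_tensor may fail during dynamic-to-static conversion, so if you replace this chunk with expand directly but do not swap out the numpy functions, dynamic-to-static conversion may fail. It is fine if you do not want to change it; once this is merged I will change it myself (your usage here was a good reminder) and @ you when it is done.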
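
For readers unfamiliar with what the alignment matrix M does, here is a minimal pure-Python sketch for a single sequence. It is only a stand-in for the tensor version: the helper names are hypothetical, nested lists replace tensors, and no Paddle or NumPy calls are used.

```python
from typing import List, Sequence


def alignment_matrix(durations: Sequence[int]) -> List[List[float]]:
    """Build M with shape (sum(durations), len(durations)).

    M[t][j] is 1.0 when output frame t belongs to phone j, else 0.0.
    """
    total = sum(durations)
    M = [[0.0] * len(durations) for _ in range(total)]
    start = 0
    for j, d in enumerate(durations):
        for t in range(start, start + d):
            M[t][j] = 1.0
        start += d
    return M


def matmul(A: List[List[float]], B: List[List[float]]) -> List[List[float]]:
    """Plain matrix product, standing in for a tensor matmul."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Each phone encoding is repeated for as many frames as its duration.
encodings = [[1.0, 2.0], [3.0, 4.0]]
expanded = matmul(alignment_matrix([2, 1]), encodings)
```

Multiplying by M repeats each encoding row once per frame, which is why building M entirely with framework ops would avoid the to_tensor conversion mentioned above.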

Contributor Author

Let's merge first, then; thanks for taking care of the change~

@yt605155624 yt605155624 modified the milestones: r0.2.0, r0.1.1 Jan 12, 2022
Collaborator

@yt605155624 yt605155624 left a comment

LGTM

@yt605155624 yt605155624 merged commit 8f507ba into PaddlePaddle:develop Jan 12, 2022