[TTS] Add support for finetuning speedyspeech #1302
Symlinks that already exist are skipped; if the corresponding entry cannot be found in dump, the file in dump_finetune is deleted.
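Here is a minimal sketch of the skip/delete rule above. It assumes that dump_finetune mirrors files from dump through symlinks and that the caller already knows which relative paths to sync. The helper name link_from_dump and the demo file names are hypothetical and are not the PR's code.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def link_from_dump(dump_dir: Path, finetune_dir: Path,
                   rel_paths: Iterable[str]) -> List[str]:
    """Hypothetical helper: mirror entries of dump_dir into finetune_dir as symlinks.

    Returns the relative paths deleted from finetune_dir because their
    target no longer exists in dump_dir.
    """
    removed = []
    for rel in rel_paths:
        src = dump_dir / rel
        dst = finetune_dir / rel
        if not src.exists():
            # Not found in dump: delete the stale entry in dump_finetune, if any.
            if dst.is_symlink() or dst.exists():
                dst.unlink()
                removed.append(rel)
            continue
        if dst.is_symlink() or dst.exists():
            # Already present (e.g. on a rerun): skip instead of raising FileExistsError.
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.symlink_to(src.resolve())
    return removed


def demo() -> tuple:
    """Usage example on a throwaway directory tree."""
    with tempfile.TemporaryDirectory() as tmp:
        dump, finetune = Path(tmp) / "dump", Path(tmp) / "dump_finetune"
        dump.mkdir()
        finetune.mkdir()
        (dump / "a.npy").write_text("a")
        (finetune / "b.npy").write_text("stale")  # no counterpart in dump
        first = link_from_dump(dump, finetune, ["a.npy", "b.npy"])
        second = link_from_dump(dump, finetune, ["a.npy", "b.npy"])  # rerun: no error
        linked = (finetune / "a.npy").is_symlink()
        gone = not (finetune / "b.npy").exists()
        return first, second, linked, gone
```

Running the sync twice is the point of the skip rule. The second pass finds the link already in place and does nothing, so the finetune script can be rerun safely.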
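"preprocess skipped some problematic audio": this did not happen in the CSMSC experiments. On more complex datasets, however, MFA can return fewer outputs than the audio you feed in. For example, you might input 10k utterances and get back only 9k *.TextGrid files. The gap is usually not that large, but it can happen. Check whether MFA returned fewer results than inputs for your own business data. preprocess has a corresponding check: `sentences` is generated by reading durations.txt, so we only process utterances that have an explicit MFA result.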
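Here is a rough way to check both points on your own data, as a sketch. It compares the number of input *.wav files with the *.TextGrid files MFA produced, and it keeps only utterances whose ID appears in durations.txt. The helper names are hypothetical. The durations.txt line layout ("<utt_id>|<durations>") is an assumption, so adjust the parsing to match your file.

```python
from pathlib import Path
from typing import Iterable, List, Set, Tuple


def count_mfa_outputs(wav_dir: Path, textgrid_dir: Path) -> Tuple[int, int]:
    """Count input wavs vs. MFA TextGrid outputs (searched recursively)."""
    n_wav = sum(1 for _ in wav_dir.rglob("*.wav"))
    n_tg = sum(1 for _ in textgrid_dir.rglob("*.TextGrid"))
    return n_wav, n_tg


def read_duration_ids(lines: Iterable[str]) -> Set[str]:
    """Utterance IDs present in durations.txt.

    Assumes each non-empty line looks like "<utt_id>|<durations...>";
    this layout is an assumption, not confirmed by the PR.
    """
    return {line.split("|", 1)[0].strip() for line in lines if line.strip()}


def keep_aligned(utt_ids: Iterable[str],
                 aligned: Set[str]) -> Tuple[List[str], List[str]]:
    """Split utterances into (kept, dropped) by whether MFA produced a result."""
    kept, dropped = [], []
    for utt in utt_ids:
        (kept if utt in aligned else dropped).append(utt)
    return kept, dropped
```

If count_mfa_outputs reports fewer TextGrids than wavs, the dropped list from keep_aligned names exactly the utterances that preprocess will silently skip.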
sub_output_dir.mkdir(parents=True, exist_ok=True)

with paddle.no_grad():
    mel = speedyspeech_inference(phone_ids, tone_ids, spk_id=speaker_id)
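The ground-truth durations have to be passed into speedyspeech_inference. Otherwise, the generated mel will not match the length of the real audio. SpeedySpeech does not accept this argument by default. For fastspeech2, gen_gta_mel handles it with a separately written StyleFastSpeech2Inference class.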
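The toy length regulator below (pure Python, hypothetical helper name) shows why predicted durations break the pairing. The number of output frames is simply the sum of the durations, so only durations taken from MFA reproduce the frame count of the real audio's mel spectrogram. The numbers are illustrative and do not come from a real utterance.

```python
from typing import List, Sequence


def length_regulate(encodings: Sequence[Sequence[float]],
                    durations: Sequence[int]) -> List[Sequence[float]]:
    """Repeat each phone-level encoding durations[i] times (frame-level output).

    Output length == sum(durations), whatever the durations are.
    """
    if len(encodings) != len(durations):
        raise ValueError("need exactly one duration per phone")
    frames: List[Sequence[float]] = []
    for enc, d in zip(encodings, durations):
        frames.extend([enc] * int(d))
    return frames
```

With three phones, ground-truth durations [3, 3, 2] give 8 frames and match an 8-frame reference mel. Predicted durations such as [2, 3, 1] give 6 frames, so the generated mel could not be aligned frame by frame with the real audio. That frame-level pairing is what GTA mels are typically used for.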
In that case, could you make this change, Zilong? I'm not sure whether to add a new inference class or modify the existing one.
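If you're in a hurry, feel free to change it yourself; I don't plan to work on this soon. You can add a new class following the approach used for fastspeech2.
If you don't have time for it right now, you can delete this file for the moment and I'll merge the other files.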
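Even with a new class, it still has to call SpeedySpeech's inference function, so that function probably needs a conditional branch. Feel free to change it as needed, as long as the original training and inference don't break (CI covers part of this).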
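The sketch below shows the kind of conditional described above. It uses toy classes rather than PaddleSpeech's SpeedySpeech, and the class and method names are hypothetical. durations becomes an optional argument, and the default path still calls the duration predictor. A thin wrapper class, in the spirit of StyleFastSpeech2Inference, always forwards ground-truth durations.

```python
from typing import Callable, List, Optional, Sequence


class ToySpeedySpeech:
    """Toy stand-in for the acoustic model (hypothetical, not PaddleSpeech's class)."""

    def __init__(self, duration_predictor: Callable[[Sequence[int]], List[int]]):
        self.duration_predictor = duration_predictor

    def inference(self, phones: Sequence[int],
                  durations: Optional[Sequence[int]] = None) -> List[int]:
        if durations is None:
            # Original path, unchanged: existing synthesis (and CI) keep working.
            durations = self.duration_predictor(phones)
        # Expand each phone into frames; a real model would decode a mel here.
        frames: List[int] = []
        for p, d in zip(phones, durations):
            frames.extend([p] * int(d))
        return frames


class ToySpeedySpeechGTAInference:
    """Hypothetical wrapper that always feeds ground-truth (e.g. MFA) durations."""

    def __init__(self, model: ToySpeedySpeech):
        self.model = model

    def __call__(self, phones: Sequence[int], durations: Sequence[int]) -> List[int]:
        return self.model.inference(phones, durations=durations)
```

Keeping durations=None as the default leaves existing callers untouched. Testing with `is None` also matters once durations is an array or tensor. For NumPy arrays, for example, bool() of a multi-element array raises ValueError.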
OK, I'll try to change it myself first.
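Sorry, I forgot to add test=tts when committing again :joy:
After writing code, you can check the formatting with our pre-commit hooks. Only keep the formatting changes in code you modified yourself. Other code may get reformatted too; just don't git add those changes.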
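One way to keep pre-commit away from unrelated code is to run it only on the files you changed. The sketch below assumes git and pre-commit are on PATH and that it runs from the repository root. `git diff --name-only` and `pre-commit run --files` are standard options of those tools, while the helper names are hypothetical.

```python
import os
import subprocess
from typing import List, Sequence


def changed_files(base: str = "HEAD") -> List[str]:
    """Paths changed relative to `base` (staged or not), skipping deleted files.

    Note: untracked new files are not listed by `git diff`.
    Paths are relative to the repo root, so run this from the root.
    """
    out = subprocess.run(
        ["git", "diff", "--name-only", base],
        check=True, capture_output=True, text=True,
    ).stdout
    return [path for path in out.splitlines() if os.path.exists(path)]


def precommit_cmd(files: Sequence[str]) -> List[str]:
    """Build a `pre-commit run --files ...` command limited to the given files."""
    return ["pre-commit", "run", "--files", *files]
```

For example, run `subprocess.run(precommit_cmd(changed_files()))` from the repository root. Formatters still rewrite whole files, so the step that actually keeps other code out of the commit is staging only your own hunks (e.g. with `git add -p`).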
OK, I'll push the changes after running pre-commit a bit later.
encodings = paddle.matmul(M, encodings)
if durations is None:
    pred_durations = self.duration_predictor(encodings)  # (1, T)
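If you want to change this chunk, it could also be simplified with the expand function (admittedly, I didn't do this part well earlier).
Also, the np.sum, np.zeros and np.max calls inside expand could be replaced with their paddle.xxx counterparts, as done here. Then the final M = paddle.to_tensor(M, dtype=encodings.dtype) no longer needs to_tensor. to_tensor may fail during dynamic-to-static conversion, so if you replace this chunk with expand without swapping out the numpy functions, dynamic-to-static may break. If you'd rather not change it, that's fine. Once yours is merged I'll change it (your usage here reminded me), and I'll @ you when it's done.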
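For reference, here is a NumPy sketch of what expand computes, plus a loop-free way to build the same alignment matrix M. expand_loop is a reconstruction from the description above, not the repository's code. expand_vectorized uses only cumsum, arange, comparisons and matmul, which have Paddle counterparts (paddle.cumsum, paddle.arange, paddle.matmul), so a Paddle port would not need the numpy round-trip through to_tensor. How such a port behaves under dynamic-to-static conversion has not been checked.

```python
import numpy as np


def expand_loop(encodings: np.ndarray, durations: np.ndarray) -> np.ndarray:
    """Loop-based version: build a 0/1 alignment matrix M, then M @ encodings.

    encodings: (B, T_enc, C); durations: (B, T_enc) integer frame counts.
    Returns (B, T_dec, C) with T_dec = longest total duration in the batch.
    """
    batch_size, t_enc = durations.shape
    t_dec = int(durations.sum(-1).max())
    M = np.zeros([batch_size, t_dec, t_enc], dtype=encodings.dtype)
    for i in range(batch_size):
        k = 0
        for j in range(t_enc):
            d = int(durations[i, j])
            M[i, k:k + d, j] = 1  # frames k..k+d-1 copy phone j
            k += d
    return M @ encodings


def expand_vectorized(encodings: np.ndarray, durations: np.ndarray) -> np.ndarray:
    """Same M without Python loops.

    Frame t belongs to phone j iff start_j <= t < end_j, where
    end = cumsum(durations) and start = end - durations.
    """
    ends = np.cumsum(durations, axis=-1)           # (B, T_enc)
    starts = ends - durations                       # (B, T_enc)
    t_dec = int(ends[:, -1].max())
    t = np.arange(t_dec)[None, :, None]             # (1, T_dec, 1)
    M = ((t >= starts[:, None, :]) & (t < ends[:, None, :])).astype(encodings.dtype)
    return M @ encodings                            # (B, T_dec, C)
```

Both versions leave zero rows past a shorter sequence's end, so padded frames in a batch stay zero either way.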
Then let's merge this first; thanks for taking care of the change!
LGTM
PR types
New features and Bug fixes
PR changes
Others
Describe