This repository was archived by the owner on Sep 1, 2024. It is now read-only.

Ask for help about the net_vocal #24

@dengyuanjie


Hello, I have been examining the effect of net_vocal_attributes within the full model framework.

At present, with embeddings extracted from the predicted sound, the distance for a negative sample pair (audio_embedding_A1_pred and audio_embedding_B1_pred) can reach 2, and the distance for a positive sample pair (audio_embedding_A1_pred and audio_embedding_A2_pred) drops to about 0.

However, after I changed the input of net_vocal to pure real (ground-truth) sound, the distance for a negative sample pair (audio_embedding_A1_gt and audio_embedding_B_gt) only reaches about 1. In other words, net_vocal extracts less discriminative features when I train it alone.
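For anyone trying to reproduce the comparison, here is a minimal sketch of how such pair distances could be measured. It assumes the embeddings are L2-normalized and compared with Euclidean distance, which would explain a maximum of 2; the metric actually used in the model is not stated here, and the function names and toy vectors are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (assumed preprocessing, not confirmed)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def euclidean_distance(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pair_distance(emb_a, emb_b):
    """Distance between two embeddings after L2 normalization.

    For unit vectors this lies in [0, 2]:
    0 for identical directions, 2 for opposite directions.
    """
    return euclidean_distance(l2_normalize(emb_a), l2_normalize(emb_b))

# Toy 2-D vectors standing in for real embeddings (hypothetical data).
a1 = [1.0, 0.0]
a2 = [2.0, 0.0]    # same direction as a1 -> positive-like pair
b1 = [-1.0, 0.0]   # opposite direction  -> strongly separated pair
c1 = [0.0, 1.0]    # orthogonal direction

print(pair_distance(a1, a2))  # 0.0
print(pair_distance(a1, b1))  # 2.0
print(pair_distance(a1, c1))  # about 1.414
```

Under this assumption, the squared distance between unit vectors is 2 - 2cos(θ), so a distance of 1 corresponds to a cosine similarity of 0.5. If the distance is instead defined as 1 - cos(θ), a value of 1 would mean the two embeddings are orthogonal.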

It stands to reason that features should be easier to extract from pure ground-truth voices than from predicted voices. I changed the training parameters (batch size, learning rate, etc.), but none of these changes solved the problem. Do you know what the reason might be?

Looking forward to your reply!
