About BOS #30

@ZichongWang

Hi! Great idea to use embeddings for protein sequence retrieval! While reading your code, I ran into a question.

On lines 73 and 85 of biencoder.py, you wrote x = x[:,0], which means the embedding of the first token is used as the embedding of the whole sequence. That is common practice for BERT, but the ESM2 repo says of this approach: "Don't use with the pre-trained models - we trained without bos-token supervision". Is it OK to use the first token here, or did it turn out to work best compared with other methods (such as mean pooling)?
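
To make the comparison concrete, here is a minimal sketch of the two pooling options I have in mind. It uses NumPy arrays as a stand-in for the model output, and the function names, shapes, and toy data are my own assumptions, not code from biencoder.py. The same indexing should carry over to PyTorch tensors.

```python
import numpy as np

def first_token_pool(hidden):
    """Use position 0 (the BOS/CLS slot) as the sequence embedding.

    hidden: array of shape (batch, seq_len, dim).
    Equivalent to the x = x[:, 0] indexing discussed above.
    """
    return hidden[:, 0]

def masked_mean_pool(hidden, mask):
    """Average token embeddings over positions where mask == 1.

    mask: array of shape (batch, seq_len); 1 = keep, 0 = ignore (padding).
    """
    weights = mask[:, :, None].astype(hidden.dtype)    # (batch, seq_len, 1)
    summed = (hidden * weights).sum(axis=1)            # (batch, dim)
    counts = np.clip(weights.sum(axis=1), 1.0, None)   # avoid division by zero
    return summed / counts

# Hypothetical toy batch: 2 sequences, 4 positions, 3 embedding dimensions.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 4, 3))
mask = np.array([[1, 1, 1, 1],
                 [1, 1, 0, 0]])   # second sequence has 2 padding positions

cls_emb = first_token_pool(hidden)          # shape (2, 3)
mean_emb = masked_mean_pool(hidden, mask)   # shape (2, 3)
```

Whether the mask should also zero out the BOS and EOS positions is a separate design choice; in this sketch they are simply treated like any other unmasked token.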

Thanks very much.
