Hi! Great idea to use embeddings for protein sequence retrieval! While reading your code, I ran into a question.
On lines 73 and 85 of biencoder.py you wrote `x = x[:, 0]`, which means the embedding of the first token is used as the embedding for the whole sequence. This is common for BERT, but the ESM2 repo says "Don't use with the pre-trained models - we trained without bos-token supervision".
Is it OK to use the first token, or is it the best choice compared to other methods (like mean pooling)?
Thanks very much.
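For reference, here is a minimal sketch of the two pooling strategies being compared: taking the first (BOS) token as in `x = x[:, 0]`, versus mask-aware mean pooling over the real tokens. The function names and the padding-mask convention (1 for real tokens, 0 for padding) are my own illustration, not from biencoder.py.

```python
import torch

def first_token_pool(x: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq_len, dim) per-token embeddings.
    # Use the first token's embedding for the whole sequence,
    # as on lines 73 and 85 of biencoder.py.
    return x[:, 0]

def mean_pool(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # mask: (batch, seq_len), 1 for real tokens, 0 for padding.
    # Average only over real tokens so padding does not skew the mean.
    m = mask.unsqueeze(-1).to(x.dtype)
    return (x * m).sum(dim=1) / m.sum(dim=1).clamp(min=1e-9)

x = torch.randn(2, 5, 8)
mask = torch.tensor([[1, 1, 1, 0, 0],
                     [1, 1, 1, 1, 1]])
print(first_token_pool(x).shape)  # (2, 8)
print(mean_pool(x, mask).shape)   # (2, 8)
```

Both produce one fixed-size vector per sequence; the difference is only in which positions contribute to it.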