Why is the cluster representative found using the ent2freq dict and NOT a sub2freq dict? #9

At [this](https://github.com/malllabiisc/cesi/blob/387525173c040271bbe8e28e10a6b34d3e380b6e/src/cesi_main.py#L140) line, the subject embeddings and relation embeddings are passed for clustering. The cluster representative is then found [here](https://github.com/malllabiisc/cesi/blob/387525173c040271bbe8e28e10a6b34d3e380b6e/src/cluster.py#L54) using what is possibly the wrong dictionary, ent2freq. The subject embeddings dict contains 11878 subjects, whereas the ent2freq dict contains 23219 entities. Because ent2freq maps each entity, not each subject, to its frequency, entity ids and subject ids do not line up. Could you please clarify this? I am happy to elaborate on my concern if needed.
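To make the concern concrete, here is a minimal toy sketch. It assumes that subject ids and entity ids are two separate integer vocabularies and that ent2freq is keyed by entity id. Whether CESI actually stores them this way is exactly what this issue asks. The vocabularies, the counts, and the helper `most_frequent` are hypothetical and do not come from the repository. If a cluster of subject ids is looked up in an entity-keyed frequency dict, each lookup returns the count of whatever entity happens to share that id:

```python
from typing import Dict, List


def most_frequent(members: List[int], freq: Dict[int, int]) -> int:
    """Hypothetical helper: return the member id with the highest count.

    Ties go to the smallest id, because max() keeps the first maximal item
    and the members are iterated in sorted order.
    """
    return max(sorted(members), key=lambda i: freq.get(i, 0))


# Hypothetical toy vocabularies (NOT CESI's real data).
id2ent = ["Barack Obama", "Hawaii", "Obama", "Honolulu"]  # entity ids 0..3
id2sub = ["Obama", "Barack Obama"]                        # subject ids 0..1
ent2freq = {0: 5, 1: 1, 2: 2, 3: 9}                       # keyed by ENTITY id

# Re-key the counts by SUBJECT id, going through the shared surface form.
ent2id = {name: i for i, name in enumerate(id2ent)}
sub2freq = {sid: ent2freq[ent2id[name]] for sid, name in enumerate(id2sub)}

# A cluster of SUBJECT ids, as it would come out of clustering subject embeddings.
cluster = [0, 1]

wrong = most_frequent(cluster, ent2freq)  # subject ids misread as entity ids
right = most_frequent(cluster, sub2freq)  # subject ids read as subject ids

print(id2sub[wrong], "vs", id2sub[right])  # prints: Obama vs Barack Obama
```

With the entity-keyed dict, subject id 1 ("Barack Obama") is scored with the count of entity id 1 ("Hawaii"), so the less frequent "Obama" wins. With a subject-keyed dict, "Barack Obama" wins as expected.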
