Skip to content

New ranking feature based on word embeddings Word2Vec based on cosinus value #34

@martinreynaert

Description

@martinreynaert

We currently have ranking feature:
(skip[12]?0:(*vit)->cosine_rank);

This is based on the top 20 semantic nearest neighbours as returned on the basis of word2vec word embeddings and a further check on the cosine values. This works, but is too slow for production work. Perhaps the current request will warrant this earlier feature to be renamed.

I would like a new feature that for each pair of variant and particular CC retrieves the cosine value (as does ticcltool W2V-dist). Given all the values for all the CCs for a variant, the smallest value should then be ranked 'best', i.e. being assigned 1. Larger values then get assigned ranks 2, 3 ,4, etc. Possible draws get the same rank.

Many thanks!

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions