Extracting Keywords That Are Subsets Of Each Other #137

@shner-elmo

Hey everyone, I was wondering why, when multiple keywords overlap (for example ["super computer", "computer game"]), we only extract the longest one. Why not extract both of them?
I would assume that if you want to extract a bunch of keywords from a document, it makes sense to get all the matches (even those that overlap) and let the user decide what to do with them.

In this test we want to make sure that, given the sentence sentence = "distributed super computer game" and the following keywords:

{
    "Distributed Super Computer": ["distributed super computer"],
    "Computer Game": ["computer game"]
}

we only extract the first keyword, which is the longest: "Distributed Super Computer". But why not return both, in the order they appear, i.e. ["Distributed Super Computer", "Computer Game"]?
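To make the two behaviours concrete, here is a minimal standalone sketch. It does not use the library's own code: extract_keywords and its allow_overlap flag are hypothetical names made up for illustration, and the matching is done with plain regular expressions, so it only approximates the current longest-match behaviour.

```python
import re


def extract_keywords(sentence, keyword_dict, allow_overlap=True):
    """Return the clean names of matched keywords in order of appearance.

    Hypothetical helper for illustration only, not the library's API.
    keyword_dict maps a clean name to a list of variants to search for.
    """
    text = sentence.lower()
    matches = []  # (start, end, clean_name)
    for clean_name, variants in keyword_dict.items():
        for variant in variants:
            # Match the variant only on word boundaries.
            pattern = r"(?<!\w)" + re.escape(variant.lower()) + r"(?!\w)"
            for m in re.finditer(pattern, text):
                matches.append((m.start(), m.end(), clean_name))

    # Order by start position; at the same start, longest match first.
    matches.sort(key=lambda t: (t[0], -(t[1] - t[0])))

    if allow_overlap:
        # Proposed behaviour: report every match, overlapping or not.
        return [name for _, _, name in matches]

    # Approximation of the current behaviour: skip any match that
    # starts inside a match that was already accepted.
    result = []
    last_end = 0
    for start, end, name in matches:
        if start >= last_end:
            result.append(name)
            last_end = end
    return result


keywords = {
    "Distributed Super Computer": ["distributed super computer"],
    "Computer Game": ["computer game"],
}
sentence = "distributed super computer game"

overlapping = extract_keywords(sentence, keywords, allow_overlap=True)
non_overlapping = extract_keywords(sentence, keywords, allow_overlap=False)
print(overlapping)
print(non_overlapping)
```

With allow_overlap=True both clean names come back in the order they start in the sentence, which is the output I am asking about. With allow_overlap=False only the first, longer match survives, as in the test.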
