
Evaluate other language identification methods. #117

@greyblake

This issue is a reminder for myself.

Possible options:

  • Character frequencies
  • 2-grams?
  • The most frequent words (100 or 1000)?
  • Smart/complex resolution between LangA and LangB by identifying traits that are present in one language and absent in the other. This could help when two languages have very similar statistical characteristics.
  • Řehůřek and Kolkus (2009)
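
To make the first two options concrete, here is a minimal sketch of frequency-based identification: build a character n-gram profile per language and pick the language whose profile is closest to the input by cosine similarity. With `n=1` it reduces to plain character frequencies, with `n=2` to 2-grams. Everything here is an assumption for illustration: the function names (`ngram_profile`, `cosine`, `classify`), the choice of cosine similarity, and the toy `SAMPLES` sentences, which stand in for real training corpora. It is not how this library currently works.

```python
from collections import Counter
import math


def ngram_profile(text: str, n: int = 2) -> Counter:
    """Count character n-grams of lowercased, whitespace-normalized text."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text: str, profiles: dict, n: int = 2) -> str:
    """Return the language code whose profile is most similar to the text."""
    query = ngram_profile(text, n)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


# Toy samples (hypothetical): real profiles would need far larger corpora.
SAMPLES = {
    "eng": "the quick brown fox jumps over the lazy dog and then the other dogs",
    "deu": "der schnelle braune fuchs springt über den faulen hund und dann die anderen hunde",
}
PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}

print(classify("the dog", PROFILES))
print(classify("der hund", PROFILES))
```

The same `classify` function could be reused to compare options: swapping `n`, or replacing `ngram_profile` with a word counter restricted to the most frequent 100 or 1000 words, would give a rough side-by-side evaluation on a shared test set.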

See:
