This issue is a reminder for myself.
Possible options:
- Character frequencies
- 2-grams?
- The most frequent words (100 or 1000)?
- Smarter disambiguation between LangA and LangB: identify traits that are present in one language and absent in the other. This could help when two languages have very similar statistical characteristics.
- Řehůřek and Kolkus (2009)
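A minimal sketch of how character frequencies, 2-grams, and the LangA/LangB tie-break could fit together. It assumes Python and tiny per-language sample texts, and every name in it (`ngram_profile`, `cosine`, `detect`, the `samples` and `traits` mappings, the `margin` threshold) is hypothetical. The sketch builds character n-gram count profiles (n=1 gives plain character frequencies, n=2 gives bigrams) and ranks languages by cosine similarity. Only when the top two scores fall within `margin` does it count characters assumed to be distinctive for each candidate. It does not reproduce the approach of Řehůřek and Kolkus (2009).

```python
from collections import Counter
from math import sqrt


def ngram_profile(text: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams; n=1 gives plain character frequencies."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    # Counter returns 0 for missing keys without inserting them.
    dot = sum(count * b[key] for key, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def detect(text: str, samples: dict, traits: dict, n: int = 2, margin: float = 0.05) -> str:
    """Pick the language whose sample profile is closest to `text`.

    samples: hypothetical {lang: sample_text} training data (at least two languages).
    traits:  hypothetical {lang: lowercase characters assumed to occur only in that language}.
    """
    profile = ngram_profile(text, n)
    scores = {lang: cosine(profile, ngram_profile(s, n)) for lang, s in samples.items()}
    best, second = sorted(scores, key=scores.get, reverse=True)[:2]
    if scores[best] - scores[second] < margin:
        # Too close to call statistically: count characters distinctive for each candidate.
        lowered = text.lower()
        hits = {lang: sum(lowered.count(ch) for ch in traits.get(lang, "")) for lang in (best, second)}
        if hits[second] > hits[best]:
            return second
    return best
```

For Spanish vs. Portuguese, for example, `traits` might map `"es"` to `"ñ¿¡"` and `"pt"` to `"ãõç"`. Real profiles would need to come from large corpora rather than a single sample sentence, and the same `cosine` scoring could be applied to counts of the most frequent words instead of character n-grams.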
See: