Open
Description
require "pragmatic_tokenizer"

tokenizer = PragmaticTokenizer::Tokenizer.new({
  language: :pl,
  numbers: :all,
  downcase: false,
  contractions: { "os" => "osiedle", "os." => "osiedle" },
  expand_contractions: true
})
# Use p rather than puts so the result is printed as an array.
p tokenizer.tokenize("Na os.Piłsudskiego")
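The proper tokenization should be
["Na", "osiedle", "Piłsudskiego"]
while the tokenizer returns
["Na", "Osiedle", ".", "Piłsudskiego"]
That is, the expansion comes back capitalized and the period is left behind as a separate token.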
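Until this is addressed, one possible workaround is to expand the abbreviation in the raw text before it reaches the tokenizer. The following is only a sketch in plain Ruby: `ABBREVIATIONS` and `expand_abbreviations` are hypothetical names and not part of pragmatic_tokenizer, and the regular expression assumes the abbreviation is directly followed by the word it modifies.

```ruby
# Hypothetical pre-processing step, not part of the pragmatic_tokenizer API.
ABBREVIATIONS = { "os" => "osiedle" }.freeze

def expand_abbreviations(text, abbreviations = ABBREVIATIONS)
  abbreviations.reduce(text) do |result, (abbr, expansion)|
    # Match the abbreviation as a whole word, followed either by a period
    # (with optional whitespace) or by whitespace alone, and only when a
    # letter or digit comes next. Matching is case-insensitive.
    pattern = /(?<![[:alnum:]])#{Regexp.escape(abbr)}(?:\.\s*|\s+)(?=[[:alnum:]])/i
    # Replace with the expansion plus a single space, so "os.Piłsudskiego"
    # becomes "osiedle Piłsudskiego".
    result.gsub(pattern, "#{expansion} ")
  end
end

expanded = expand_abbreviations("Na os.Piłsudskiego")
p expanded       # => "Na osiedle Piłsudskiego"
p expanded.split # => ["Na", "osiedle", "Piłsudskiego"]
```

The expanded string could then be passed to `tokenizer.tokenize` with no `contractions` configured, which sidesteps the capitalization and stray-period behavior rather than fixing it.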