
2-gram discount out of range for adjusted count  #18

@ezubaric

I have two files. One works fine with KenLM; the other produces the following error:

jbg-hackintosh:qblearn jbg$ lmplz -o 2 -S 2G -T -kndiscount /tmp < bl > scratch/Literature/10393.comb.arpa
=== 1/5 Counting and sorting n-grams ===
Reading stdin
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100


Unigram tokens 2366 types 1168
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:14016 2:2147469568
/Users/jbg/repositories/kenlm/lm/builder/adjust_counts.cc:50 in void lm::builder::::StatCollector::CalculateDiscounts() threw BadDiscountException because `discounts_[i].amount[j] < 0.0 || discounts_[i].amount[j] > j'.
ERROR: 2-gram discount out of range for adjusted count 3: -0.402645
Abort trap: 6

The only difference between the two files is that one ends with the sentence:

Lord Melbourne offered him a lordship, which he declined

I've also sent the full files to Kenneth via e-mail.
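
For context, here is a minimal sketch of how a discount can end up outside the range checked in the exception message (`amount[j] < 0.0 || amount[j] > j`). It assumes the standard closed-form modified Kneser-Ney estimates from Chen and Goodman, where n_k is the number of n-grams with adjusted count k; KenLM's actual code in adjust_counts.cc may differ in detail. The function names and the count-of-counts values are hypothetical and are not taken from the files above.

```python
def modified_kn_discounts(n1, n2, n3, n4):
    """Return [D1, D2, D3+] from the counts of n-grams seen 1, 2, 3, 4 times.

    Assumes the closed-form estimates:
        Y   = n1 / (n1 + 2 * n2)
        D_j = j - (j + 1) * Y * n_{j+1} / n_j   for j = 1, 2, 3
    All of n1..n3 must be nonzero, otherwise the division fails.
    """
    y = n1 / (n1 + 2.0 * n2)
    counts = [n1, n2, n3, n4]
    # counts[j] is n_{j+1} and counts[j - 1] is n_j (zero-based list).
    return [j - (j + 1) * y * counts[j] / counts[j - 1] for j in (1, 2, 3)]


def out_of_range(discounts):
    """Return (j, D_j) pairs that violate 0 <= D_j <= j."""
    return [(j, d) for j, d in enumerate(discounts, start=1)
            if d < 0.0 or d > j]


if __name__ == "__main__":
    # Hypothetical small-corpus counts: more n-grams seen four times than
    # three times, which is unusual for natural text but easy to hit when
    # the training data is tiny.
    discounts = modified_kn_discounts(n1=2000, n2=100, n3=10, n4=12)
    for j, d in out_of_range(discounts):
        print(f"discount out of range for adjusted count {j}: {d:.6f}")
```

With counts like these, the discount for adjusted count 3 comes out negative because n4 exceeds n3. On a corpus of only a few thousand tokens the count-of-counts are noisy, so adding or removing a single sentence could plausibly tip one of the ratios over the limit.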
