
Sentence BLEU yields non-intuitive scores #141

@ozancaglayan

I noticed this when I tried to replace nltk's sentence_bleu with sacreBLEU in some of my code. When there are no n-gram matches at all, we should probably just return 0 (see the sketch after the transcripts below).


In [55]: sacrebleu.sentence_bleu('yes', ['no'], smooth_method='exp')
Out[55]: BLEU = 50.00 50.0/0.0/0.0/0.0 (BP = 1.000 ratio = 1.000 hyp_len = 1 ref_len = 1)

In [56]: sacrebleu.sentence_bleu('yes', ['no'], smooth_method='floor')
Out[56]: BLEU = 10.00 10.0/0.0/0.0/0.0 (BP = 1.000 ratio = 1.000 hyp_len = 1 ref_len = 1)

In [46]: hyp
Out[46]: 'this is a cat'

In [47]: ref
Out[47]: 'okay thanks'

In [48]: sacrebleu.sentence_bleu(hyp, [ref], smooth_method='floor')
Out[48]: BLEU = 4.52 2.5/3.3/5.0/10.0 (BP = 1.000 ratio = 2.000 hyp_len = 4 ref_len = 2)

In [49]: sacrebleu.sentence_bleu(hyp, [ref], smooth_method='exp')
Out[49]: BLEU = 7.99 12.5/8.3/6.2/6.2 (BP = 1.000 ratio = 2.000 hyp_len = 4 ref_len = 2)

# nltk, for comparison
In [50]: sentence_bleu([ref.split()], hyp.split(), smoothing_function=SmoothingFunction().method3)
Out[50]: 0

In [51]: sentence_bleu([ref.split()], hyp.split(), smoothing_function=SmoothingFunction().method2)
Out[51]: 0
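
For reference, here is a minimal sketch of the suggested behaviour: return 0 when the hypothesis shares no tokens with any reference, otherwise fall back to sacreBLEU's smoothed score. The helper name sentence_bleu_or_zero is hypothetical, and the guard is only an illustration of the idea, not sacreBLEU's actual logic; it assumes the returned object exposes a .score attribute matching the BLEU results shown above.

import sacrebleu

def sentence_bleu_or_zero(hypothesis, references, **kwargs):
    # Hypothetical guard: if no unigram from the hypothesis occurs in any
    # reference, there can be no n-gram matches at all, so return 0 directly.
    hyp_tokens = set(hypothesis.split())
    if all(hyp_tokens.isdisjoint(ref.split()) for ref in references):
        return 0.0
    # Otherwise defer to sacreBLEU's smoothed sentence-level BLEU.
    return sacrebleu.sentence_bleu(hypothesis, references, **kwargs).score

# With the examples above, both calls would then yield 0.0 instead of 50.00 / 4.52:
# sentence_bleu_or_zero('yes', ['no'], smooth_method='exp')
# sentence_bleu_or_zero('this is a cat', ['okay thanks'], smooth_method='floor')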
