Skip to content

[CRITICAL BUG] Inconsistent scores of XCOMET between v2.2.4 and v2.2.5 #244

@de9uch1

Description

@de9uch1

@ricardorei
Hi, I'm a developer of mbrs, a library for MBR decoding.
I found a critical bug of XCOMET through our unittests.

🐛 Bug

The details of this bug and the cause were also reported in NL2G/xCOMET-lite#8

To Reproduce

Run this example code on unbabel-comet==2.2.4 and 2.2.5.
https://huggingface.co/Unbabel/XCOMET-XL

We observed that some examples varied the scores up to 10%.

Expected behaviour

The XCOMET scores should be consistent between v2.2.4 and v2.2.5.

Environment

OS: Linux
Packaging pip
Version 2.2.5 (and <=2.2.4)

Additional context

This is caused by this change ( 0ccf951 ).
Now (v2.2.5), different hyperparameters are passed from the version <=2.2.4 when initializeing the XCOMETMetric class.
Specifically, layer_transformation argument was not passed and set to "softmax" automatically before, but "sparsemax" is set now.

What makes things even more complicated is that, because save_hyperparameters() is called in the XCOMETMetric.__init__() here, after initializing LayerwiseAttention; thus, sparsemax, which was not actually passed before, but was saved in hparams.yaml.

This can be fixed in two ways

  1. Change hparams.yaml: change layer_transformation to "softmax" and load the model by comet.load_from_checkpoint(model_path, reload_hparams=True). reload_hparams=True option must be set, because the hyperparameters saved in the checkpoint file are loaded instead of hparams.yaml, if not set.
  2. Modify the checkpoint file: Modify the hyperparameters dict saved in this checkpoint. Probably, this method does not require the change of load_from_checkpoint() arguments.

As you can see, the solution is simple, but I'm not sure it would be the change you intended, so I created an issue instead of a pull request.

Thank you.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions