Skip to content

Refactoring ideas #125

@ozancaglayan

Description

@ozancaglayan

Hi all,

I would like to do another round of refactoring that would basically target the way hypotheses and references are read. Right now, corpus and sentence-level methods of each metric make some assumptions, fixes wrong inputs using isinstance() checks, and bail out if nothing can be done. IMHO, this creates a lot of confusion especially with the introduction of standalone metric classes vs. the compatibility API. I remember that we encountered a few issues regarding these and those were fixed here and there.

I think the metrics should just stick to one signature (i.e. no Union type annotation for sys and ref streams) and the isinstance checks should be moved to the compatibility layer if required.

I would suggest the following and throw an exception if this is not the case:

  • hyp: str, refs: [str, str2, ..., strC] for sentence-level scorers
  • hyps: [str1, str2, ..., strN] refs: [[str1, str2, ..., strC]_1, ..., [str1, str2, ..., strC]_N] (EDIT: This is wrong, see below)

Another option is to explicitly implement classes: Hypothesis, HypothesisList, Reference

  1. Moreover, the metrics should not receive streams that are processed on-the-fly but already populated lists. We may separate out the tokenization parts of metrics and call them first. This should simplify future significance testing as well, which will require reusing shuffled statistics many times.

What do you think?

Metadata

Metadata

Assignees

Labels

No labels
No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions