I think there's a good case to be made that we should add a module for evaluating cover detection / version identification.
We have not done this in the past under the assumption that it's a fairly straightforward information retrieval problem, for which off-the-shelf tools (i.e., sklearn) already provide the necessary functionality.
While this is true in theory, I think in practice we often end up with custom implementations anyway, and there's actually more room for interpretation in the eval definition than we might initially have thought. Things like how empty positive sets are handled, or even transitivity / symmetry assumptions in version clusters, are not generally well defined, and they have an outsized influence on the resulting evaluation. This can lead to discrepancies and/or bugs in evaluating version ID systems, which is exactly the kind of problem mir_eval was meant to resolve in general. (See, e.g., furkanyesiler/re-move#4.)
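To make the empty-positive-set point concrete, here is a minimal sketch of mean average precision over a pairwise similarity matrix. It is not an existing or proposed mir_eval API: the function names, the `empty` parameter, and the toy data are all hypothetical, and it assumes version clusters are given as one label per track (so membership is symmetric and transitive by construction).

```python
import numpy as np


def average_precision(relevant, scores):
    """AP for a single query; NaN when the query has no relevant items."""
    relevant = np.asarray(relevant, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    if not relevant.any():
        return np.nan
    # Rank candidates by decreasing similarity.
    order = np.argsort(-scores, kind="stable")
    hits = relevant[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Average the precision values at the ranks where a version is found.
    return float(precision_at_k[hits].mean())


def mean_average_precision(similarity, labels, empty="skip"):
    """Hypothetical MAP helper; `empty` controls queries with no other version.

    empty="skip": drop those queries from the mean.
    empty="zero": count them as AP = 0.
    """
    similarity = np.asarray(similarity, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    aps = np.empty(n)
    for q in range(n):
        others = np.arange(n) != q  # exclude the query itself
        relevant = labels[others] == labels[q]
        aps[q] = average_precision(relevant, similarity[q, others])
    if empty == "skip":
        return float(np.nanmean(aps))
    if empty == "zero":
        return float(np.nan_to_num(aps, nan=0.0).mean())
    raise ValueError(f"unknown empty policy: {empty!r}")


# Toy collection (made-up numbers): two cliques of size 2 and one singleton.
labels = [0, 0, 1, 1, 2]
similarity = [
    [1.0, 0.9, 0.2, 0.1, 0.3],
    [0.9, 1.0, 0.4, 0.3, 0.2],
    [0.2, 0.4, 1.0, 0.35, 0.5],
    [0.1, 0.3, 0.35, 1.0, 0.6],
    [0.3, 0.2, 0.5, 0.6, 1.0],
]

map_skip = mean_average_precision(similarity, labels, empty="skip")
map_zero = mean_average_precision(similarity, labels, empty="zero")
print(f"skip: {map_skip:.3f}  zero: {map_zero:.3f}")
```

On this toy example the same system output scores roughly 0.71 under one policy and 0.57 under the other, purely because of how the single track without any other version is treated.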
This, coupled with the general lack of a standard (and well-maintained) tool for evaluating cover ID, leaves things in a rather unsatisfying state. (I know acoss exists, but it hasn't been touched in 5 years, and it does force the user into a particularly narrow way of working.)
All that said, this is a fairly different kind of task from what we typically cover in mir_eval, since it's inherently collection-based rather than example-based. (This was partially, but not entirely, behind my posting over in #422.) I don't think it's as radically different as something like bsseval though, and I think it can be done in a clean and maintainable way.
What do folks think? Am I missing some widely adopted tool that's already out there? Or should we just do it?