Collection processing and dataframes #422

@bmcfee

Description

A pretty common use case for mir_eval is to iteratively call mir_eval.[task].evaluate(...) on a sequence of reference/estimate pairs, then collect the results into a dataframe for subsequent analysis.

This pattern is so common, in fact, that it may be worth providing some scaffolding to streamline and standardize it.

What I'm thinking is something like the following pattern:

df = mir_eval.collections.evaluate(generator, task='beat', **kwargs)

where generator yields dictionaries containing the fields required as input to the given task's evaluator (e.g., ref_intervals and est_intervals, as appropriate), optionally along with an id field (otherwise a counter index is constructed while iterating), and kwargs are additional keyword arguments passed through to the evaluator.
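For concreteness, here is one shape such a generator could take for the beat task; beat_pairs and the (ref_path, est_path) input format are purely illustrative, but the field names match mir_eval.beat.evaluate's signature:

import mir_eval

def beat_pairs(pairs):
    # pairs: any iterable of (ref_path, est_path) tuples
    for ref_path, est_path in pairs:
        yield {
            'id': ref_path,  # optional; a counter index would be used if omitted
            'reference_beats': mir_eval.io.load_events(ref_path),
            'estimated_beats': mir_eval.io.load_events(est_path),
        }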

The resulting df would have one column for each field returned by the task's evaluator function, and an index keyed on the provided (or generated) id.
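A minimal sketch of what the core function could look like, assuming the proposed collections.evaluate name (none of this exists in mir_eval yet):

import pandas as pd
import mir_eval

def evaluate(generator, task='beat', **kwargs):
    # Look up the task's evaluator, e.g. mir_eval.beat.evaluate
    evaluator = getattr(mir_eval, task).evaluate
    records = {}
    for counter, fields in enumerate(generator):
        # Key on the provided id if present, else the running counter
        key = fields.pop('id', counter)
        records[key] = evaluator(**fields, **kwargs)
    return pd.DataFrame.from_dict(records, orient='index')

Calling df = evaluate(beat_pairs(my_pairs), task='beat') would then yield one row per item and one column per beat metric.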

Caveats

  • Doing this would probably necessitate adding pandas as a dependency. I think this is fine in 2025.
  • We may need to build out some scaffolding utilities (housed under a collections module) to make it easier to construct these generators. I don't have a great sense of how this would look yet, but it would probably become clear after prototyping the core functionality and using it a bit (one purely illustrative possibility is sketched after this list).
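As one hypothetical example of such a utility, a helper that pairs reference and estimate annotation files by filename across two directories (paired_files and the '*.txt' pattern are assumptions, not an existing API):

from pathlib import Path

def paired_files(ref_dir, est_dir, pattern='*.txt'):
    # Yield (ref, est) paths whose filenames match across the two directories
    ref_dir, est_dir = Path(ref_dir), Path(est_dir)
    for ref_path in sorted(ref_dir.glob(pattern)):
        est_path = est_dir / ref_path.name
        if est_path.exists():
            yield ref_path, est_path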
