Entropy Based Evaluation of unlabelled sections #365

@davies-w


Dear MIR community,

I've been revisiting the reference Professor McFee gave in #363 (comment), and it's still not clear that either Onset or Hit Rate is the right metric for our problem (N unlabeled estimated boundaries vs. M human boundaries), primarily because of their lack of smoothness: the all-or-nothing tolerance window means a near miss scores the same as a wild miss.

I've actually just started using an entropy measure to help fit estimated boundaries to 16-bar boundaries, and realized the same approach might be useful for measuring estimated boundaries against reference boundaries as well. I'm not very mathematically equipped, but the algorithm would go something like this:

Each segment consists of beat-aligned intervals. Take the first estimated segment, label the intervals inside it "p" and all the rest "n", then project those labels onto the reference segments and compute the entropy of each one. For a reference segment of n intervals, p of which carry the projected label, H(p, n) = -(p/n) log(p/n) - ((n-p)/n) log((n-p)/n), with 0 log 0 taken as 0; repeat for each estimated segment. E.g. estimated labels 0 0 0 0 1 1 1 1 2 2 2 2 vs. reference labels 0 0 0 0 0 0 1 1 1 1 1 1 would give H(4, 6) + H(0, 6) + H(2, 6) + H(2, 6) + H(0, 6) + H(4, 6). However, we'd want both directions, so we'd also project the reference segments onto the estimates, giving H(4, 4) + H(2, 4) + H(0, 4) + H(0, 4) + H(2, 4) + H(4, 4), and combine the two directions using some aggregate (sum, min, average).
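To make this concrete, here's a rough Python sketch of what I have in mind; `H` and `projected_entropy` are just my own illustrative names, not anything from mir_eval:

```python
import numpy as np

def H(p, n):
    """Two-class entropy of a segment of n intervals, p of which
    carry the projected label (0 * log 0 taken as 0)."""
    if p == 0 or p == n:
        return 0.0
    q = p / n
    return -(q * np.log2(q) + (1 - q) * np.log2(1 - q))

def projected_entropy(src_labels, dst_labels):
    """Project each segment of src_labels onto the segments of
    dst_labels and sum H(overlap, segment size) over all pairs."""
    src, dst = np.asarray(src_labels), np.asarray(dst_labels)
    total = 0.0
    for s in np.unique(src):
        in_s = src == s                   # this segment's intervals are "p"
        for d in np.unique(dst):
            in_d = dst == d
            p = int(np.sum(in_s & in_d))  # "p" intervals landing in segment d
            n = int(np.sum(in_d))         # size of segment d
            total += H(p, n)
    return total

est = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
ref = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

forward = projected_entropy(est, ref)   # H(4,6)+H(0,6)+H(2,6)+H(2,6)+H(0,6)+H(4,6)
backward = projected_entropy(ref, est)  # H(4,4)+H(2,4)+H(0,4)+H(0,4)+H(2,4)+H(4,4)
print(forward, backward)                # combine with sum, min, or average
```

Identical segmentations give 0 in both directions (every projected segment is pure), and shifting a boundary by one interval changes the total gradually, which is exactly the property the hit-rate metrics lack.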

I don't want to reinvent the wheel here, so if anyone has ideas about why this is a bad idea, or whether it's been done before, I'd love feedback. To reiterate, the goal is an objective function measuring closeness of sectioning, not hit/miss precision and recall. Under the tolerance-window metrics, the precision and recall of the example above are both zero (IIUC).
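For what it's worth, here's a quick check of that claim with mir_eval's boundary detection metric, assuming one second per beat-aligned interval:

```python
import numpy as np
import mir_eval

# The example above, assuming one second per beat-aligned interval.
est_intervals = np.array([[0., 4.], [4., 8.], [8., 12.]])
ref_intervals = np.array([[0., 6.], [6., 12.]])

# With the shared endpoints (0 and 12) trimmed, only the interior
# boundaries are scored: estimated 4 and 8 vs. reference 6, neither
# within the 0.5 s tolerance window.
p, r, f = mir_eval.segment.detection(ref_intervals, est_intervals,
                                     window=0.5, trim=True)
print(p, r, f)  # 0.0 0.0 0.0
```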
