Confusion matrix in LFAnalysis.lf_summary supposes you have at least one ABSTAIN label #1446

@garaud

Issue description

I think that the Correct and Incorrect columns returned by LFAnalysis().lf_summary(Y) are wrong when the input data contains no ABSTAIN label (i.e. -1).

The confusion matrix does not have the same size when the unique labels are [0, 1, 2] as when they are [-1, 0, 1, 2]. I understand that the [1:, 1:] slicing is meant to skip the label -1 (which is the first row/column) here, but when -1 never occurs, the slice drops label 0 instead:

confusions = [confusion_matrix(Y, self.L[:, i])[1:, 1:] for i in range(m)]
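A minimal sketch of the size mismatch, using only the first labeling function's votes from the repro below. It also shows one possible workaround, which is an assumption on my side and not necessarily how Snorkel should fix it: pass the full label set to confusion_matrix through its labels argument so that the first row/column always belongs to -1. The names ABSTAIN, votes and votes_abstain are made up for this illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

ABSTAIN = -1  # abstain value as described in this issue

Y = np.array([0, 1, 0, 2, 1, 0, 1, 2])
votes = np.array([0, 0, 1, 2, 0, 0, 0, 0])           # LF "one", never abstains
votes_abstain = np.array([-1, 0, 1, 2, 0, 0, 0, 0])  # same votes, one abstain

# Labels are inferred from the data, so the matrix size depends on
# whether -1 shows up at all.
shape_no_abstain = confusion_matrix(Y, votes).shape
shape_with_abstain = confusion_matrix(Y, votes_abstain).shape

# Possible workaround (assumption): always pass the full label set with
# ABSTAIN first, so [1:, 1:] really removes the ABSTAIN row/column.
labels = [ABSTAIN, 0, 1, 2]
cm = confusion_matrix(Y, votes, labels=labels)[1:, 1:]
correct = int(np.trace(cm))               # diagonal: prediction == truth
incorrect = int(cm.sum() - np.trace(cm))  # everything off the diagonal

print(shape_no_abstain, shape_with_abstain, correct, incorrect)
```

With the explicit label set, the sliced matrix keeps all three real classes, whatever the content of the label matrix.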

Code example/repro steps

from typing import NamedTuple

import numpy as np

from sklearn.metrics import confusion_matrix

from snorkel.labeling import LFAnalysis


class MockLF(NamedTuple):
    name: str


Y = np.array([0, 1, 0, 2, 1, 0, 1, 2])

lfs = [MockLF("one"), MockLF("two"), MockLF("three")]

# suppose you have three labeling functions
L = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 2],
              [2, 2, 2],
              [0, 1, 1],
              [0, 0, 1],
              [0, 1, 0],
              [0, 2, 1]])

L_w_abstain = np.array([[-1, 1, 0],
                        [0, 0, 1],
                        [1, -1, 2],
                        [2, 2, 2],
                        [0, 1, -1],
                        [0, 0, 1],
                        [0, -1, 0],
                        [0, 2, 1]])

analysis = LFAnalysis(L, lfs=lfs)
print(analysis.lf_summary(Y))

# well_a = LFAnalysis(L_w_abstain, lfs=lfs)
# print(well_a.lf_summary(Y))

Expected behavior

The first labeling function, named "one", should have 3 correct and 5 incorrect predictions instead of:

       j   Polarity  Coverage  Overlaps  Conflicts  Correct  Incorrect  Emp. Acc.
one    0  [0, 1, 2]       1.0       1.0      0.875        1          0      0.375
two    1  [0, 1, 2]       1.0       1.0      0.875        4          0      0.750
three  2  [0, 1, 2]       1.0       1.0      0.875        3          1      0.500
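The expected counts can be cross-checked without any confusion matrix. The sketch below compares each column of L with Y directly, ignoring abstains; it is only a sanity check written for this report, not Snorkel's implementation, and the names ABSTAIN and counts are introduced here for illustration.

```python
import numpy as np

ABSTAIN = -1  # abstain value as described in this issue

Y = np.array([0, 1, 0, 2, 1, 0, 1, 2])
L = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 2],
              [2, 2, 2],
              [0, 1, 1],
              [0, 0, 1],
              [0, 1, 0],
              [0, 2, 1]])

counts = {}
for name, votes in zip(["one", "two", "three"], L.T):
    voted = votes != ABSTAIN                       # rows where the LF did not abstain
    correct = int(np.sum((votes == Y) & voted))    # vote agrees with the gold label
    incorrect = int(np.sum((votes != Y) & voted))  # vote disagrees with the gold label
    counts[name] = (correct, incorrect)

print(counts)
```

Counted this way, "one" gets 3 correct and 5 incorrect, "two" gets 6 and 2, and "three" gets 4 and 4, which agrees with the Emp. Acc. column above (0.375, 0.750, 0.500) but not with the Correct and Incorrect columns.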

System info

  • How you installed Snorkel (conda, pip, source): pip
  • OS: GNU/Linux Debian unstable
  • Python version: 3.7
  • Snorkel version: 0.9
  • Versions of any other relevant libraries:

Thanks!
Damien G.
