Issue description
I think that the Correct and Incorrect columns from LFAnalysis().lf_summary(Y)
are wrong when there is no ABSTAIN label (i.e. -1) in the input data.
The confusion matrix does not have the same size when the unique labels are [0, 1, 2] as when they are [-1, 0, 1, 2]. I understand that the [1:, 1:]
slicing is meant to skip the label -1 (which is the first row/column) here:
snorkel/snorkel/labeling/analysis.py
Line 357 in ad5731a
confusions = [confusion_matrix(Y, self.L[:, i])[1:, 1:] for i in range(m)]
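To make the size difference concrete, here is a small standalone check of my own (not part of Snorkel's code): scikit-learn's confusion_matrix only allocates rows/columns for labels that actually occur, so when no -1 is present the [1:, 1:] slice removes the row/column for label 0 rather than the abstain label.

import numpy as np
from sklearn.metrics import confusion_matrix

Y = np.array([0, 1, 0, 2, 1, 0, 1, 2])
preds = np.array([0, 0, 1, 2, 0, 0, 0, 0])           # first LF's votes, no abstains
preds_abstain = np.array([-1, 0, 1, 2, 0, 0, 0, 0])  # same votes with one abstain

# Only labels [0, 1, 2] occur -> 3x3 matrix; [1:, 1:] drops the label-0 row/column
print(confusion_matrix(Y, preds).shape)          # (3, 3)
# Labels [-1, 0, 1, 2] occur -> 4x4 matrix; [1:, 1:] strips the abstain row/column
print(confusion_matrix(Y, preds_abstain).shape)  # (4, 4)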
Code example/repro steps
from typing import NamedTuple
import numpy as np
from sklearn.metrics import confusion_matrix
from snorkel.labeling import LFAnalysis
class MockLF(NamedTuple):
    name: str
Y = np.array([0, 1, 0, 2, 1, 0, 1, 2])
lfs = [MockLF("one"), MockLF("two"), MockLF("three")]
# suppose you have three labeling functions
L = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 2],
              [2, 2, 2],
              [0, 1, 1],
              [0, 0, 1],
              [0, 1, 0],
              [0, 2, 1]])
L_w_abstain = np.array([[-1, 1, 0],
                        [0, 0, 1],
                        [1, -1, 2],
                        [2, 2, 2],
                        [0, 1, -1],
                        [0, 0, 1],
                        [0, -1, 0],
                        [0, 2, 1]])
analysis = LFAnalysis(L, lfs=lfs)
print(analysis.lf_summary(Y))
# well_a = LFAnalysis(L_w_abstain, lfs=lfs)
# print(well_a.lf_summary(Y))
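For reference, the counts can also be checked by hand by continuing the script above (this verification is mine, not part of the original repro): compare each LF's votes to Y while skipping abstained (-1) votes.

# Recount correct/incorrect per LF, ignoring abstained (-1) votes
for lf, votes in zip(lfs, L.T):
    voted = votes != -1
    correct = int(np.sum(votes[voted] == Y[voted]))
    incorrect = int(voted.sum() - correct)
    print(lf.name, correct, incorrect)
# -> one 3 5, two 6 2, three 4 4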
Expected behavior
The first labeling function, named "one", should have 3 correct and 5 incorrect instead of the values below:

       j   Polarity  Coverage  Overlaps  Conflicts  Correct  Incorrect  Emp. Acc.
one    0  [0, 1, 2]       1.0       1.0      0.875        1          0      0.375
two    1  [0, 1, 2]       1.0       1.0      0.875        4          0      0.750
three  2  [0, 1, 2]       1.0       1.0      0.875        3          1      0.500
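One way the slice could be made robust, sketched here purely as an illustration (not an official patch; self.L and m are the names used in the quoted line above), is to pass confusion_matrix an explicit label list so a -1 row/column always exists before it is stripped:

# Sketch only: force -1 into the label set so [1:, 1:] always removes the
# abstain row/column, even when no labeling function abstained.
observed = np.union1d(np.unique(Y), np.unique(self.L))
labels = [-1] + [int(v) for v in observed if v != -1]
confusions = [
    confusion_matrix(Y, self.L[:, i], labels=labels)[1:, 1:] for i in range(m)
]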
System info
- How you installed Snorkel (conda, pip, source): pip
- OS: GNU/Linux Debian unstable
- Python version: 3.7
- Snorkel version: 0.9
- Versions of any other relevant libraries:
Thanks!
Damien G.