
Conversation

garaud
Contributor

@garaud garaud commented Sep 5, 2019

Description of proposed changes

Fix the computation of correct/incorrect values from the confusion matrix when the L input data does not contain the ABSTAIN label. We slice the confusion matrix on [1:, 1:] only when there are ABSTAIN labels, i.e. -1 values. The full confusion matrix should be used when there are no -1 values.
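The described fix can be sketched as follows. This is a minimal illustration, not the actual snorkel source: the helper name `lf_correct_incorrect` is hypothetical, and the confusion matrix is built by hand rather than via snorkel's internals. The key point is that the [1:, 1:] slice (which drops the ABSTAIN row/column) is applied only when -1 votes actually occur.

```python
import numpy as np

def lf_correct_incorrect(L_col, Y):
    """Count correct/incorrect non-abstain votes of one labeling function.

    Hypothetical helper illustrating the fix: the confusion matrix is
    sliced with [1:, 1:] only when ABSTAIN (-1) votes are present.
    """
    # Sorting puts -1 (ABSTAIN) first, so it occupies row/column 0 if present.
    labels = sorted(set(L_col) | set(Y))
    index = {lab: i for i, lab in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for l, y in zip(L_col, Y):
        cm[index[l], index[y]] += 1
    if -1 in index:
        # Drop the ABSTAIN row/column only when ABSTAIN votes exist;
        # slicing unconditionally would discard a real label (the old bug).
        cm = cm[1:, 1:]
    correct = int(np.trace(cm))
    incorrect = int(cm.sum()) - correct
    return correct, incorrect
```

With ABSTAIN votes present, `lf_correct_incorrect([-1, 0, 1, 1], [0, 0, 1, 0])` counts only the three non-abstain votes; without any -1 values, the full matrix is kept, so label 0 is no longer silently dropped.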

Related issue(s)

Fixes #1446

Test plan

Checklist

Need help on these? Just ask!

  • I have read the CONTRIBUTING document.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

I'm having some trouble with pyspark in the test suite, but I think it's unrelated to this fix.

Member

@henryre henryre left a comment


@garaud great find and great fix, thanks!! Looks like it's just a formatting issue. You can fix it by running `tox -e fix`, then verify by running `tox`.

Member

@bhancock8 bhancock8 left a comment


Thanks so much for posting the issue and the PR! I made one suggestion for simplifying the calculation. Then once we clarify the naming of lfa_bis in the unit test, looks great to me!

@codecov

codecov bot commented Sep 5, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@77f49b4).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master    #1447   +/-   ##
=========================================
  Coverage          ?   97.55%           
=========================================
  Files             ?       55           
  Lines             ?     2002           
  Branches          ?      328           
=========================================
  Hits              ?     1953           
  Misses            ?       22           
  Partials          ?       27
Impacted Files                 Coverage Δ
snorkel/labeling/analysis.py   100% <100%> (ø)

@garaud garaud force-pushed the fix-lf-summary-no-abstain-1446 branch from 75dbd02 to 8b2d3af Compare September 5, 2019 18:53
Member


Last nit, I promise: can we add a 3 to L_wo_abstain and a 4 to the Y used here? This will make sure that any future changes account for non-identical label sets between L and Y, which this change handles correctly.
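The request above could be sketched roughly like this. The data is hypothetical (the actual test matrices in the PR are not shown here): L votes include a label 3 that never appears in Y, and Y includes a label 4 that no labeling function ever votes, so the label sets differ without any ABSTAIN votes being involved.

```python
import numpy as np

# Hypothetical test data: label 3 appears only in L, label 4 only in Y.
L_wo_abstain = np.array([[0, 1], [1, 3], [0, 1]])
Y = np.array([0, 1, 4])

# The label sets are non-identical, so any analysis that assumes L and Y
# share the same labels (or implicitly contain ABSTAIN) would mis-slice
# its confusion matrix.
assert set(L_wo_abstain.ravel()) != set(Y.tolist())
assert -1 not in L_wo_abstain  # no ABSTAIN votes anywhere
```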

Contributor Author


Good idea :)
I updated the test part.

Damien Garaud added 2 commits September 5, 2019 22:09
…trix

when the L input data does not contain the ABSTAIN label. We only slice the confusion matrix
on [1:, 1:] when there are ABSTAIN labels, i.e. -1 values. The full confusion
matrix should be used when there are no -1 values.

fix snorkel-team#1446
@garaud garaud force-pushed the fix-lf-summary-no-abstain-1446 branch from 8b2d3af to 95ca47f Compare September 5, 2019 20:10
Member

@henryre henryre left a comment


@garaud awesome!! Let's ship it. Feel free to merge whenever you want.

@henryre
Member

henryre commented Sep 5, 2019

@garaud we usually leave it to authors to merge, but we're pushing a patch today so I'm going to go ahead and merge for you. Thanks again, this is great!!

@henryre henryre merged commit ff1074a into snorkel-team:master Sep 5, 2019
@garaud
Contributor Author

garaud commented Sep 6, 2019

we usually leave it to authors to merge, but we're pushing a patch today so I'm going to go ahead and merge for you.

Don't worry. Thank you for the review.

Weak supervision is quite new to me, and Snorkel is a great project! We use weak supervision to label massive 3D point clouds (LIDAR scans in natural environments such as rocks, cliffs, vegetation, etc.), and it's really helpful.

Successfully merging this pull request may close these issues:

  • Confusion matrix in LFAnalysis.lf_summary supposes you have at least one ABSTAIN label