Conversation

@shaycrk (Contributor) commented Nov 13, 2021

Building on #870, this PR adds BaselineRankMultiFeature to allow ranking by more than one feature as a baseline model. (If we decide on a different name for the sort-order parameter there, we'll want to be sure to rename it here as well for consistency.)
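For context, a hedged sketch of the kind of rule specification this implies -- the key names and feature names below are illustrative assumptions only, since REQUIRED_KEYS isn't shown in the excerpt and the sort-order parameter name is still under discussion in this PR:

    # Hypothetical example only; not taken from the PR.
    rules = [
        {"feature": "outstanding_violations", "descend": True},
        {"feature": "days_since_last_inspection", "descend": False},
    ]

The validation for the new class (excerpted below) checks that each rule is a dict containing a set of required keys: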

if not isinstance(rule, dict):
    raise ValueError('Rules for BaselineRankMultiFeature must be of type dict')
if not rule.keys() >= REQUIRED_KEYS:
    raise ValueError(f'BaselineRankMultiFeature rule "{rule}" missing one or more reuired keys ({REQUIRED_KEYS})')
Contributor:

typo in error message

Contributor Author:

fixed -- thanks!

ranker.fit(x=self.data["X_train"], y=self.data["y_train"])
results = ranker.predict_proba(self.data["X_test"])
if direction_value:
    expected_ranks = [6, 1, 3, 0, 5, 2, 4, 5]
Contributor:

I wonder if a clearer way to express this test (and one less prone to rewriting if the test fixture has to be modified for any reason) would be to order the rows by rank and assert that each one's value is less than or equal to (or greater than or equal to, depending on the direction) the value of the one before it.

I have mixed thoughts on this approach of duplicating logic in the test from the code under test; after all, if it's been implemented incorrectly in the code under test, who's to say the test doesn't have the same bad logic and thus erroneously pass? But I'm guessing in this case you wouldn't be able to copy and paste code from the code under test, and it probably wouldn't have that problem.

I'm not sure this is a big issue here. In general it's good to go for low maintenance tests that aren't beholden to what the test fixtures specifically look like, especially when the test fixtures are shared between different tests. You could in the future end up adding a new baseline class that needs more entropy than this fixture provides, and the data changes you make could require all these tests to be updated. What is the probability of needing to make such a change in the future? I don't know. But in general this type of thing happens a lot, and it makes the tests seem like a burden so it's good to minimize when we can.

As far as the data goes, I might be misunderstanding the logic used; shouldn't the calls with inverse direction values but the same data have inverse expected ranks?
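Something like this minimal sketch, say (the helper name and the single-feature, direction-flag setup are assumptions for illustration, not the PR's actual test code):

    import numpy as np

    def assert_rank_order_matches_feature(scores, feature_values, high_value_high_score=True):
        # Hypothetical helper: order rows from highest to lowest score, then check that
        # the ranking feature moves monotonically in the expected direction.
        order = np.argsort(scores)[::-1]                  # indices from highest to lowest score
        ordered = np.asarray(feature_values, dtype=float)[order]
        diffs = np.diff(ordered)
        if high_value_high_score:
            assert np.all(diffs <= 0), "feature values should never increase as scores fall"
        else:
            assert np.all(diffs >= 0), "feature values should never decrease as scores fall"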

Contributor Author:

Thanks, @thcrock! That's a good point -- I just added a helper function to check that the returned scores align to the expected ordering without having to reproduce the logic from the underlying function itself. Let me know if that looks better to you or if you think it could be cleaned up a bit more?

I think the calls with the different direction values do get the inverse expected ranks, but let me know if I'm missing something. For instance:

  if direction_value:
      expected_ranks = [6, 1, 3, 0, 5, 2, 4, 5]
  else:
      expected_ranks = [0, 5, 3, 6, 1, 4, 2, 1]

@not-the-fish (Contributor):

No further thoughts beyond what @thcrock has said, but I'm wondering if we should just deprecate PercentileRankOneFeature along with this, since it obviates the need for that ranker.

@shaycrk (Contributor Author) commented Nov 15, 2021

@ecsalomon -- I'm fine with either keeping or deprecating PercentileRankOneFeature. I agree this should fit most of the use cases for that, but unlike PercentileRankOneFeature, it doesn't attempt to ensure that the scores are interpretable as percentiles, just that they're correctly ordered, so we would be losing some functionality. I'm not sure how often that feature gets used, though.

@not-the-fish (Contributor):

Yeah, I mean, there was no real motivation for percentile being the method there. A deprecation warning might surface anyone who is actually using them as percentiles downstream somewhere.

Digging in a bit more, I might actually argue for the generic scaler to be the thing that controls how scores from the rankers get scaled (i.e., they return whatever scale is applied to the input features) rather than the rankers applying their own scaling.

@shaycrk (Contributor Author) commented Nov 15, 2021

That makes sense about the percentiles; I can certainly add a deprecation warning here and direct people to the newer ranking method.
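A minimal sketch of what that might look like (the class body and message wording are just assumptions, not the merged code):

    import warnings

    class PercentileRankOneFeature:  # hypothetical stand-in for the existing ranker
        def __init__(self, *args, **kwargs):
            warnings.warn(
                "PercentileRankOneFeature is deprecated; use BaselineRankMultiFeature instead.",
                DeprecationWarning,
                stacklevel=2,
            )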

On the scaler: how would the generic scaler work when you're ranking across multiple features? Even with one feature, if you're learning the scaling from the training set, you'll run into cases where the test values fall outside [0, 1], and we might not want to cap them for the purposes of simple ranking. Of course, the 0 to 1 range is a bit arbitrary here, but if we are scaling to something like that, it's probably good to keep the scores within the range.
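To illustrate the out-of-range point, a quick sketch using sklearn's MinMaxScaler as a stand-in for the generic scaler (that choice is just an assumption for illustration):

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    scaler = MinMaxScaler()
    scaler.fit(np.array([[0.0], [10.0]]))        # scaling learned from a 0..10 training range
    print(scaler.transform(np.array([[15.0]])))  # [[1.5]] -- an unseen test value lands outside [0, 1]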

@not-the-fish (Contributor):

Yeah, a fair point about not artificially capping test ranks if they fall outside the range. I'm convinced I was wrong. :)

@shaycrk (Contributor Author) commented Nov 19, 2021

@thcrock -- just pinging in case you have other thoughts on the unit tests here?

Also, @rayidghani -- other thoughts on the descend parameter name? I don't think low_value_high_score is great, but also don't have better ideas.

@shaycrk (Contributor Author) commented Dec 7, 2021

Going ahead and merging this now, but we can revisit the parameter name and unit tests in future pull request(s) if we want.

Closes #869
