Closed
Description
PermutationImportance was enhanced in #208 to limit excessive computation when the number of columns is large:
rows, cols = X_validation.shape
if cols > 5000:
    X_vald, _, y_vald, _ = subsample(
        X_validation, y_validation, train_size=100, ml_task=ml_task
    )
elif cols > 50 and rows * cols > 200000:
    X_vald, _, y_vald, _ = subsample(
        X_validation, y_validation, train_size=1000, ml_task=ml_task
    )
else:
    X_vald = X_validation
    y_vald = y_validation

Originally posted by @pplonski in #208 (comment)
If a dataset has fewer rows than these hardwired train_size values, subsample throws an exception and PermutationImportance fails.
An obvious fix is to replace these with train_size=min(nRows, constant).
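A minimal sketch of that guard, keeping the thresholds from the snippet above. The helper name `choose_subsample_size` and its `wide_limit` / `large_limit` parameters are hypothetical and not part of the library. It also assumes that when the dataset already has no more rows than the cap, the simplest safe behaviour is to skip subsampling entirely rather than request a split of every row.

```python
def choose_subsample_size(rows, cols, wide_limit=100, large_limit=1000):
    """Return how many rows to subsample, or None to use all rows.

    Hypothetical helper: mirrors the column/row thresholds quoted above,
    but never asks for more rows than the dataset can supply.
    """
    if cols > 5000:
        limit = wide_limit
    elif cols > 50 and rows * cols > 200000:
        limit = large_limit
    else:
        # Small enough already: no subsampling needed.
        return None

    # Assumption: a dataset with at most `limit` rows is used as-is,
    # so the subsampling step is never asked for an impossible split.
    if rows <= limit:
        return None
    return limit
```

A caller would then invoke `subsample(..., train_size=size, ...)` only when the returned `size` is not `None`, and otherwise fall back to the full `X_validation` and `y_validation`.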
Wide and short datasets are quite common in biological applications, and feature importance is one of the most valuable outcomes of an analysis.
Thanks very much!