Unrecognized "cost_for_crash" keyword to sklearn resampling strategies #901

@mahynski

I am trying to use StratifiedKFold (and also RepeatedStratifiedKFold) as my resampling strategy, but with either one every run crashes.

Here is a sample script based on the cancer dataset in the documentation:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, train_test_split

import autosklearn.classification

# Load the breast cancer dataset and hold out a test split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

clf = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=30,
    per_run_time_limit=10,
    n_jobs=1,
    ml_memory_limit=2**13,
    seed=5,
    resampling_strategy=StratifiedKFold,
    resampling_strategy_arguments={'n_splits':5, 'shuffle':True, 'random_state':0},
    delete_output_folder_after_terminate=False,
    delete_tmp_folder_after_terminate=False,
    tmp_folder='./tmp/',
    output_folder='./output/'
)
clf.fit(X_train, y_train)

print(clf.sprint_statistics())

The result is:

/home/nam4/local/anaconda2/envs/automl/lib/python3.7/site-packages/pyparsing.py:3190: FutureWarning: Possible set intersection at position 3
  self.re = re.compile(self.reString)
auto-sklearn results:
  Dataset name: b1863778bbca963da927ae292545f722
  Metric: accuracy
  Number of target algorithm runs: 29
  Number of successful target algorithm runs: 0
  Number of crashed target algorithm runs: 29
  Number of target algorithms that exceeded the time limit: 0
  Number of target algorithms that exceeded the memory limit: 0

The output shows that all 29 runs crashed; changing the memory or time limits had no effect. The logs in the tmp/ folder suggest that a keyword argument called "cost_for_crash" is being passed to the resampling strategy's constructor, which does not recognize it. For example, my tmp/AutoML(5):b1863778bbca963da927ae292545f722.log file contains entries like:

[DEBUG] [2020-07-15 11:04:51,506:AutoMLSMBO(5)::b1863778bbca963da927ae292545f722] Return: Status: <StatusType.CRASHED: 3>, cost: 1.000000, time: 0.023193, additional: {'traceback': 'Traceback (most recent call last):\n  File "/home/nam4/local/anaconda2/envs/automl/lib/python3.7/site-packages/autosklearn/evaluation/__init__.py", line 29, in fit_predict_try_except_decorator\n    return ta(queue=queue, **kwargs)\n  File "/home/nam4/local/anaconda2/envs/automl/lib/python3.7/site-packages/autosklearn/evaluation/train_evaluator.py", line 1236, in eval_cv\n    budget_type=budget_type,\n  File "/home/nam4/local/anaconda2/envs/automl/lib/python3.7/site-packages/autosklearn/evaluation/train_evaluator.py", line 179, in __init__\n    self.splitter = self.get_splitter(self.datamanager)\n  File "/home/nam4/local/anaconda2/envs/automl/lib/python3.7/site-packages/autosklearn/evaluation/train_evaluator.py", line 951, in get_splitter\n    cv = copy.deepcopy(self.resampling_strategy)(**init_dict)\nTypeError: __init__() got an unexpected keyword argument \'cost_for_crash\'\n', 'error': 'TypeError("__init__() got an unexpected keyword argument \'cost_for_crash\'")', 'configuration_origin': 'Initial design'}
[INFO] [2020-07-15 11:04:51,508:smac.intensification.intensification.Intensifier] Wallclock time limit for intensification reached (used: 0.174835 sec, available: 0.000010 sec)
[INFO] [2020-07-15 11:04:51,508:smac.intensification.intensification.Intensifier] Wallclock time limit for intensification reached (used: 0.174835 sec, available: 0.000010 sec)
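
The failure mode in the traceback can be sketched without auto-sklearn. This is only an illustration under assumptions: `FakeStratifiedKFold` is a stand-in I wrote with the same constructor parameters as sklearn's StratifiedKFold, the value 1.0 for "cost_for_crash" is a guess, and `filter_init_kwargs` is a hypothetical helper, not something auto-sklearn provides. It shows that unpacking a dict with an extra key into the constructor raises the same kind of TypeError, and that dropping keys the constructor does not accept avoids it.

```python
import inspect


class FakeStratifiedKFold:
    """Stand-in with the same constructor parameters as sklearn's StratifiedKFold."""

    def __init__(self, n_splits=5, *, shuffle=False, random_state=None):
        self.n_splits = n_splits
        self.shuffle = shuffle
        self.random_state = random_state


def filter_init_kwargs(cls, kwargs):
    """Hypothetical helper: keep only the keys that cls.__init__ accepts."""
    accepted = inspect.signature(cls.__init__).parameters
    return {k: v for k, v in kwargs.items() if k in accepted}


# My resampling arguments plus the extra key named in the traceback
# (the value 1.0 is an assumption).
init_dict = {"n_splits": 5, "shuffle": True, "random_state": 0, "cost_for_crash": 1.0}

# Passing everything through reproduces the TypeError.
try:
    FakeStratifiedKFold(**init_dict)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Filtering out unknown keys first lets the splitter be constructed.
cv = FakeStratifiedKFold(**filter_init_kwargs(FakeStratifiedKFold, init_dict))
```

I do not know whether filtering like this is the right fix inside get_splitter; it may be better not to add "cost_for_crash" to the resampling arguments in the first place.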
