Closed
Description
I am trying to use StratifiedKFold (and also RepeatedStratifiedKFold) as my resampling strategy, but both cause every run to crash.
Here is a sample script based on the cancer dataset in the documentation:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
import autosklearn
import autosklearn.classification
from sklearn.model_selection import StratifiedKFold
clf = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=30,
    per_run_time_limit=10,
    n_jobs=1,
    ml_memory_limit=2**13,
    seed=5,
    resampling_strategy=StratifiedKFold,
    resampling_strategy_arguments={'n_splits': 5, 'shuffle': True, 'random_state': 0},
    delete_output_folder_after_terminate=False,
    delete_tmp_folder_after_terminate=False,
    tmp_folder='./tmp/',
    output_folder='./output/',
)
clf.fit(X_train, y_train)
print(clf.sprint_statistics())
The result is:
/home/nam4/local/anaconda2/envs/automl/lib/python3.7/site-packages/pyparsing.py:3190: FutureWarning: Possible set intersection at position 3
self.re = re.compile(self.reString)
auto-sklearn results:
Dataset name: b1863778bbca963da927ae292545f722
Metric: accuracy
Number of target algorithm runs: 29
Number of successful target algorithm runs: 0
Number of crashed target algorithm runs: 29
Number of target algorithms that exceeded the time limit: 0
Number of target algorithms that exceeded the memory limit: 0
The output shows that all 29 runs crashed; changing the memory or time limits had no effect. The logs in the tmp/ folder suggest that a key called "cost_for_crash" is being passed to the resampling strategy's constructor, which does not recognize it. For example, my tmp/AutoML(5):b1863778bbca963da927ae292545f722.log file contains:
[DEBUG] [2020-07-15 11:04:51,506:AutoMLSMBO(5)::b1863778bbca963da927ae292545f722] Return: Status: <StatusType.CRASHED: 3>, cost: 1.000000, time: 0.023193, additional: {'traceback': 'Traceback (most recent call last):\n File "/home/nam4/local/anaconda2/envs/automl/lib/python3.7/site-packages/autosklearn/evaluation/__init__.py", line 29, in fit_predict_try_except_decorator\n return ta(queue=queue, **kwargs)\n File "/home/nam4/local/anaconda2/envs/automl/lib/python3.7/site-packages/autosklearn/evaluation/train_evaluator.py", line 1236, in eval_cv\n budget_type=budget_type,\n File "/home/nam4/local/anaconda2/envs/automl/lib/python3.7/site-packages/autosklearn/evaluation/train_evaluator.py", line 179, in __init__\n self.splitter = self.get_splitter(self.datamanager)\n File "/home/nam4/local/anaconda2/envs/automl/lib/python3.7/site-packages/autosklearn/evaluation/train_evaluator.py", line 951, in get_splitter\n cv = copy.deepcopy(self.resampling_strategy)(**init_dict)\nTypeError: __init__() got an unexpected keyword argument \'cost_for_crash\'\n', 'error': 'TypeError("__init__() got an unexpected keyword argument \'cost_for_crash\'")', 'configuration_origin': 'Initial design'}
[INFO] [2020-07-15 11:04:51,508:smac.intensification.intensification.Intensifier] Wallclock time limit for intensification reached (used: 0.174835 sec, available: 0.000010 sec)
[INFO] [2020-07-15 11:04:51,508:smac.intensification.intensification.Intensifier] Wallclock time limit for intensification reached (used: 0.174835 sec, available: 0.000010 sec)
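The TypeError in the traceback above can be reproduced without auto-sklearn. The sketch below is only an illustration, not auto-sklearn's code or its fix. FakeStratifiedKFold is a hypothetical stand-in that copies the constructor parameters of sklearn's StratifiedKFold (n_splits, shuffle, random_state) so the snippet runs without any dependencies. The value 1.0 for cost_for_crash is an assumption, and filter_init_kwargs is a made-up helper showing one way a caller could drop keys the splitter does not accept.

```python
import inspect


class FakeStratifiedKFold:
    """Hypothetical stand-in with the same constructor parameters as
    sklearn.model_selection.StratifiedKFold."""

    def __init__(self, n_splits=5, *, shuffle=False, random_state=None):
        self.n_splits = n_splits
        self.shuffle = shuffle
        self.random_state = random_state


def filter_init_kwargs(cls, kwargs):
    """Hypothetical helper: keep only the keys that cls.__init__ declares."""
    params = inspect.signature(cls.__init__).parameters
    return {key: value for key, value in kwargs.items() if key in params}


# The user's resampling_strategy_arguments plus an extra key like the one
# named in the traceback (the value 1.0 is assumed for illustration).
init_dict = {
    "n_splits": 5,
    "shuffle": True,
    "random_state": 0,
    "cost_for_crash": 1.0,
}

# Passing the unfiltered dict raises the same kind of TypeError as in the log.
try:
    FakeStratifiedKFold(**init_dict)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Dropping unknown keys first lets the splitter be constructed.
cv = FakeStratifiedKFold(**filter_init_kwargs(FakeStratifiedKFold, init_dict))
print(cv.n_splits, cv.shuffle, cv.random_state)
```

Swapping the stand-in for the real StratifiedKFold should behave the same way, since its constructor accepts the same three parameters and nothing else.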