Error for CatBoost: features data: pandas.DataFrame column 'store_type_pdist' has dtype 'category' but is not in cat_features list #383

@xiaobo

Description

I have a dataset `X_train` (a pandas.DataFrame) in which the column 'store_type_pdist' has dtype 'category' and holds numerical values such as 0, 1, and 2.
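For anyone trying to reproduce this, a frame of the same shape can be built as below. The values and the extra `sales` column are hypothetical. Only the column name and the integer categories come from the description above.

```python
import pandas as pd

# Hypothetical toy frame: only the column name 'store_type_pdist' is from the report.
X_train = pd.DataFrame(
    {
        "store_type_pdist": pd.Categorical([0, 1, 2, 1, 0]),
        "sales": [10.5, 3.2, 7.8, 4.4, 9.1],  # hypothetical numeric feature
    }
)

# The stored values are integers, but the column dtype is 'category', not int64.
print(X_train["store_type_pdist"].dtype)                     # category
print(X_train["store_type_pdist"].cat.categories.tolist())  # [0, 1, 2]
```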

When I run the following code:

```python
from supervised.automl import AutoML

automl = AutoML(mode="Perform")
automl.fit(X_train, y_train)
```

I get the following error. Please help me resolve it. Thanks!

## Error for 3_Default_CatBoost

```
features data: pandas.DataFrame column 'store_type_pdist' has dtype 'category' but is not in  cat_features list
Traceback (most recent call last):
  File "/home/user/workspace/aitiaexplorer/uplift/automl/lib/python3.8/site-packages/supervised/base_automl.py", line 1074, in _fit
    trained = self.train_model(params)
  File "/home/user/workspace/aitiaexplorer/uplift/automl/lib/python3.8/site-packages/supervised/base_automl.py", line 363, in train_model
    self.keep_model(mf, model_subpath)
  File "/home/user/workspace/aitiaexplorer/uplift/automl/lib/python3.8/site-packages/supervised/base_automl.py", line 262, in keep_model
    self._base_predict(self._one_sample, model)
  File "/home/user/workspace/aitiaexplorer/uplift/automl/lib/python3.8/site-packages/supervised/base_automl.py", line 1265, in _base_predict
    predictions = model.predict(X)
  File "/home/user/workspace/aitiaexplorer/uplift/automl/lib/python3.8/site-packages/supervised/model_framework.py", line 387, in predict
    y_p = learner.predict(X_data)
  File "/home/user/workspace/aitiaexplorer/uplift/automl/lib/python3.8/site-packages/supervised/algorithms/catboost.py", line 275, in predict
    return self.model.predict(X, ntree_end=self.best_ntree_limit)
  File "/home/user/workspace/aitiaexplorer/uplift/automl/lib/python3.8/site-packages/catboost/core.py", line 4894, in predict
    return self._predict(data, prediction_type, ntree_start, ntree_end, thread_count, verbose, 'predict')
  File "/home/user/workspace/aitiaexplorer/uplift/automl/lib/python3.8/site-packages/catboost/core.py", line 1978, in _predict
    data, data_is_single_object = self._process_predict_input_data(data, parent_method_name, thread_count)
  File "/home/user/workspace/aitiaexplorer/uplift/automl/lib/python3.8/site-packages/catboost/core.py", line 1958, in _process_predict_input_data
    data = Pool(
  File "/home/user/workspace/aitiaexplorer/uplift/automl/lib/python3.8/site-packages/catboost/core.py", line 455, in __init__
    self._init(data, label, cat_features, text_features, embedding_features, pairs, weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, feature_names, thread_count)
  File "/home/user/workspace/aitiaexplorer/uplift/automl/lib/python3.8/site-packages/catboost/core.py", line 966, in _init
    self._init_pool(data, label, cat_features, text_features, embedding_features, pairs, weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, feature_names, thread_count)
  File "_catboost.pyx", line 3550, in _catboost._PoolBase._init_pool
  File "_catboost.pyx", line 3597, in _catboost._PoolBase._init_pool
  File "_catboost.pyx", line 3438, in _catboost._PoolBase._init_features_order_layout_pool
  File "_catboost.pyx", line 2433, in _catboost._set_features_order_data_pd_data_frame
_catboost.CatBoostError: features data: pandas.DataFrame column 'store_type_pdist' has dtype 'category' but is not in  cat_features list
```

Please set a GitHub issue with above error message at: https://github.com/mljar/mljar-supervised/issues/new
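The error states that CatBoost rejects a pandas column of dtype 'category' unless that column is listed in `cat_features`. Below is a pandas-only sketch of that condition, plus a possible user-side workaround: casting category columns back to plain dtypes before calling `automl.fit`. The helper names `category_columns`, `missing_cat_features`, and `decategorize` and the extra columns are hypothetical. This is not mljar-supervised's or CatBoost's actual code, it does not fix the underlying bug, and it has not been tested against mljar-supervised 0.10.3.

```python
import pandas as pd


def category_columns(df: pd.DataFrame) -> list:
    """Names of columns whose dtype is pandas 'category'."""
    return [c for c in df.columns if isinstance(df[c].dtype, pd.CategoricalDtype)]


def missing_cat_features(df: pd.DataFrame, cat_features: list) -> list:
    """Category-dtype columns that are absent from cat_features.

    Mirrors the condition named in the error message; it is not CatBoost's
    own check, just a quick way to see which columns would trigger it.
    """
    wanted = set(cat_features)
    return [c for c in category_columns(df) if c not in wanted]


def decategorize(df: pd.DataFrame) -> pd.DataFrame:
    """Copy of df with every 'category' column converted to a plain dtype.

    Assumption: integer-valued categories (such as 0, 1, 2) may be cast back
    to int64; all other category columns are cast to str so downstream code
    can still treat them as categorical text.
    """
    out = df.copy()
    for col in category_columns(out):
        categories = out[col].cat.categories
        if pd.api.types.is_integer_dtype(categories.dtype) and not out[col].isna().any():
            out[col] = out[col].astype("int64")
        else:
            out[col] = out[col].astype(str)
    return out


# Hypothetical example frame; only 'store_type_pdist' comes from the report.
X_train = pd.DataFrame(
    {
        "store_type_pdist": pd.Categorical([0, 1, 2, 1, 0]),
        "region": pd.Categorical(["n", "s", "n", "e", "s"]),
        "sales": [10.5, 3.2, 7.8, 4.4, 9.1],
    }
)

print(missing_cat_features(X_train, cat_features=[]))  # ['store_type_pdist', 'region']
X_clean = decategorize(X_train)
print(category_columns(X_clean))  # []
```

With data like that in this report, the workaround call would be `automl.fit(decategorize(X_train), y_train)`. Whether the int64 cast or the str cast is more appropriate depends on whether the codes 0, 1, 2 carry numeric meaning or are just labels.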

Software versions:

```
Package Version
alembic 1.5.8
attrs 20.3.0
backcall 0.2.0
catboost 0.24.4
category-encoders 2.2.2
cliff 3.7.0
cloudpickle 1.3.0
cmaes 0.8.2
cmd2 1.5.0
colorama 0.4.4
colorlog 5.0.1
colour 0.1.5
cycler 0.10.0
decorator 5.0.7
dill 0.3.3
dtreeviz 1.0
graphviz 0.16
greenlet 1.0.0
iniconfig 1.1.1
ipykernel 5.5.3
ipython 7.22.0
ipython-genutils 0.2.0
jedi 0.18.0
joblib 1.0.1
jupyter-client 6.2.0
jupyter-core 4.7.1
kiwisolver 1.3.1
lightgbm 3.0.0
llvmlite 0.36.0
Mako 1.1.4
MarkupSafe 1.1.1
matplotlib 3.4.1
mljar-supervised 0.10.3
nest-asyncio 1.5.1
numba 0.53.1
numpy 1.19.5
optuna 2.6.0
packaging 20.9
pandas 1.2.0
parso 0.8.2
patsy 0.5.1
pbr 5.5.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 8.2.0
pip 21.0.1
plotly 4.14.3
pluggy 0.13.1
prettytable 2.1.0
prompt-toolkit 3.0.18
ptyprocess 0.7.0
py 1.10.0
pyarrow 3.0.0
pyfunctional 1.4.3
Pygments 2.8.1
pyparsing 2.4.7
pyperclip 1.8.2
pytest 6.2.3
python-dateutil 2.8.1
python-editor 1.0.4
pytz 2021.1
PyYAML 5.4.1
pyzmq 22.0.3
retrying 1.3.3
scikit-learn 0.24.1
scipy 1.6.1
seaborn 0.10.1
setuptools 47.1.0
shap 0.36.0
six 1.15.0
slicer 0.0.7
SQLAlchemy 1.4.11
statsmodels 0.12.2
stevedore 3.3.0
tabulate 0.8.7
threadpoolctl 2.1.0
toml 0.10.2
tornado 6.1
tqdm 4.60.0
traitlets 5.0.5
wcwidth 0.2.5
wordcloud 1.8.1
xgboost 1.3.3
```

Labels: bug (Something isn't working)
