Scipy sparse matrices not handled correctly by TPOT and autosklearn #370

@sebhrusen

Failing datasets:
https://openml.org/t/360932

  • Serialization of sparse matrices was not applied correctly.
  • Once that is fixed, the frameworks still fail with the following errors:
# TPOT
  File "/Users/seb/repos/ml/automlbenchmark/frameworks/TPOT/venv/lib/python3.7/site-packages/tpot/base.py", line 1359, in _check_dataset
    self.config_dict
ValueError: Not all operators in None supports sparse matrix. Please use "TPOT sparse" for sparse matrix.
# autosklearn
  File "/Users/seb/repos/ml/automlbenchmark/frameworks/autosklearn/venv/lib/python3.7/site-packages/sklearn/utils/multiclass.py", line 288, in type_of_target
    if y.ndim > 2 or (y.dtype == object and len(y) and
TypeError: len() of unsized object
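
The autosklearn message can be reproduced outside the framework. This is a minimal sketch of one plausible cause, not something the traceback confirms: if the target is passed as a scipy sparse matrix, `np.asarray` wraps it in a 0-dimensional object array, and calling `len()` on that array raises the same `TypeError`. The variable names and the toy data are made up for illustration.

```python
import numpy as np
from scipy import sparse

# Hypothetical target column stored as a sparse matrix (assumption for illustration).
y_sparse = sparse.csr_matrix(np.array([[0], [1], [1]]))

# NumPy cannot convert a scipy sparse matrix element-wise, so it wraps the
# whole matrix in a 0-dimensional array with dtype=object.
y_wrapped = np.asarray(y_sparse)

try:
    len(y_wrapped)
    error_message = None
except TypeError as exc:
    # A 0-d array has no length, hence the "unsized object" error.
    error_message = str(exc)

# Converting explicitly gives an ordinary 1-d array that has a length.
y_dense = y_sparse.toarray().ravel()
```

If this is indeed what happens, converting the target to a dense array before handing it to the framework avoids the error.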

We'll improve support for sparse data in a future version. For now, we can simply deserialize sparse matrices as dense matrices for the frameworks that don't use pandas.
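
A rough sketch of that workaround, assuming the deserialized object is either a scipy sparse matrix or already a dense array. The helper name `to_dense_if_sparse` is hypothetical and is not taken from the automlbenchmark code base.

```python
import numpy as np
from scipy import sparse


def to_dense_if_sparse(data):
    """Return a dense ndarray if `data` is a scipy sparse matrix, else `data` unchanged."""
    if sparse.issparse(data):
        return data.toarray()
    return data


# A small sparse feature matrix standing in for a deserialized dataset.
X_sparse = sparse.csr_matrix(np.array([[0.0, 1.0, 0.0],
                                       [2.0, 0.0, 0.0]]))
X_dense = to_dense_if_sparse(X_sparse)
```

Note that densifying trades memory for compatibility: a very wide, very sparse dataset may not fit in memory once converted, which is presumably why proper sparse support is still planned.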