@mfeurer mfeurer commented May 7, 2020

No description provided.

gui-miotto and others added 30 commits September 17, 2019 15:32
* working version of the nested pipeline

* first moves in the direction of a column transformer autosklearn pipeline

* a working pipeline

* working and tested pipeline

* automl in progress

* mod gitignore

* more work on automl

* more work on automl

* more work on automl

* more work on automl

* more work on automl

* automl seems to be working

* Removed some unnecessary testing files

* Added some docstrings

* merged CategoryShift with CategoricalImputation to get a cleaner solution on the categorical data preprocessing pipeline

* doc string corrections

* fixed some unittests

* Unmerged category shift and categorical imputation

* Implemented some of Matthias's comments

* corrected some unit tests

* added a CategoryShift implementation

* Added an OHE implementation for sparse datasets. Fixed a couple of OHE tests

* Code for the minority coalescer choice

* OHE now returns only sparse matrices (keeping the original behavior)

* Corrected some OHE unit tests

* Use the new preprocessing pipeline inside the SimpleRegressionPipeline

* fixed some unit tests

* OHE unit test adjustments

* readded dataset.pkl

* makes sure the input of the feature_type_splitter is dense

* Modifications to the FeatureTypeSplitter code due to a bug in sklearn's ColumnTransformer (see sklearn issue #15627)

* Added tests for the SparseOneHotEncoder

* added tests for the CategoryShift implementation

* Added tests for the MinorityCoalescer implementation

* Added tests for CategoricalImputation

* small test adjustments

* category_shift.transform(X) now works on a copy of X

* fixed unittest

* metalearning test fixed

* metalearning test fixed

* updated all metalearning configuration.csv tables

* use more convenient names

* cleaned last dependencies on the old 1HE

* renaming

* small fixes on test_metalearning_features

* removed the utils.datapreprocessing and corrected some unit tests

* PEP8

* OneHotEncoder now uses handle_unknown='ignore'

* PEP8

* PEP8

* added some new unit tests

* PEP8

* added missing __init__ file

* PEP8

* added tests for data_preprocessing_numerical

* added unit tests for data_preprocessing.py

* added unit tests for data_preprocessing

* PEP8

* corrected fit and transform behavior in the MinorityCoalescer implementation

* removed method fit_transformer from NumericalPreprocessingPipeline and CategoricalPreprocessingPipeline

* minor modifications suggested on @mfeurer's PR review

* minor modifications suggested by @mfeurer in his PR review

* small code simplification in DataPreprocessor

* more modifications suggested by @mfeurer in his PR review

* more modifications suggested by @mfeurer in his PR review

* PEP8 fixes

* Improvement on PreprocessingPipelineTest

* PEP8 fixes

* making sure new components return the correct data type

* fix unit test test_pca_95percent
…#739 (#762)

* PEP8 (#718)

* remove warning "No models better than random"

* warning when other models not better than dummy

* Add model number details in warning

Co-authored-by: Matthias Feurer <[email protected]>
* initial commit

* fix some unittests

* Edit configurations.csv to reflect the new value of the hyperparameter max_bins in the GradientBoostingClassifier. The by-the-book approach would be to generate new metalearning data, but the impact of not doing so, in this case, should be very small

* fixed some more unittests

* fixed some more unit tests

* changed assert values of the unit test of the extra_tree_regression feature preprocessing

* small variable renaming

* trying to fix the CI issue that makes all tests be executed before the examples

* Revert "trying to fix the CI issue that makes all tests be executed before the examples"

This reverts commit 0bb7f2a.

* try again to fix the CI issue

* try again to fix the CI issue

* trying to fix CI issue

* corrects typo
It no longer exists on 3.8, and has been deprecated
since 3.3.

See e.g. https://docs.python.org/3.5/library/time.html#time.clock
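For context, `time.clock` is gone as of Python 3.8. A minimal sketch of the two standard-library timers usually used in its place, assuming the code only needs elapsed-time measurements; which of them this commit actually switched to is not stated here, and the helper name `timed` is hypothetical:

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # monotonic, high-resolution wall clock
    cpu_start = time.process_time()    # CPU time consumed by this process
    result = func(*args, **kwargs)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, wall, cpu


result, wall, cpu = timed(sum, range(100_000))
```

`perf_counter` includes time spent sleeping or waiting, while `process_time` does not, so the right choice depends on what the old `time.clock` call was meant to measure.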
* initial commit

* fix some unittests

* Edit configurations.csv to reflect the new value of the hyperparameter max_bins in the GradientBoostingClassifier. The by-the-book approach would be to generate new metalearning data, but the impact of not doing so, in this case, should be very small

* fixed some more unittests

* fixed some more unit tests

* changed assert values of the unit test of the extra_tree_regression feature preprocessing

* small variable renaming

* trying to fix the CI issue that makes all tests be executed before the examples

* Revert "trying to fix the CI issue that makes all tests be executed before the examples"

This reverts commit 0bb7f2a.

* try again to fix the CI issue

* try again to fix the CI issue

* trying to fix CI issue

* corrects typo

* changes the deprecated .ix by .loc
* replace nosetests by pytest

* actually use pytest

* change coverage arguments

* debug output

* debug

* debug

* debug

* debug
* bump SMAC version

* fix that example

* reduce the number of warnings
* ADD iterative fit for gradient boosting

* increase default n_iter to 2

* replace max_iter with 512 everywhere

* remove restriction on max_iter

* fix
…785)

* fixing huge file size when saving model: Issue #421

* PEP8 and some variable renaming

* Bug fixes in the file removal logic

* fixes some unit tests

* Improvements on the model removal logic

* adds flag to activate/deactivate deletion of non-winning model

* unit test fixes

* fix typo

* improved new unit test; improved (again) model deletion logic

* PEP8

* small bug fix on getting the correct ensemble models

* PEP8

* Fix example_parallel_manual_spawning

* PEP8

* delete model files just after training

* delete after predict

* unit test adjustments

* unit test adjustments

* PEP8

* implements some of @mfeurer PR review comments

* Adds assert to verify that at least one model file has been deleted

Co-authored-by: ml-mhs <As1596309384290136>
* intermediate commit

* add budget + subsample successive halving

* fix bug for holdout-iterative-fit

* dump progress

* update budget evaluator more

* many new unit tests

* add example for SH

* fix cv example

* ADD get_max_iter for all iterative estimators (#798)

* ADD get_max_iter for all iterative estimators

Also, make naive bayes estimators no longer iterative

* Fix unittests

* FIX unittest

* FIX unittest

* FIX unittest

* Make SGD/PassiveAggressive use 1024 steps

* Add get max iter (#797)

* ADD get_max_iter for all iterative estimators

Also, make naive bayes estimators no longer iterative

* Fix unittests

* FIX unittest

* FIX unittest

* FIX unittest

* Make SGD/PassiveAggressive use 1024 steps

* combine evaluator with budget retrieval

* update

* update docstring, revert unnecessary changes

* fix test

* fix bug in test evaluator

Co-authored-by: Katharina Eggensperger <[email protected]>
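The commits above add budgets and successive halving. As a rough illustration of the general idea only, here is a textbook-style sketch; it is not the code from this PR or from SMAC, and the names `successive_halving`, `evaluate`, and `eta` are assumptions made for the example:

```python
def successive_halving(configs, evaluate, min_budget, max_budget, eta=2):
    """Return the best config after repeatedly discarding the worst ones.

    evaluate(config, budget) must return a loss (lower is better).
    After each round only the best 1/eta of the configs survive and
    the budget given to each survivor is multiplied by eta.
    """
    survivors = list(configs)
    budget = min_budget
    while True:
        # Rank all surviving configs at the current budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        if budget >= max_budget or len(ranked) == 1:
            return ranked[0]
        keep = max(1, len(ranked) // eta)
        survivors = ranked[:keep]
        budget = min(max_budget, budget * eta)


# Toy problem: the "config" is an integer, the optimum is 3, and a larger
# budget lowers every loss by the same amount.
def toy_loss(config, budget):
    return abs(config - 3) + 1.0 / budget


best = successive_halving([0, 1, 2, 3, 5, 8, 13, 21], toy_loss,
                          min_budget=1, max_budget=4)
```

In a setting like the one these commits describe, the budget would plausibly correspond to a number of iterations or a data subsample size rather than the abstract number used here.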
* Fix a bug where re-using a dataset name from the meta-data would crash Auto-sklearn

* add log message

* PEP8
* add budget to output file names

* fix warning which made debugging incredibly hard

* fix bug in ensemble builder

* do not remove the dummy predictor

* add budgets to refit

* update PR/self-review

* PEP8

* PEP8

* fix sorting issue in ensemble file reading
* rename ensemble_nbest

* check whether max_keep_best is float or integer

* minor fixes

* minor

* ADD unittest

* ADD threshold on performance range; rename variable

* flake8

* flake8

* skip tests not working for python 3.5

* fix unittests

* Now correctly skip unittests for Python 3.5

* consider mfeurer's comments

* fix

* Update ensemble_builder.py

* flake8

Co-authored-by: Matthias Feurer <[email protected]>
* add new status type converged

* pep8

* pep8 and example
mfeurer and others added 24 commits March 31, 2020 21:12
* add iterative-fit-cv

* pep8

* pep8

* update example
* fix passive aggressive iterative fit

* pep8
* #700: New sklearn.metrics.balanced_accuracy_score

* Removing the missing pac score

Co-authored-by: chico <[email protected]>
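For readers unfamiliar with the metric named above: balanced accuracy is the mean of the per-class recalls. The pure-Python sketch below illustrates that definition only; it is not how this PR wires `sklearn.metrics.balanced_accuracy_score` into auto-sklearn, and the function name and sample labels are made up for the example:

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Average recall over the classes that occur in y_true."""
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
score = balanced_accuracy(y_true, y_pred)  # (3/4 + 1/2) / 2
```

On this imbalanced toy data plain accuracy is 4/6, while the balanced score weights the minority class equally and comes out lower.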
…ons (#807)

* Add deletion of model files

* Add deletion of test and validation files

* remove test that verifies the deletion of validation files

* PEP8

* Reverses logic: loop through the directory instead of list of candidates

* PEP8

* Correct error message

* data structure changes

* Improve readability

* rewrite AbstractEvaluator.file_output() without changing its functionality

* implement locks on AbstractEvaluator.file_output()

* bug fix

* simplify lock naming

* Adapt unittest

* fix AbstractEvaluatorTest

* Add some nosetest byproducts to .gitignore

* Fix FunctionsTest (test_train_evaluator.py)

* Fix a couple of unit tests from TestTrainEvaluator

* PEP8

* delete unnecessary line of code
* Add deletion of model files

* Add deletion of test and validation files

* remove test that verifies the deletion of validation files

* PEP8

* Reverses logic: loop through the directory instead of list of candidates

* PEP8

* Correct error message

* data structure changes

* Improve readability

* rewrite AbstractEvaluator.file_output() without changing its functionality

* implement locks on AbstractEvaluator.file_output()

* bug fix

* simplify lock naming

* Adapt unittest

* fix AbstractEvaluatorTest

* Add some nosetest byproducts to .gitignore

* Fix FunctionsTest (test_train_evaluator.py)

* Fix a couple of unit tests from TestTrainEvaluator

* PEP8

* delete unnecessary line of code

* catch an invalid setting

* fix unit tests

Co-authored-by: Gui Miotto <[email protected]>
* Grant PEP8 compliance for util modules

* Remove __init__.py imports

* PEP8
* update meta-data

* add missing test files

* fix two more unit tests
* Make autosklearn/data PEP8 compliant

* Make autosklearn/metrics PEP8 compliant

* PEP8
* Read gzip; return preds; sort files

* FIX valid/test being ordered differently

* flake8

* consider comments, fix unittests

* don't throw an error, but sleep and continue

* flake8

* FIX unittest: ensemble files start from 1 (and not from 0)

* fix
bug was introduced when the test was not updated for commit
e12182f, which introduced
polynomial feature expansion for sparse data
* Make autosklearn/data PEP8 compliant

* Make autosklearn/metrics PEP8 compliant

* Make autosklearn/metalearning PEP8 compliant

* PEP8
* make components/feature_preprocessing PEP8 compliant

* make components/data_preprocessing PEP8 compliant

* make components/classification PEP8 compliant

* make components/regression PEP8 compliant

* make pipeline/* PEP8 compliant

* implement PR review requested changes
* Make tests PEP8 compliant

* bug fix

* Implement PR review comments

* Implement PR review suggestions

* fix small bug
* Fix race condition in ensemble builder

Due to a recent change, the ensemble builder can load
gzipped files with the ending .npy.gz. The glob statement
to find these files is `.npy*`, which can also find
lock files like `.npy.lock`, which must not be confused
with prediction files.

* Update ensemble_builder.py
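The glob problem described in the commit message above can be reproduced in isolation. The following is a minimal sketch, assuming hypothetical file names and a hypothetical helper `list_prediction_files`; the actual fix in `ensemble_builder.py` may look different:

```python
import glob
import os
import tempfile


def list_prediction_files(directory):
    """Return files ending in .npy or .npy.gz, ignoring e.g. .npy.lock."""
    # The broad pattern also matches lock files, so filter by suffix afterwards.
    candidates = glob.glob(os.path.join(directory, "*.npy*"))
    return sorted(
        path for path in candidates
        if path.endswith(".npy") or path.endswith(".npy.gz")
    )


with tempfile.TemporaryDirectory() as tmp:
    names = ("predictions_1.npy", "predictions_2.npy.gz", "predictions_1.npy.lock")
    for name in names:
        open(os.path.join(tmp, name), "w").close()
    matched_by_glob = len(glob.glob(os.path.join(tmp, "*.npy*")))
    found = [os.path.basename(p) for p in list_prediction_files(tmp)]
```

Here the raw glob matches all three files, while the filtered list contains only the two prediction files.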
* Make examples PEP8 compliant

* Make examples autosklearn/ (level 0) PEP8 compliant

* Make autosklearn/evaluation PEP8 compliant

* Make autosklearn/ensembles PEP8 compliant

* Add some noqa justifications

* Remove flake8_diff.sh

* Remove two unnecessary lines
* Python 3.8 Warnings cleanup

* Flake 8 bug fixing

* Fixing flake errors

Co-authored-by: chico <[email protected]>
* First version of models in disc

* Added test for max models in disc

* Origin/py 3 8 warns rebase changes (#834)

* Python 3.8 Warnings cleanup

* Flake 8 bug fixing

* Fixing flake errors

Co-authored-by: chico <[email protected]>

* update PR

* fix replace-all error

* fix unit test

* change sorting and add hidden argument

* update unit tests

Co-authored-by: chico <[email protected]>
Co-authored-by: Matthias Feurer <[email protected]>
* First version of 070 release notes

* Missed a bugfix

* Vim added unexpected space -- fix
@mfeurer mfeurer merged commit bb8396b into master May 7, 2020
7 participants