Do not read predictions in memory, only after score #870

franchuterivera · 2020-06-01T11:59:34Z

Code cleanup for _read_np_fn
Fix the precision type when reading files, in _read_np_fn
Score the predictions without keeping them in memory. Only read the actual file when we are sure it is the best prediction.
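
The idea in the description above can be sketched roughly as follows. This is a minimal illustration, not the code in `autosklearn/ensemble_builder.py`: the function names `accuracy` and `score_then_load_best`, the `.npy` file layout, and the use of accuracy as the metric are all assumptions made for the example.

```python
import numpy as np


def accuracy(y_true, y_pred_proba):
    """Fraction of rows whose highest-probability class matches the label."""
    return float(np.mean(np.argmax(y_pred_proba, axis=1) == y_true))


def score_then_load_best(pred_paths, y_true):
    """Score every prediction file, but keep only the best array in memory.

    Hypothetical helper: each file is read just long enough to compute its
    score, then the array is dropped. Only the winner is read again.
    """
    scores = {}
    for path in pred_paths:
        preds = np.load(path)                 # temporary read for scoring
        scores[path] = accuracy(y_true, preds)
        del preds                             # keep the score, not the array
    best_path = max(scores, key=scores.get)
    return best_path, scores, np.load(best_path)
```

The trade-off is one extra disk read for the selected file in exchange for never holding more than one candidate's predictions at a time.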

franchuterivera · 2020-06-01T12:00:29Z

I do have one open question: whether the complex data structure with the "loaded" key is needed.

That is, I don't see this "loaded" key being used anywhere in the ensemble structure.

mfeurer

I left some comments throughout the changes. I really like that this PR even reduces the lines of code.

autosklearn/ensemble_builder.py

codecov-commenter · 2020-06-11T12:26:50Z

Codecov Report

Merging #870 into development will increase coverage by 0.02%.
The diff coverage is 83.33%.

@@               Coverage Diff               @@
##           development     #870      +/-   ##
===============================================
+ Coverage        84.02%   84.04%   +0.02%     
===============================================
  Files              127      127              
  Lines             9458     9435      -23     
===============================================
- Hits              7947     7930      -17     
+ Misses            1511     1505       -6
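
As a quick sanity check, the percentages in the table above can be recomputed from its Hits and Lines rows. The helper name `truncated_pct` is hypothetical, and truncating (rather than rounding) to two decimals is an assumption; it happens to reproduce the reported figures, whereas rounding would give 84.05% for the second column.

```python
def truncated_pct(hits: int, lines: int) -> str:
    """Coverage as a percentage string, truncated (not rounded) to 2 decimals."""
    hundredths = hits * 10000 // lines        # integer math avoids float noise
    return f"{hundredths // 100}.{hundredths % 100:02d}"


before = truncated_pct(7947, 9458)   # development column
after = truncated_pct(7930, 9435)    # #870 column
misses_before = 9458 - 7947          # lines minus hits
misses_after = 9435 - 7930
print(before, after, misses_before, misses_after)
```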

Impacted Files	Coverage Δ
autosklearn/ensemble_builder.py	`71.08% <82.92%> (+1.18%)`	⬆️
autosklearn/automl.py	`81.75% <100.00%> (+0.06%)`	⬆️
...mponents/feature_preprocessing/nystroem_sampler.py	`85.29% <0.00%> (-5.89%)`	⬇️
..._preprocessing/select_percentile_classification.py	`86.20% <0.00%> (-3.45%)`	⬇️
.../metalearning/metalearning/kNearestDatasets/kND.py	`94.11% <0.00%> (-0.62%)`	⬇️
autosklearn/estimators.py	`90.36% <0.00%> (ø)`
...eline/components/feature_preprocessing/fast_ica.py	`91.30% <0.00%> (+2.17%)`	⬆️
...e/components/feature_preprocessing/select_rates.py	`84.61% <0.00%> (+3.07%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c695989...1187a11.

* Do not read predictions in memory, only after score * Precission support for string/int

* PEP8 (#718)
* multioutput_regression
* multioutput regression
* #782 showcase pipeline components iteration
* Fixed flake-8 violations
* multi_output regression v1
* fix y_shape in multioutput regression
* fix xy_data_manager change due to merge
* automl.py missing import
* Release note 070 (#842)
* First version of 070 release notes
* Missed a bugfix
* Vim added unexpected space -- fix
* prepare new release (#846)
* Clip predict values to [0-1] in classification
* Fix for 3.5 python!
* Sensible default value of 'score_func' for SelectPercentileRegression (#843) Currently default value of 'score_func' for SelectPercentileRegression is "f_classif", which is an invalid value, and will surely be rejected and will not work
* More robust tmp file naming (#854)
* More robust tmp file naming
* UUID approach
* 771 worst possible result (#845)
* Initial Commit
* Make worst result a function
* worst possible result in metric
* Fixing the name of the scorers
* Add exceptions to log file, not just stdout (#863)
* Add exceptions to log file, not just stdout
* Removing dummy pred as trys is not needed
* Add prediction with models trained with cross-validation (#864)
* add the possibility to predict with cross-validation
* fix unit tests
* test new feature, too
* 715 ml memory (#865)
* #715 Support for no ml memory limit
* API update
* Docs enhancement (#862)
* Improved docs
* Fixed example typos
* Beautify examples
* cleanup examples
* fixed rsa equal
* Move to minmax scaler (#866)
* Do not read predictions in memory, only after score (#870)
* Do not read predictions in memory, only after score
* Precission support for string/int
* Removal of competition manager (#869)
* Removal of competition manager
* Removed additional unused methods/files and moved metrics to estimator
* Fix meta data generation
* Make sure pytest is older newer than 4.6
* Unit tst fixing
* flake8 fixes in examples
* Fix metadata gen metrics
* Fix dataprocessing get params (#877)
* Fix dataprocessing get params
* Add clone-test to regression pipeline
* Allow 1-D threshold binary predictions (#879)
* fix single output regression not working
* regression need no _enusre_prediction_array_size_prediction_array_sizess
* multioutput after rebased to 0.7.0 Problem: Cause: Solution:
* Regressor target y shape index out of range
* Revision for make tester
* Revision: Cancel Multiclass-MultiOuput
* Resolve automl.py metrics(__init__) reg_gb reg_svm
* Fix Flake8 errors
* Fix automl.py flake8
* Preprocess w/ mulitout reg,automl self._n_outputs
* test_estimator.py changed back
* cancel multioutput multiclass for multi reg
* Fix automl self._n_output update placement
* fix flake8
* Kernel pca cancelled mulitout reg
* Kernel PCA test skip python <3.8
* Add test unit for multioutput reg and fix.
* Fix flake8 error
* Kernel PCA multioutput regression
* default kernel to cosine, dodge sklearn=0.22 error
* Kernel PCA should be updated to 0.23
* Kernel PCA uses rbf kernel
* Kernel Pca
* Modify labels in reg, class, perpro in examples
* Kernel PCA
* Add missing supports to mincoal and truncateSVD

Co-authored-by: Matthias Feurer
Co-authored-by: chico
Co-authored-by: Francisco Rivera Valverde
Co-authored-by: Xiaodong DENG


Do not read predictions in memory, only after score

9813615

mfeurer reviewed Jun 9, 2020

autosklearn/ensemble_builder.py (three review threads, outdated and resolved)

Precission support for string/int

1187a11
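
The commit title suggests the precision setting may arrive either as a string or as an int. A minimal sketch of how such a value could be normalised before casting loaded predictions is shown below. The names `resolve_dtype` and `read_predictions` are hypothetical, and the set of supported precisions (16, 32, 64) is an assumption for illustration, not a statement about what `_read_np_fn` does.

```python
import numpy as np

# Assumed mapping from precision in bits to a NumPy float type.
_DTYPES = {16: np.float16, 32: np.float32, 64: np.float64}


def resolve_dtype(precision):
    """Map a precision given as int or str (e.g. 32 or "32") to a float dtype."""
    try:
        return _DTYPES[int(precision)]
    except (KeyError, ValueError, TypeError):
        raise ValueError(f"Unsupported precision: {precision!r}") from None


def read_predictions(file, precision=32):
    """Load an .npy file (path or open binary file) and cast it."""
    return np.load(file).astype(resolve_dtype(precision))
```

Normalising through `int(...)` once means the rest of the code only ever compares against integers, whichever form the caller used.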

mfeurer approved these changes Jun 13, 2020

mfeurer merged commit d313f26 into automl:development Jun 13, 2020

charlesfu4 pushed a commit to charlesfu4/auto-sklearn that referenced this pull request Jun 17, 2020

Do not read predictions in memory, only after score (automl#870)

92a860b

* Do not read predictions in memory, only after score * Precission support for string/int

franchuterivera added a commit to franchuterivera/auto-sklearn that referenced this pull request Aug 21, 2020

Do not read predictions in memory, only after score (automl#870)

d5efed2

* Do not read predictions in memory, only after score * Precission support for string/int


3 participants
