Description
I've been using TPOTEstimator to evolve hyperparameters for my ML models, and there appears to be a memory leak in the TPOT code. My WIP code & data are closed-source, so I can't definitively prove that's the case, but I'm filing a report since this might still be useful information.
Related + IMO confusing TPOT code
In a quick skim through the TPOT code, I noticed this code block, which has two near-identically named data members that could easily be confused with each other and cause a memory leak. Regardless of whether my example is actually a TPOT memory leak, I'd suggest using one client variable and a separate bool to track whether or not it was user-provided. These easily confused variables seem likely to be a source of future bugs, if not the cause of a current one.
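A minimal sketch of the pattern I'm suggesting. The names (FakeClient, _client, _client_is_user_provided) are hypothetical, and FakeClient is a stdlib-only stand-in for dask.distributed.Client so the sketch runs on its own:

```python
class FakeClient:
    """Stand-in for dask.distributed.Client; records whether close() ran."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class Estimator:
    def __init__(self, client=None):
        # One client variable; a separate flag records whether the caller owns it.
        self._client = client
        self._client_is_user_provided = client is not None

    def fit(self):
        if self._client is None:
            self._client = FakeClient()  # internally created, so we must clean it up
        try:
            pass  # ... the evolutionary search would run here ...
        finally:
            if not self._client_is_user_provided:
                self._client.close()
                self._client = None  # drop the reference so resources can be reclaimed


# Internally created client is closed and released after fit():
est = Estimator()
est.fit()
assert est._client is None

# User-provided client is left untouched:
mine = FakeClient()
est2 = Estimator(client=mine)
est2.fit()
assert est2._client is mine and not mine.closed
```

With a single client reference and an ownership flag, there's no second, near-identically named member to accidentally keep alive.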
My (imperfect) example
I'm using 10 cores and my training data has shape (312_735, 240). Using TPOT2 0.1.9a0.
I'm attaching some runtime output, 2025-01-29 Likely memory leak.txt, which is a little noisy, but my script doesn't do much besides use TPOT before failing (in this case). Many other runs with no code edits but different data & TPOT versions have worked. Some relevant log messages match up with my recipe below:
- "Done searching Logistic Regression hyperparameter space in 1 h, 1 min, 42.00 s" -- indicates TPOT fit() has finished.
- "Cross-validating the Logistic Regression model" -- an error occurs immediately after this message in this case.
My Code
My WIP code basically does this:
- Load training data
- Preprocess it to drop some columns
- Run hyperparameter optimization using TPOT. Some worker restart messages don't inspire confidence, but this step completes.
- Print the TPOT estimator (see "TPOTEstimator vars" in the attachment)
- Run cross-validation again. I know TPOT already does this; my code just accounts for a different, optional code path where TPOT isn't involved, and in my case it isn't too expensive to run again.
- Fail when trying to allocate new processes
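The extra cross-validation step is roughly the following. This is a sketch, not my actual code: a plain scikit-learn model and toy data stand in for the TPOT result and my real (312_735, 240) training set, and the validator/scoring choices here are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Toy stand-in for the real training data.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# In my script the model under test comes from TPOT's search;
# here a plain classifier plays that role.
model = LogisticRegression(max_iter=1000)

cross_validator = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cross_validator, scoring="f1")
assert len(scores) == 5  # one f1 score per fold
```

It's at this second cross-validation pass ("Cross-validating the Logistic Regression model" in the log) that my run fails to allocate new processes.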
I'll look into providing my own Dask client to TPOT as a workaround, but I think the potential TPOT memory leak is still worth reporting.
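The workaround amounts to owning the client lifecycle in my script rather than inside TPOT. A stdlib-only sketch of that shape -- FakeClient and FakeEstimator are stand-ins for dask.distributed.Client and a TPOT estimator that borrows a caller-provided client, since I haven't verified the real API here:

```python
from contextlib import closing


class FakeClient:
    """Stand-in for dask.distributed.Client; tracks whether close() ran."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class FakeEstimator:
    """Stand-in for an estimator that borrows a caller-provided client."""
    def __init__(self, client):
        self.client = client

    def fit(self):
        return self  # the real fit() would run the search on self.client


# The script owns the client: it outlives each fit() and is closed exactly
# once, no matter how many estimators borrow it.
with closing(FakeClient()) as client:
    for _ in range(3):
        FakeEstimator(client).fit()
    assert not client.closed  # still usable between fits
assert client.closed  # released once, at script scope
```

If the leak is in TPOT's handling of its internally created client, keeping the client at script scope should sidestep it.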
The TPOTEstimator portion of the code is:
estimator = TPOTEstimator(
    classification=True,
    cv=cross_validator,
    generations=search_config.generations,
    n_jobs=search_config.n_jobs,
    population_size=search_config.population_size,
    random_state=child_seed,
    early_stop=search_config.early_stop,
    search_space=search_space,
    scorers=["f1"],
    scorers_weights=[1],
    scorers_early_stop_tol=search_config.early_stop_tolerance,
    verbose=4,
)
estimator.fit(X, y) # Use genetic algorithm to explore hyperparameter space
duration = utils.runtime_summary_str(start, datetime.now(UTC))
print(f"Done searching {model_name} hyperparameter space in {duration}")
print("TPOTEstimator vars:")