Possible memory leak #168

@chimaerase

Description

I've been using TPOTEstimator to evolve hyperparameters for my ML models, and it appears there's a memory leak in the TPOT code. My WIP code and data are closed-source, so I can't definitively prove that's the case, but I'm filing a report since this might still be useful information.

Related + IMO confusing TPOT code

In a quick skim through the TPOT code, I noticed a code block with two near-identically named client data members that could easily be confused with each other and cause a memory leak. Regardless of whether my example turns out to be an actual TPOT memory leak, I'd suggest using a single client variable plus a separate bool that tracks whether it was user-provided. These easily confused variables seem likely to be a source of future bugs, if not the cause of a current one.
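
Something along these lines is what I have in mind. The class and attribute names here are hypothetical, not TPOT's actual ones; it's only a sketch of the ownership pattern:

    from typing import Optional

    from dask.distributed import Client, LocalCluster


    class EstimatorSketch:
        """Hypothetical sketch of the suggested pattern; not TPOT's actual code."""

        def __init__(self, client: Optional[Client] = None, n_jobs: int = 1):
            # Single source of truth for the client, plus a flag recording ownership.
            self._client = client
            self._owns_client = client is None
            self._cluster = None
            self.n_jobs = n_jobs

        def _start_client(self) -> Client:
            # Only create a client if the caller didn't supply one.
            if self._owns_client and self._client is None:
                self._cluster = LocalCluster(n_workers=self.n_jobs, threads_per_worker=1)
                self._client = Client(self._cluster)
            return self._client

        def _shutdown_client(self) -> None:
            # User-provided clients are left alone; internally created ones are closed
            # so their workers and memory are released after fit().
            if self._owns_client and self._client is not None:
                self._client.close()
                self._cluster.close()
                self._client = None
                self._cluster = None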


My (imperfect) example

I'm using 10 cores and my training data has shape (312_735, 240). Using TPOT2 0.1.9a0.

I'm attaching some runtime output (2025-01-29 Likely memory leak.txt). It's a little noisy, but in this case my script doesn't do much other than use TPOT before failing. Many other runs with no code edits but different data and TPOT versions have worked. Some relevant log messages match up with the recipe below:

  • "Done searching Logistic Regression hyperparameter space in 1 h, 1 min, 42.00 s" -- indicates TPOT fit() has finished.
  • "Cross-validating the Logistic Regression model" -- causes an error immediately after in this case.

My Code

My WIP code basically does this:

  1. Load training data
  2. Preprocess it to drop some columns
  3. Run hyperparameter optimization using TPOT
    Some worker-restart messages don't inspire confidence, but this step completes.
  4. Print the TPOT estimator (see "TPOTEstimator vars" in the attachment)
  5. Run cross-validation again
    I know TPOT already does this internally; this step just covers a different, optional code path where TPOT isn't involved, and in my case it isn't too expensive to run again.
  6. Fail when trying to allocate new processes
    I'll look into providing my own Dask client to TPOT to work around this (see the sketch after this list), but I think the potential for a TPOT memory leak is still worth reporting.
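
The workaround I have in mind for step 6 would look roughly like this. I'm assuming TPOTEstimator accepts a client argument for a user-provided Dask client; I haven't verified that against this version:

    from dask.distributed import Client, LocalCluster

    # Create a long-lived Dask cluster and client up front instead of letting
    # TPOT allocate (and possibly leak) its own workers.
    cluster = LocalCluster(n_workers=search_config.n_jobs, threads_per_worker=1)
    client = Client(cluster)

    estimator = TPOTEstimator(
        classification=True,
        client=client,  # assumed parameter name for a user-provided client
        cv=cross_validator,
        generations=search_config.generations,
        population_size=search_config.population_size,
        random_state=child_seed,
        search_space=search_space,
        scorers=["f1"],
        scorers_weights=[1],
        verbose=4,
    )
    estimator.fit(X, y)

    # Tear the cluster down explicitly once the rest of the script is done with it.
    client.close()
    cluster.close()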

The TPOTEstimator portion of the code is:

    estimator = TPOTEstimator(
        classification=True,
        cv=cross_validator,
        generations=search_config.generations,
        n_jobs=search_config.n_jobs,
        population_size=search_config.population_size,
        random_state=child_seed,
        early_stop=search_config.early_stop,
        search_space=search_space,
        scorers=["f1"],  
        scorers_weights=[1], 
        scorers_early_stop_tol=search_config.early_stop_tolerance,
        verbose=4,
    )
    estimator.fit(X, y)  # Use genetic algorithm to explore hyperparameter space
    duration = utils.runtime_summary_str(start, datetime.now(UTC))
    print(f"Done searching {model_name} hyperparameter space in {duration}")
    print("TPOTEstimator vars:")
