Description
I've been using TPOTEstimator to evolve hyperparameters for my ML models, and there appears to be a memory leak in the TPOT code. My WIP code & data are closed-source, so I can't definitively prove that's the case, but I'm filing a report since this might still be useful information.
Related + IMO confusing TPOT code
In a quick skim through the TPOT code, I noticed this code block, which has two near-identically named data members that could easily be confused with each other and cause a memory leak. Regardless of whether my example is actually a TPOT memory leak, I'd suggest using one client variable and a separate bool to track whether or not it was user-provided. These easily confused variables seem likely to be a source of future bugs, if not the cause of a current one.
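A minimal sketch of the pattern I'm suggesting. The names (FakeClient, _client, _client_is_user_provided) are hypothetical, and FakeClient is a stdlib-only stand-in for dask.distributed.Client so the sketch runs on its own:

```python
class FakeClient:
    """Stand-in for dask.distributed.Client; records whether close() ran."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class Estimator:
    def __init__(self, client=None):
        # One client variable; a separate flag records whether the caller owns it.
        self._client = client
        self._client_is_user_provided = client is not None

    def fit(self):
        if self._client is None:
            self._client = FakeClient()  # internally created, so we must clean it up
        try:
            pass  # ... the evolutionary search would run here ...
        finally:
            if not self._client_is_user_provided:
                self._client.close()
                self._client = None  # drop the reference so resources can be reclaimed


# Internally created client is closed and released after fit():
est = Estimator()
est.fit()
assert est._client is None

# User-provided client is left untouched:
mine = FakeClient()
est2 = Estimator(client=mine)
est2.fit()
assert est2._client is mine and not mine.closed
```

With a single client reference and an ownership flag, there's no second, near-identically named member to accidentally keep alive.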
My (imperfect) example
I'm using 10 cores and my training data has shape (312_735, 240). Using TPOT2 0.1.9a0.
I'm attaching some runtime output, 2025-01-29 Likely memory leak.txt, which is a little noisy, but my script doesn't do much besides use TPOT before failing (in this case). Many other runs with no code edits but different data & TPOT versions have worked. Some relevant log messages match up with my recipe below:
- "Done searching Logistic Regression hyperparameter space in 1 h, 1 min, 42.00 s" -- indicates TPOT fit() has finished.
- "Cross-validating the Logistic Regression model" -- an error occurs immediately after this message in this case.
My Code
My WIP code basically does this:
- Load training data
- Preprocess it to drop some columns
- Run hyperparameter optimization using TPOT. Some worker restart messages don't inspire confidence, but this step completes.
- Print the TPOT estimator (see "TPOTEstimator vars" in the attachment)
- Run cross-validation again. I know TPOT already does this; my code just accounts for a different, optional code path where TPOT isn't involved, and in my case it isn't too expensive to run again.
- Fail when trying to allocate new processes
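The extra cross-validation step is roughly the following. This is a sketch, not my actual code: a plain scikit-learn model and toy data stand in for the TPOT result and my real (312_735, 240) training set, and the validator/scoring choices here are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Toy stand-in for the real training data.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# In my script the model under test comes from TPOT's search;
# here a plain classifier plays that role.
model = LogisticRegression(max_iter=1000)

cross_validator = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cross_validator, scoring="f1")
assert len(scores) == 5  # one f1 score per fold
```

It's at this second cross-validation pass ("Cross-validating the Logistic Regression model" in the log) that my run fails to allocate new processes.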
I'll look into providing my own Dask client to TPOT as a workaround, but I think the potential TPOT memory leak is still worth reporting.
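The workaround amounts to owning the client lifecycle in my script rather than inside TPOT. A stdlib-only sketch of that shape -- FakeClient and FakeEstimator are stand-ins for dask.distributed.Client and a TPOT estimator that borrows a caller-provided client, since I haven't verified the real API here:

```python
from contextlib import closing


class FakeClient:
    """Stand-in for dask.distributed.Client; tracks whether close() ran."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class FakeEstimator:
    """Stand-in for an estimator that borrows a caller-provided client."""
    def __init__(self, client):
        self.client = client

    def fit(self):
        return self  # the real fit() would run the search on self.client


# The script owns the client: it outlives each fit() and is closed exactly
# once, no matter how many estimators borrow it.
with closing(FakeClient()) as client:
    for _ in range(3):
        FakeEstimator(client).fit()
    assert not client.closed  # still usable between fits
assert client.closed  # released once, at script scope
```

If the leak is in TPOT's handling of its internally created client, keeping the client at script scope should sidestep it.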
The TPOTEstimator portion of the code is:
estimator = TPOTEstimator(
    classification=True,
    cv=cross_validator,
    generations=search_config.generations,
    n_jobs=search_config.n_jobs,
    population_size=search_config.population_size,
    random_state=child_seed,
    early_stop=search_config.early_stop,
    search_space=search_space,
    scorers=["f1"],
    scorers_weights=[1],
    scorers_early_stop_tol=search_config.early_stop_tolerance,
    verbose=4,
)
estimator.fit(X, y) # Use genetic algorithm to explore hyperparameter space
duration = utils.runtime_summary_str(start, datetime.now(UTC))
print(f"Done searching {model_name} hyperparameter space in {duration}")
print("TPOTEstimator vars:")