Why are you doing the K-means clustering? #32

@AakashKeswani

Hi,

First I'd like to say thanks for publishing this repo! It's very helpful.

My question specifically refers to this description in the README:

> After the specified Optuna trials are complete, a 3-step KMeans clustering method is used to select the optimal parameter(s):
>
> 1. Each trial is placed in its nearest neighbor cluster based on its distance correlation to the target. The optimal number of clusters is determined using the elbow method. The cluster with the highest average correlation is selected with respect to its membership. In other words, a weighted score is used to select the cluster with the highest correlation but also with the most trials.
> 2. After the best correlation cluster is selected, the parameters of the trials within the cluster are also clustered. Again, the best cluster of indicator parameter(s) is selected with respect to its membership.
> 3. Finally, the centered best trial is selected from the best parameter cluster.
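
To check my reading of step 1, here is a minimal sketch of what I understand it to do. It is not the repo's code: `kmeans_1d`, `assign`, and `best_cluster` are hypothetical helpers, `k` is fixed instead of being chosen by the elbow method, the scores are made up, and the weighting (mean score times the cluster's share of trials) is only my guess at what "with respect to its membership" means.

```python
from statistics import mean

def assign(values, centers):
    """Label each value with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: abs(v - centers[c])) for v in values]

def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's k-means on scalars with a deterministic, evenly spaced init."""
    ordered = sorted(values)
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        labels = assign(values, centers)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        centers = [
            mean([v for v, lab in zip(values, labels) if lab == c] or [centers[c]])
            for c in range(k)
        ]
    return assign(values, centers), centers

def best_cluster(scores, k=3):
    """Indices of the trials in the cluster with the highest membership-weighted mean."""
    labels, _ = kmeans_1d(scores, k)

    def weighted(c):
        members = [s for s, lab in zip(scores, labels) if lab == c]
        # ASSUMED weighting: mean score times the share of trials in the cluster.
        return mean(members) * len(members) / len(scores) if members else float("-inf")

    best = max(range(k), key=weighted)
    return [i for i, lab in enumerate(labels) if lab == best]

# Made-up correlations: one standout trial and a dense group of good ones.
scores = [0.10, 0.12, 0.11, 0.50, 0.52, 0.51, 0.53, 0.90]
selected = best_cluster(scores)
print(selected)
```

Under this assumed weighting the dense group around 0.5 (trials 3 to 6) is selected over the lone 0.90 trial, which is the only case I can see where the result differs from a plain argmax.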

Since you are clustering by the correlation and then picking the cluster with the best mean correlation to the target, I'm not really sure what this achieves. Why not just use the parameters from the single trial with the highest correlation?

I can see how this would be useful if you were clustering by the parameters instead of the correlations: that way you avoid outlier/overfit parameters by making sure you pick from a cluster of similar parameters that all have a high correlation. But the description and the implementation don't seem to actually use the parameter values in the clustering; they only cluster the scores.
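
To illustrate what I mean by using the parameter values, here is a rough sketch. It uses a simple neighborhood average in parameter space as a stand-in for real clustering, with a single numeric parameter and made-up `(param, score)` pairs; `robust_pick` and `radius` are hypothetical names, not anything from the repo.

```python
from statistics import mean

def robust_pick(trials, radius):
    """Return the (param, score) trial whose parameter neighborhood scores best.

    A neighborhood is every trial with |param - p| <= radius. Ties on the
    neighborhood mean are broken by the trial's own score.
    """
    def neighborhood_mean(p):
        return mean([s for q, s in trials if abs(q - p) <= radius])

    return max(trials, key=lambda t: (neighborhood_mean(t[0]), t[1]))

# Made-up trials: param 5 scores 0.90 but its neighbors score poorly,
# while params 20 to 22 all score consistently well.
trials = [(4, 0.10), (5, 0.90), (6, 0.15), (20, 0.60), (21, 0.62), (22, 0.61)]

naive = max(trials, key=lambda t: t[1])   # highest single score
robust = robust_pick(trials, radius=2)    # best-scoring parameter region
print(naive, robust)
```

Here the plain argmax picks the isolated spike at param 5, while the neighborhood average picks param 21 from the stable region, which is the kind of overfit protection I would expect parameter-based clustering to provide.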

Alternatively, doing a k-fold optimization could help control for overfitting as well, although I guess users can implement that themselves if they want to.
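
By k-fold optimization I mean something like the sketch below: score each candidate parameter on several contiguous folds and average, rather than scoring once on the full sample. `kfold_slices`, `kfold_score`, and `score_fn` are hypothetical names, and the toy `score_fn` only stands in for whatever correlation objective the Optuna trial would compute.

```python
from statistics import mean

def kfold_slices(n, k):
    """Split range(n) into k contiguous, near-equal folds (order preserved)."""
    bounds = [n * i // k for i in range(k + 1)]
    return [slice(bounds[i], bounds[i + 1]) for i in range(k)]

def kfold_score(score_fn, data, param, k=5):
    """Average score_fn(fold, param) over k contiguous folds of data."""
    return mean([score_fn(data[s], param) for s in kfold_slices(len(data), k)])

# Toy stand-in objective: a real one would return the indicator/target correlation.
def score_fn(fold, param):
    return len(fold) * param

data = list(range(10))
folds = [(s.start, s.stop) for s in kfold_slices(len(data), 3)]
averaged = kfold_score(score_fn, data, param=2, k=5)
print(folds, averaged)
```

A parameter set that only looks good on one stretch of the data would get pulled down by the other folds. Using the minimum across folds instead of the mean would be a stricter variant.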

Thanks again!
-Aakash
