What happened?
I am experiencing a recurring issue where the hyperparameter tuning process generates duplicate sets of parameters, leading to inefficient use of GPU resources.
For instance, with the experimental setup below:
spec:
  algorithm:
    algorithmName: bayesianoptimization
  maxTrialCount: 10
  metricsCollectorSpec:
    collector:
      kind: StdOut
  objective:
    goal: 1
    metricStrategies:
    - name: Accuracy
      value: max
    objectiveMetricName: Accuracy
    type: maximize
  parallelTrialCount: 1
  parameters:
  - feasibleSpace:
      list:
      - "0.01"
      - "1"
      - "5"
      - "10"
      - "0.1"
    name: C
    parameterType: categorical
  - feasibleSpace:
      list:
      - linear
      - rbf
      - poly
      - sigmoid
    name: kernal
    parameterType: categorical
  - feasibleSpace:
      list:
      - "0.1"
      - "0.001"
      - "0.01"
    name: gamma
    parameterType: categorical
The suggestions produced by the algorithm are often redundant, even though the search space contains 5 × 4 × 3 = 60 distinct combinations, far more than the 10 requested trials. The following Suggestion output illustrates the problem (only the duplicated suggestions are shown for clarity):
spec:
  algorithm:
    algorithmName: bayesianoptimization
  requests: 10
  resumePolicy: Never
status:
  suggestionCount: 10
  suggestions:
  - name: mnist-pytorch-rep-bo-1-v1-g2qml7vd
    parameterAssignments:
    - name: C
      value: "10"
    - name: kernal
      value: poly
    - name: gamma
      value: "0.1"
  - name: mnist-pytorch-rep-bo-1-v1-hjzn82gn
    parameterAssignments:
    - name: C
      value: "10"
    - name: kernal
      value: poly
    - name: gamma
      value: "0.1"
I have run this experiment repeatedly, and in extreme cases, all 10 suggestions are identical.
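To confirm the duplication programmatically, a small script along these lines can be used. It assumes `kubectl` access to the cluster and that the Suggestion resource is named after the experiment (here `mnist-pytorch-rep-bo-1-v1`, inferred from the trial names above; adjust as needed):

```python
# Count duplicate parameter assignments in a Katib Suggestion resource.
# Assumes `kubectl get suggestion <name> -o json` works against the cluster
# and that the resource name below matches your experiment.
import json
import subprocess
from collections import Counter

SUGGESTION_NAME = "mnist-pytorch-rep-bo-1-v1"  # assumed name, adjust to your setup

raw = subprocess.run(
    ["kubectl", "get", "suggestion", SUGGESTION_NAME, "-o", "json"],
    check=True, capture_output=True, text=True,
).stdout
suggestion = json.loads(raw)

# Normalize each suggestion into a hashable key such as
# (("C", "10"), ("gamma", "0.1"), ("kernal", "poly")).
keys = [
    tuple(sorted((a["name"], a["value"]) for a in s["parameterAssignments"]))
    for s in suggestion["status"]["suggestions"]
]

counts = Counter(keys)
print(f"{len(keys)} suggestions, {len(counts)} unique")
for key, n in counts.items():
    if n > 1:
        print(f"duplicated {n}x: {dict(key)}")
```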
What did you expect to happen?
- Duplicate parameter assignments should be filtered out, so that each suggestion is unique.
- The experiment should stop early once the hyperparameter search space is exhausted (see the sketch after this list).
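The following is only an illustrative sketch of the expected behavior, not Katib's actual suggestion service; plain random sampling stands in for the Bayesian-optimization proposal step:

```python
# Sketch: reject assignments that were already suggested and stop once every
# combination of the categorical feasible spaces has been proposed.
import math
import random

feasible_space = {
    "C": ["0.01", "1", "5", "10", "0.1"],
    "kernal": ["linear", "rbf", "poly", "sigmoid"],
    "gamma": ["0.1", "0.001", "0.01"],
}

def suggest(history):
    """Return a new, unseen assignment, or None once the space is exhausted."""
    total = math.prod(len(values) for values in feasible_space.values())  # 5 * 4 * 3 = 60 here
    if len(history) >= total:
        return None  # space exhausted -> the experiment should stop early
    while True:
        candidate = tuple(
            (name, random.choice(values)) for name, values in feasible_space.items()
        )
        if candidate not in history:  # filter out duplicated hyperparameters
            history.add(candidate)
            return dict(candidate)

seen = set()
for _ in range(10):
    print(suggest(seen))
```

With duplicate filtering like this, the 60-combination space above can easily provide 10 unique suggestions.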
Environment
- Kubernetes version: v1.25.14
- Katib controller version: kubeflow/katib-controller:v0.16.0
Impacted by this bug?
Give it a 👍. We prioritize the issues with the most 👍.