Duplicate hyperparameters waste compute and time #2571

@Antsypc

What happened?

I am experiencing a recurring issue where the hyperparameter tuning process generates duplicate sets of parameters, leading to inefficient use of GPU resources.

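For instance, with the Experiment spec below, which defines three categorical parameters (5 × 4 × 3 = 60 possible combinations) and a maxTrialCount of 10: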

spec:
  algorithm:
    algorithmName: bayesianoptimization
  maxTrialCount: 10
  metricsCollectorSpec:
    collector:
      kind: StdOut
  objective:
    goal: 1
    metricStrategies:
      - name: Accuracy
        value: max
    objectiveMetricName: Accuracy
    type: maximize
  parallelTrialCount: 1
  parameters:
    - feasibleSpace:
        list:
          - "0.01"
          - "1"
          - "5"
          - "10"
          - "0.1"
      name: C
      parameterType: categorical
    - feasibleSpace:
        list:
          - linear
          - rbf
          - poly
          - sigmoid
      name: kernal
      parameterType: categorical
    - feasibleSpace:
        list:
          - "0.1"
          - "0.001"
          - "0.01"
      name: gamma
      parameterType: categorical

The suggestions provided by the algorithm are often redundant. The following output illustrates this problem, showing only the duplicated suggestions for clarity:

spec:
  algorithm:
    algorithmName: bayesianoptimization
  requests: 10
  resumePolicy: Never
status:
  suggestionCount: 10
  suggestions:
    - name: mnist-pytorch-rep-bo-1-v1-g2qml7vd
      parameterAssignments:
        - name: C
          value: "10"
        - name: kernal
          value: poly
        - name: gamma
          value: "0.1"
    - name: mnist-pytorch-rep-bo-1-v1-hjzn82gn
      parameterAssignments:
        - name: C
          value: "10"
        - name: kernal
          value: poly
        - name: gamma
          value: "0.1"

I have run this experiment repeatedly, and in extreme cases, all 10 suggestions are identical.
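
One way to confirm the duplication is to count identical parameter assignments in the Suggestion status. The snippet below is a minimal sketch, assuming the `status.suggestions` list has already been loaded into plain Python dicts shaped like the output above. `assignment_key` and `find_duplicate_assignments` are hypothetical helpers written for illustration; they are not part of Katib.

```python
from collections import Counter


def assignment_key(suggestion):
    """Build an order-independent, hashable key from one suggestion's assignments."""
    return tuple(
        sorted((a["name"], a["value"]) for a in suggestion["parameterAssignments"])
    )


def find_duplicate_assignments(suggestions):
    """Return {assignment_key: count} for every assignment seen more than once."""
    counts = Counter(assignment_key(s) for s in suggestions)
    return {key: n for key, n in counts.items() if n > 1}


# The two suggestions from the output above, as plain dicts.
suggestions = [
    {
        "name": "mnist-pytorch-rep-bo-1-v1-g2qml7vd",
        "parameterAssignments": [
            {"name": "C", "value": "10"},
            {"name": "kernal", "value": "poly"},
            {"name": "gamma", "value": "0.1"},
        ],
    },
    {
        "name": "mnist-pytorch-rep-bo-1-v1-hjzn82gn",
        "parameterAssignments": [
            {"name": "C", "value": "10"},
            {"name": "kernal", "value": "poly"},
            {"name": "gamma", "value": "0.1"},
        ],
    },
]

duplicates = find_duplicate_assignments(suggestions)
for key, count in duplicates.items():
    print(f"{dict(key)} was suggested {count} times")
```

Sorting the (name, value) pairs makes the comparison independent of the order in which the parameters are listed.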

What did you expect to happen?

  • Duplicate hyperparameter sets should be filtered out.
  • The experiment should stop early once the hyperparameter search space is exhausted.
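
Both expectations can be sketched outside Katib for a purely categorical search space like the one in this report. This is a minimal illustration under stated assumptions, not Katib's implementation: `propose` stands in for whatever the algorithm returns, `random_propose` is a placeholder proposer, and `unique_suggestions` is a hypothetical wrapper that skips combinations it has already yielded and stops once the trial budget or the whole search space is used up.

```python
import random

# Feasible values copied from the Experiment spec in this report.
SEARCH_SPACE = {
    "C": ["0.01", "1", "5", "10", "0.1"],
    "kernal": ["linear", "rbf", "poly", "sigmoid"],
    "gamma": ["0.1", "0.001", "0.01"],
}


def search_space_size(space):
    """Number of distinct combinations in a purely categorical space."""
    size = 1
    for values in space.values():
        size *= len(values)
    return size


def unique_suggestions(propose, space, max_trials, max_retries=1000):
    """Yield distinct assignments from propose().

    Stops after max_trials assignments, or earlier if every combination
    in the space has been yielded (search space exhausted).
    """
    limit = min(max_trials, search_space_size(space))
    seen = set()
    retries = 0
    while len(seen) < limit:
        candidate = propose()
        key = tuple(sorted(candidate.items()))
        if key in seen:
            # Duplicate: skip it, but give up if the proposer keeps repeating itself.
            retries += 1
            if retries > max_retries:
                break
            continue
        seen.add(key)
        retries = 0
        yield candidate


rng = random.Random(0)


def random_propose():
    """Placeholder proposer: picks each parameter uniformly at random."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


trials = list(unique_suggestions(random_propose, SEARCH_SPACE, max_trials=10))
print(f"{len(trials)} distinct trials out of {search_space_size(SEARCH_SPACE)} combinations")
```

The retry budget only prevents an endless loop. An optimizer that keeps proposing the same point would still need to be changed so that it proposes something new.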

Environment

  • Kubernetes version: v1.25.14
  • Katib controller version: kubeflow/katib-controller:v0.16.0

Impacted by this bug?

Give it a 👍. We prioritize the issues with the most 👍.
