
Trials are permanently unrecoverable when randomly removed #1830

@henrysecond1


/kind bug

What steps did you take and what happened:

Created an experiment (of any kind) and ran it.
When I arbitrarily deleted a trial created by the Katib controller, it was never recovered.

Because the experiment controller reconciles trial assignments only by comparing the number of running trials with the number of desired trials, removed trials cannot be recovered.
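To illustrate why a count-only comparison loses information, here is a minimal sketch that contrasts it with diffing assignments against existing trials by name. It is not Katib's controller code and not necessarily the approach taken in #1831; `Assignment`, `count_gap`, `assignments_to_recreate`, and the trial names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Assignment:
    # Hypothetical, simplified stand-in for one trial assignment produced by a
    # suggestion: the trial name plus its hyperparameter values.
    trial_name: str
    params: Dict[str, str] = field(default_factory=dict)


def count_gap(desired: int, existing: int) -> int:
    # Count-only comparison: reports how many trials are missing,
    # but not which assignments they belonged to.
    return max(desired - existing, 0)


def assignments_to_recreate(assignments: List[Assignment], existing: Set[str]) -> List[Assignment]:
    # Diff by name: every assignment whose trial object no longer exists is
    # returned with its original parameters, so it could be re-created as-is.
    return [a for a in assignments if a.trial_name not in existing]


assignments = [
    Assignment("random-abc", {"lr": "0.01"}),
    Assignment("random-def", {"lr": "0.05"}),
    Assignment("random-ghi", {"lr": "0.10"}),
]
# Hypothetical scenario: "random-def" was deleted by hand; two trials remain.
existing = {"random-abc", "random-ghi"}

print("count gap:", count_gap(len(assignments), len(existing)))
for a in assignments_to_recreate(assignments, existing):
    print("recreate", a.trial_name, "with lr =", a.params["lr"])
```

The count tells the controller only that one trial is short; the name-based diff identifies `random-def` and keeps its original `lr`, which is what re-creating the removed trial (rather than requesting a fresh one) would require.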

What did you expect to happen:

I expect removed trials to be recovered and run to completion, even if I delete them arbitrarily.

Anything else you would like to add:

I have opened #1831 to fix the problem. Please take a look, thank you.

Environment:

  • Katib version (check the Katib controller image version): v0.12
  • Kubernetes version: (kubectl version): v1.19.9
  • OS (uname -a): 3.10.0-1160.31.1.el7.x86_64

