Metric-collector cronjob spawns unlimited jobs #659

@epa095

/kind bug

What steps did you take and what happened:
Run a "high" number of parallel jobs relative to your cluster size.

What did you expect to happen:
Things should still work, just slowly.

What happened:
The metric-collector cron jobs created by Katib keep spawning new jobs, which don't complete before the next ones are created (since the cluster is under pressure).

Proposed solution:
I know there is an issue to switch to a push-based metric collector (#577), but I think a short-term fix is to set the concurrency policy (`concurrencyPolicy`) of the cron jobs to Forbid instead of the default Allow. Then at most one instance of each metric-collector job runs at a time.
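
To illustrate the proposal, here is a rough sketch of a CronJob manifest with the policy set. This is not Katib's actual template: the name, schedule, and image are placeholders I made up. Only the `concurrencyPolicy` field is the point.

```yaml
# Sketch only: name, schedule, and image are placeholders, not Katib's real values.
apiVersion: batch/v1beta1            # batch/v1 on Kubernetes 1.21 and later
kind: CronJob
metadata:
  name: example-metrics-collector    # hypothetical name
spec:
  schedule: "*/1 * * * *"            # assumed interval
  concurrencyPolicy: Forbid          # default is Allow; Forbid skips a run while the previous Job is still active
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: metrics-collector
            image: example.invalid/metrics-collector:latest   # placeholder image
```

With Forbid, a scheduled run is skipped if the previous Job from the same CronJob has not finished, so a slow cluster delays metric collection instead of accumulating Jobs.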

Environment:

  • Katib version: v0.1.2-alpha-156-g4ab3dbd
