Closed
Description
/kind bug
What steps did you take and what happened:
Run a high number of parallel jobs relative to your cluster size.
What did you expect to happen:
Things to work, but slowly.
What happened:
The metric-collector cron jobs created by Katib keep spawning new jobs, which do not complete before the next ones are created (since the cluster is under pressure).
Proposed solution:
I know there is an issue (#577) to change to a push-based metric collector, but I think a short-term fix is to set the concurrencyPolicy of the cron jobs to Forbid instead of the default Allow. Then at least only a single instance of each metric-collector job runs at a time.
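A minimal sketch of the proposed change, assuming the metric-collector CronJob is available as a plain manifest dictionary. The helper name `forbid_concurrent_runs`, the manifest name, the image, and the schedule are all hypothetical placeholders, not taken from the Katib code base; only the `spec.concurrencyPolicy` field and its `Allow`/`Forbid` values come from the Kubernetes CronJob API.

```python
import copy
import json


def forbid_concurrent_runs(cronjob: dict) -> dict:
    """Return a copy of a CronJob manifest with concurrencyPolicy set to Forbid."""
    patched = copy.deepcopy(cronjob)
    # Create "spec" if it is missing, then set the policy on it.
    patched.setdefault("spec", {})["concurrencyPolicy"] = "Forbid"
    return patched


# Hypothetical manifest for illustration only; names and image are placeholders.
metrics_cronjob = {
    # Assumed API version for clusters of that era; newer clusters use batch/v1.
    "apiVersion": "batch/v1beta1",
    "kind": "CronJob",
    "metadata": {"name": "example-metrics-collector"},
    "spec": {
        "schedule": "*/1 * * * *",
        # concurrencyPolicy is omitted here, so Kubernetes defaults to Allow.
        "jobTemplate": {
            "spec": {
                "template": {
                    "spec": {
                        "containers": [
                            {"name": "collector", "image": "example/collector"}
                        ],
                        "restartPolicy": "Never",
                    }
                }
            }
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(forbid_concurrent_runs(metrics_cronjob), indent=2))
```

With Forbid, Kubernetes skips a scheduled run while the previous one is still active, so unfinished collector jobs should no longer pile up on a cluster that is already under pressure.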
Environment:
- Katib version: v0.1.2-alpha-156-g4ab3dbd