Description
/kind bug
What steps did you take and what happened:
The experiment controller does not emit any events when it fails to reconcile the trials.
For example, consider a misconfigured trial parameter reference like the one below. The search-space parameter is named num-layers, but its reference in trialParameters is mistyped as num-layer, so none of the trials can be created.
parameters:
  - feasibleSpace:
      ...
    name: num-layers
    parameterType: int
trialTemplate:
  ...
  trialParameters:
    - name: numberLayers
      reference: num-layer # typo
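To illustrate why every trial fails, here is a minimal Go sketch of resolving trialParameters references against the values a suggestion assigns. TrialParameter and resolveTrialParameters are hypothetical names, not Katib's actual code. The assignment map is copied from the controller log quoted later in this issue.

```go
package main

import "fmt"

// TrialParameter is a hypothetical type mirroring one trialParameters entry.
type TrialParameter struct {
	Name      string // placeholder used in the trial template, e.g. numberLayers
	Reference string // search-space parameter it should resolve to, e.g. num-layers
}

// resolveTrialParameters (hypothetical) maps each trial parameter name to the
// value assigned by the suggestion. It fails on the first reference that is
// missing from the assignment map, which is what the num-layer typo triggers.
func resolveTrialParameters(params []TrialParameter, assignments map[string]string) (map[string]string, error) {
	resolved := make(map[string]string, len(params))
	for _, p := range params {
		value, ok := assignments[p.Reference]
		if !ok {
			return nil, fmt.Errorf("unable to find parameter: %s in parameter assignment %v", p.Reference, assignments)
		}
		resolved[p.Name] = value
	}
	return resolved, nil
}

func main() {
	assignments := map[string]string{"lr": "0.026271422193467404", "num-layers": "5", "optimizer": "sgd"}
	params := []TrialParameter{{Name: "numberLayers", Reference: "num-layer"}} // typo
	if _, err := resolveTrialParameters(params, assignments); err != nil {
		fmt.Println(err)
	}
}
```

Every suggestion assigns num-layers, never num-layer, so the lookup fails for every trial, no matter which values are suggested.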
The cause of the failure appears in the controller log. However, users who are not authorized to access the controller cannot find out why their trials are not created, because the experiment controller emits no events.
$ kubectl describe experiment random-experiment -n user
...
Status:
  Completion Time:  <nil>
  Conditions:
    Message:  Experiment is created
    Reason:   ExperimentCreated
    Status:   True
    Type:     Created
  Current Optimal Trial:
    Observation:
Events:  <none>
What did you expect to happen:
The experiment controller should emit events when it fails to reconcile the trials.
Anything else you would like to add:
Relevant logs from the Katib controller:
Fail to get RunSpec from experiment","Experiment":"user/random-experiment","error":"Unable to find parameter: num-layer in parameter assignment map[lr:0.026271422193467404 num-layers:5 optimizer:sgd
Environment:
- Kubeflow version (kfctl version): v1.3
- Kubernetes version (use kubectl version): v1.18.10
- OS (e.g. from /etc/os-release): CentOS 7.9
If it's okay, I'd like to contribute a fix for this issue.