Closed
/kind bug
What steps did you take and what happened:
Created a Katib experiment using grid search and two int parameters. The Experiment manifest looks like this:
apiVersion: kubeflow.org/v1alpha3
kind: Experiment
metadata:
  labels:
    controller-tools.k8s.io: '1.0'
  name: katib-simple-trial
spec:
  algorithm:
    algorithmName: grid
  parallelTrialCount: 1
  maxFailedTrialCount: 6
  maxTrialCount: 12
  objective:
    additionalMetricNames:
    goal: 100
    objectiveMetricName: result
    type: maximize
  parameters:
    - feasibleSpace:
        max: '50'
        min: '1'
        step: '10'
      name: a
      parameterType: int
    - feasibleSpace:
        max: '1'
        min: '50'
        step: '9'
      name: b
      parameterType: int
  trialTemplate:
    goTemplate:
      rawTemplate: |
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: {{.Trial}}
          namespace: {{.NameSpace}}
        spec:
          template:
            metadata:
              annotations:
                sidecar.istio.io/inject: "false"
            spec:
              restartPolicy: Never
              containers:
                - name: {{.Trial}}
                  image: <myimage>
                  command:
                    - python3 -u -c "<some_command>"
Note that parameter b has min=50 and max=1, which is an invalid range.
What did you expect to happen:
I expected the submission of the Experiment to fail. Instead, the suggestion pod is created and continuously logs the following error:
ERROR:grpc._server:Exception calling application: Low must be lower than high
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/grpc/_server.py", line 434, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/chocolate_service.py", line 39, in GetSuggestions
    search_space, trials, request.request_number)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/suggestion/v1alpha3/chocolate/base_chocolate_service.py", line 33, in getSuggestions
    int(param.min), int(param.max), 1)
  File "/usr/local/lib/python3.6/site-packages/chocolate/space.py", line 140, in __init__
    assert low < high, "Low must be lower than high"
AssertionError: Low must be lower than high
As a result, the Katib experiment runs indefinitely: it never fails and never produces any trials. The controller logs contain nothing useful and no events are generated. Only the suggestion pod reports these errors.
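For illustration, here is a minimal sketch of the kind of check I would expect to run before the experiment is accepted. It is written in Python only because the suggestion service is Python. `validate_feasible_space` is a hypothetical helper, not an existing Katib function, and the plain dicts below are an assumed stand-in for the parsed spec. Where such a check should actually live (for example, in the Experiment validation) is for the maintainers to decide.

```python
from typing import Dict, List


def validate_feasible_space(parameters: List[Dict]) -> List[str]:
    """Return a list of problems found; an empty list means the ranges look valid.

    Hypothetical helper, not part of Katib. It inspects only numeric
    parameters (int and double) and only their min/max/step fields.
    """
    errors = []
    for param in parameters:
        name = param.get("name", "<unnamed>")
        if param.get("parameterType") not in ("int", "double"):
            continue  # other parameter types are not range-checked here
        space = param.get("feasibleSpace", {})
        try:
            low = float(space["min"])
            high = float(space["max"])
        except (KeyError, ValueError):
            errors.append(f"parameter {name}: min/max missing or not numeric")
            continue
        if low >= high:
            errors.append(
                f"parameter {name}: min ({space['min']}) "
                f"must be lower than max ({space['max']})"
            )
        step = space.get("step")
        if step is not None:
            try:
                if float(step) <= 0:
                    errors.append(f"parameter {name}: step must be positive")
            except ValueError:
                errors.append(f"parameter {name}: step is not numeric")
    return errors


# The two parameters from the Experiment above, as plain dicts.
parameters = [
    {"name": "a", "parameterType": "int",
     "feasibleSpace": {"min": "1", "max": "50", "step": "10"}},
    {"name": "b", "parameterType": "int",
     "feasibleSpace": {"min": "50", "max": "1", "step": "9"}},
]

problems = validate_feasible_space(parameters)
for problem in problems:
    print(problem)
```

With the spec from this issue, only parameter b is reported. Rejecting the Experiment with a message like that at submission time would make the mistake visible immediately, instead of leaving it buried in the suggestion pod logs.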
Environment:
- Kubeflow version: 1.0
- Minikube version: 1.2.0 (MiniKF latest)
- Kubernetes version: 1.14