Description
/kind bug
Hi, I am trying to set up a standalone deployment of Katib v1beta1 on GKE. It is quite possible that I am doing something wrong or missing something obvious, which is why I'm coming to you for help.
TL;DR: I have tried deploying Katib both via the deploy.sh script and via Terraform. No matter how I deploy, everything seems fine: the katib-controller, katib-db-manager, katib-mysql and katib-ui pods are all up and running, and their logs look clean. Then, when I try to submit one of the sample experiments, I get this timeout error:
$ kubectl apply -f examples/v1beta1/grid-example.yaml
Error from server (InternalError): error when creating "examples/v1beta1/grid-example.yaml":
Internal error occurred: failed calling webhook "mutating.experiment.katib.kubeflow.org":
Post https://katib-controller.kubeflow.svc:443/mutate-experiments?timeout=30s: context deadline exceeded
What I've tried:
I tried following all the debugging steps in #1160 (closest to this issue AFAIK) and didn't really get anywhere.
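A quick way to check whether anything answers on the webhook address at all is a plain TCP probe. This is a hypothetical helper (can_connect is not part of Katib or kubectl). It only attempts a TCP connection, with no TLS handshake and no AdmissionReview, and it assumes you run it from a pod inside the cluster. On GKE the API server runs outside the cluster's node network, so a successful probe from a pod does not prove that the control plane can reach the webhook. A failed probe, however, points at the Service, its endpoints, or the controller itself.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    Only the TCP handshake is tested; TLS and the admission request are not.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, DNS resolution errors and timeouts
        # (socket.timeout and socket.gaierror are both OSError subclasses).
        return False
```

From a pod in the cluster, can_connect("katib-controller.kubeflow.svc", 443) probes the same host and port as the failing webhook call.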
The webhook itself seems to be set up (admittedly, I can't say whether it's correct or not):
$ kubectl describe MutatingWebhookConfiguration katib-mutating-webhook-config
Name: katib-mutating-webhook-config
Namespace:
Labels: <none>
Annotations: <none>
API Version: admissionregistration.k8s.io/v1beta1
Kind: MutatingWebhookConfiguration
Metadata:
  Creation Timestamp: 2020-07-09T00:24:58Z
  Generation: 1
  Resource Version: 17496
  Self Link: /apis/admissionregistration.k8s.io/v1beta1/mutatingwebhookconfigurations/katib-mutating-webhook-config
  UID: 9eaca06d-c17a-11ea-ac7b-42010a000066
Webhooks:
  Admission Review Versions:
    v1beta1
  Client Config:
    Ca Bundle: <hidden>
    Service:
      Name: katib-controller
      Namespace: kubeflow
      Path: /mutate-experiments
  Failure Policy: Fail
  Name: mutating.experiment.katib.kubeflow.org
  Namespace Selector:
    Match Expressions:
      Key: control-plane
      Operator: DoesNotExist
  Rules:
    API Groups:
      kubeflow.org
    API Versions:
      v1beta1
    Operations:
      CREATE
      UPDATE
    Resources:
      experiments
    Scope: *
  Side Effects: Unknown
  Timeout Seconds: 30
  Admission Review Versions:
    v1beta1
  Client Config:
    Ca Bundle: <hidden>
    Service:
      Name: katib-controller
      Namespace: kubeflow
      Path: /mutate-pods
  Failure Policy: Ignore
  Name: mutating.pod.katib.kubeflow.org
  Namespace Selector:
    Match Labels:
      Katib - Metricscollector - Injection: enabled
  Rules:
    API Groups:
    API Versions:
      v1
    Operations:
      CREATE
    Resources:
      pods
    Scope: *
  Side Effects: Unknown
  Timeout Seconds: 30
Events: <none>
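To make the configuration above easier to reason about, here is a small Python model of how the API server selects and calls these two webhooks. It is a sketch under stated assumptions, not Katib or Kubernetes code. The WEBHOOKS data is hand-copied from the describe output and keeps only the fields used here. The pod webhook's label key katib-metricscollector-injection is inferred from the "Katib - Metricscollector - Injection" line. The Rules block, which decides which resources each webhook intercepts, is ignored. The model shows that the experiment webhook's address resolves to exactly the URL in the error message. It also shows that this webhook covers every namespace without a control-plane label and uses Failure Policy Fail, so a timeout rejects the create outright.

```python
from typing import Dict, List

# Hand-copied subset of the two webhooks in the describe output above.
# Keys use the admissionregistration.k8s.io field names (camelCase).
WEBHOOKS = [
    {
        "name": "mutating.experiment.katib.kubeflow.org",
        "service": {"name": "katib-controller", "namespace": "kubeflow",
                    "path": "/mutate-experiments"},
        "failurePolicy": "Fail",
        "timeoutSeconds": 30,
        "namespaceSelector": {"matchExpressions": [
            {"key": "control-plane", "operator": "DoesNotExist"}]},
    },
    {
        "name": "mutating.pod.katib.kubeflow.org",
        "service": {"name": "katib-controller", "namespace": "kubeflow",
                    "path": "/mutate-pods"},
        "failurePolicy": "Ignore",
        "timeoutSeconds": 30,
        # ASSUMPTION: label key inferred from "Katib - Metricscollector - Injection".
        "namespaceSelector": {"matchLabels": {
            "katib-metricscollector-injection": "enabled"}},
    },
]


def service_url(webhook: dict) -> str:
    """Build the URL the API server POSTs the AdmissionReview to."""
    svc = webhook["service"]
    port = svc.get("port", 443)  # ServiceReference.port defaults to 443
    return (f"https://{svc['name']}.{svc['namespace']}.svc:{port}"
            f"{svc.get('path', '')}?timeout={webhook['timeoutSeconds']}s")


def selector_matches(selector: dict, labels: Dict[str, str]) -> bool:
    """Evaluate a LabelSelector: every matchLabels entry and expression must hold."""
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions", []):
        key, op = expr["key"], expr["operator"]
        values = expr.get("values", [])
        if op == "In" and labels.get(key) not in values:
            return False
        if op == "NotIn" and key in labels and labels[key] in values:
            return False
        if op == "Exists" and key not in labels:
            return False
        if op == "DoesNotExist" and key in labels:
            return False
    return True


def webhooks_for_namespace(namespace_labels: Dict[str, str]) -> List[str]:
    """Names of webhooks whose namespaceSelector matches (Rules are ignored)."""
    return [w["name"] for w in WEBHOOKS
            if selector_matches(w["namespaceSelector"], namespace_labels)]


# Usage: an unlabeled namespace is covered by the experiment webhook only.
print(service_url(WEBHOOKS[0]))
# prints https://katib-controller.kubeflow.svc:443/mutate-experiments?timeout=30s
print(webhooks_for_namespace({}))
# prints ['mutating.experiment.katib.kubeflow.org']
```

By contrast, the pod webhook uses Failure Policy Ignore, so if it timed out the pod would still be admitted. Only the experiment webhook can block a request the way the error above shows.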
I've tried multiple Kubernetes versions, from 1.14.10-gke.45 to 1.16.9-gke.6.
Some things I have not tried:
- installing via the GCP Kubeflow script (the whole point is to get a standalone Katib deployment)
- installing v1alpha3 (not against it, but ideally we want the newest version)
Any and all help is appreciated! 😄