Adding out of the box support to TrainJob (kubeflow#2560)
* Out-of-the-box support for TrainJob
Signed-off-by: Ram Lau <[email protected]>
* Example for PyTorch Distributed
Signed-off-by: Ram Lau <[email protected]>
* Update examples/v1beta1/kubeflow-training-operator/trainjob-pytorch.yaml
Co-authored-by: Andrey Velichkevich <[email protected]>
Signed-off-by: Ram Lau <[email protected]>
* Create folder for Trainer as suggested
Signed-off-by: Ram Lau <[email protected]>
* Move the example of trainjob to the new folder
Signed-off-by: Ram Lau <[email protected]>
* Ref the primaryContainerName to that of ClusterTrainingRuntime
Signed-off-by: Ram Lau <[email protected]>
* tenzen-y steps down from Katib approver role (kubeflow#2561)
Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Ram Lau <[email protected]>
* Set Default value for TrainJob Success, Failure Condition and PrimaryPodLabels in the trial Template
Signed-off-by: Ram Lau <[email protected]>
* Enhance Handling for default value of Success, Fail Cond & Pod Label
Signed-off-by: Ram Lau <[email protected]>
* Bug fix for default value condition
Signed-off-by: Ram Lau <[email protected]>
* code format by hack/update-gofmt.sh
Signed-off-by: Ram Lau <[email protected]>
* add TrainJob trial Resources to cert manager config
Signed-off-by: Ram Lau <[email protected]>
* add trainjob to controller rbac
Signed-off-by: Ram Lau <[email protected]>
* Grant JobSet permission to Katib controller
Signed-off-by: Andrey Velichkevich <[email protected]>
* Remove create/delete RBAC for TrainJob
Signed-off-by: Andrey Velichkevich <[email protected]>
* Fix docker build with libpcre2
Signed-off-by: Andrey Velichkevich <[email protected]>
---------
Signed-off-by: Ram Lau <[email protected]>
Signed-off-by: Yuki Iwai <[email protected]>
Signed-off-by: Andrey Velichkevich <[email protected]>
Co-authored-by: Andrey Velichkevich <[email protected]>
Co-authored-by: Yuki Iwai <[email protected]>