Skip to content

Conversation

thesuperzapper
Copy link
Member

When trying to restart MySQL instances, they get stuck because the old container will refuse to give up the lock on the DB:

  • kubectl rollout restart deployment/katib-mysql -n kubeflow
  • kubectl rollout restart deployment/metadata-db -n kubeflow

This PR fixes that by setting the deployment strategy to Recreate.

Checklist:

  • Unit tests have been rebuilt:
    1. cd manifests/tests
    2. make generate-changed-only
    3. make test

@kubeflow-bot
Copy link
Contributor

This change is Reviewable

@thesuperzapper
Copy link
Member Author

/assign @andreyvelich @zhenghuiwang

can you please review?

@andreyvelich
Copy link
Member

andreyvelich commented Sep 17, 2020

@thesuperzapper Thank you for updating this, have you checked that when katib-mysql pod is restarting katib-db-manager connects to the new DB properly ?

@thesuperzapper
Copy link
Member Author

@andreyvelich well as far as I can see it works

@andreyvelich
Copy link
Member

andreyvelich commented Sep 21, 2020

@thesuperzapper Thanks!
Can you open a PR to make changes in Katib manifests also, please?

Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@andreyvelich
Copy link
Member

/cc @gaocegege @johnugeorge

@thesuperzapper
Copy link
Member Author

Gentle ping: @gaocegege @johnugeorge

@thesuperzapper
Copy link
Member Author

Have you had a chance to review this? @andreyvelich @zhenghuiwang

Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thesuperzapper Sorry for the long reply!
/lgtm
/approve

@thesuperzapper
Copy link
Member Author

I still need one of the metadata owners to LGTM this

Can one of @neuromage @prodonjs @zhenghuiwang please take a look?

@PatrickXYS
Copy link
Member

Can you try re-run new test?

/test ?

@aws-kf-ci-bot
Copy link

@PatrickXYS: The following commands are available to trigger jobs:

  • /test kubeflow-manifests-presubmit-e2e

Use /test all to run all jobs.

In response to this:

Can you try re-run new test?

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Copy link
Contributor

@PatrickXYS: No presubmit jobs available for kubeflow/manifests@master

In response to this:

Can you try re-run new test?

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@thesuperzapper
Copy link
Member Author

I am still waiting on one of @neuromage @prodonjs @zhenghuiwang to LGTM this

I want to get this in before Kubeflow 1.2

@PatrickXYS
Copy link
Member

@thesuperzapper I don't think any of them are active in kubeflow community now.

You might need to find review from other one, after that, try re-run test because we have migrated to shared test-infra

@PatrickXYS
Copy link
Member

/test kubeflow-manifests-presubmit-e2e

@k8s-ci-robot
Copy link
Contributor

@PatrickXYS: No presubmit jobs available for kubeflow/manifests@master

In response to this:

/test kubeflow-manifests-presubmit-e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Jeffwan
Copy link
Member

Jeffwan commented Nov 6, 2020

/lgtm

@thesuperzapper
Copy link
Member Author

@andreyvelich @Jeffwan I have rebased

@andreyvelich
Copy link
Member

/test kubeflow-manifests-presubmit-e2e

@k8s-ci-robot
Copy link
Contributor

@andreyvelich: No presubmit jobs available for kubeflow/manifests@master

In response to this:

/test kubeflow-manifests-presubmit-e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@andreyvelich
Copy link
Member

andreyvelich commented Nov 9, 2020

@thesuperzapper I think you need to run make generate-changed-only again to fix the unit tests.

@thesuperzapper
Copy link
Member Author

@andreyvelich when I run make generate-changed-only on a rebased version of my branch, I get an error, is master broken? Or is it something to do with my setup?

@PatrickXYS
Copy link
Member

@thesuperzapper If there's something wrong in unit test, you might need to go figure it out, keeping unit test succeed is the bottom line.

@Jeffwan
Copy link
Member

Jeffwan commented Nov 10, 2020

@thesuperzapper Can you share the errors? What's the errors msg? What kustomize version you have in your environment. Recent change all pass UT and I think master should be good.

@thesuperzapper
Copy link
Member Author

/retest

@thesuperzapper
Copy link
Member Author

@Jeffwan it seems like kustomize 3.8.1 is not working with our repo, we probably need to fix this for users

I have rolled my local version back to 3.6.1 and make generate-changed-only works again

@PatrickXYS
Copy link
Member

Copy from manifest README

This repo contains kustomize packages for deploying Kubeflow applications.

If you are a contributor authoring or editing the packages please see Best Practices. Note, please use kustomize v3.2.1 with manifests in this repo, before #538 is fixed which will allow latest kustomize to be used.

https://github.com/kubeflow/manifests

@thesuperzapper
Copy link
Member Author

@PatrickXYS, yeah, but we should aim to support any widely used versions of kustomize

@PatrickXYS
Copy link
Member

@thesuperzapper Ideally we should do so, let's create a GitHub issue to talk about it.

/lgtm
/approve
/hold

Can you cherry-pick this PR into v1.2-branch as well?

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich, PatrickXYS, thesuperzapper

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Jeffwan
Copy link
Member

Jeffwan commented Nov 10, 2020

this should be good to go.

/hold cancel

@thesuperzapper
Copy link
Member Author

@PatrickXYS, I dont think I have the required permissions to chery-pick this commit

@k8s-ci-robot k8s-ci-robot merged commit 6c4f265 into kubeflow:master Nov 10, 2020
@PatrickXYS
Copy link
Member

@thesuperzapper You won't be able to write directly to v1.2-branch, but send a cherry-pick PR

thesuperzapper added a commit to thesuperzapper/manifests that referenced this pull request Nov 16, 2020
k8s-ci-robot pushed a commit that referenced this pull request Nov 16, 2020
* fix mysql deployment strategy (#1562)

(cherry picked from commit 6c4f265)

* resolve tests for azure
@thesuperzapper thesuperzapper deleted the fix_mysql branch November 16, 2023 01:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants