🐛 Wait for ports creation in ports e2e test #938


Merged

Conversation

macaptain
Contributor

@macaptain macaptain commented Jul 14, 2021

The ports e2e test was checking that a failure condition had not occurred. However, it's possible the failure condition hadn't occurred simply because the machine used for the test had not been created yet, so the rest of the tests raced ahead and did not find the port.

This commit ensures that we wait for the port to be created before testing its properties.

What this PR does / why we need it:
We need to wait for machine deployment to finish before listing ports in the deployment.
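For illustration, here is a minimal sketch of that waiting pattern, using Gomega's Eventually together with gophercloud's Neutron ports API. This is not the code from this PR: the networkClient, portName, and timeout values are assumptions for the example, and the actual e2e test uses the project's own helpers.

```go
// A minimal sketch of the "wait for the port" idea, NOT this PR's actual code.
// It assumes it runs inside a Ginkgo/Gomega test suite with a registered fail
// handler; networkClient, portName, and the timeouts are illustrative only.
package e2e

import (
	"time"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/networking/v2/ports"
	. "github.com/onsi/gomega"
)

// waitForPort polls Neutron until a port with the given name exists, instead
// of asserting on the port list immediately after the machine is applied.
func waitForPort(networkClient *gophercloud.ServiceClient, portName string) ports.Port {
	var found ports.Port
	Eventually(func() bool {
		allPages, err := ports.List(networkClient, ports.ListOpts{Name: portName}).AllPages()
		if err != nil {
			return false
		}
		portList, err := ports.ExtractPorts(allPages)
		if err != nil || len(portList) == 0 {
			return false
		}
		found = portList[0]
		return true
	}, 10*time.Minute, 15*time.Second).Should(BeTrue(), "port %q was not created in time", portName)
	// Only once the port exists do we go on to test its properties.
	return found
}
```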

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Maybe fixes #927

  • squashed commits

/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 14, 2021
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 14, 2021
@k8s-ci-robot
Contributor

Hi @macaptain. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 14, 2021
@jichenjc
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 14, 2021
@macaptain macaptain force-pushed the fix-wait-for-ports-test-machine branch 3 times, most recently from 6be9120 to b0df6bb on July 14, 2021 12:22
@macaptain macaptain changed the title from "🐛 Wait for machine creation in ports e2e test" to "🐛 Wait for ports creation in ports e2e test" on Jul 14, 2021
@macaptain macaptain force-pushed the fix-wait-for-ports-test-machine branch 2 times, most recently from b4257e3 to 94620ca on July 14, 2021 13:42
@macaptain
Contributor Author

/retest

@macaptain
Contributor Author

/retest

@macaptain
Contributor Author

I hope this now improves the flakiness, because we wait for the port to exist before proceeding with the tests.

If it's possible to run the e2e tests once or twice more, could we do that to check the problem is gone? It's looking hopeful that the flakiness is fixed, though, so feel free to merge if this PR is up to standard.

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 14, 2021
@hidekazuna
Contributor

I want to see whether this PR really fixes it.

/test pull-cluster-api-provider-openstack-e2e-test

@hidekazuna
Contributor

Success twice.
Once more.

/test pull-cluster-api-provider-openstack-e2e-test

@jichenjc
Contributor

Actually, as stated in #927, there are 2 tests in a flaky state now.

Anyway, it's probably worth a few more tries (let's say 5 or 10?). With my previous unsuccessful fix I tried several times and they all passed, but it still didn't fix this.

@jichenjc
Contributor

/test pull-cluster-api-provider-openstack-e2e-test

one more

@macaptain
Contributor Author

> Actually, as stated in #927, there are 2 tests in a flaky state now.

There are probably different flaky failures as well. I don't expect this change to fix the flakiness in a different test. However, the ports test failure seems to be the most common one based on Testgrid.

> /test pull-cluster-api-provider-openstack-e2e-test
>
> one more

It looks like this latest test run got stuck very early on: https://prow.k8s.io/log?container=test&id=1415522658715439104&job=pull-cluster-api-provider-openstack-e2e-test - is this a known issue?

@jichenjc
Contributor

> It looks like this latest test run got stuck very early on: https://prow.k8s.io/log?container=test&id=1415522658715439104&job=pull-cluster-api-provider-openstack-e2e-test - is this a known issue?

Not that I'm aware of. At least it doesn't report the flaky test we are targeting.
Let's give it another try; if it works, let's merge the PR and move on.

/test pull-cluster-api-provider-openstack-e2e-test

@macaptain
Contributor Author

> It looks like this latest test run got stuck very early on: https://prow.k8s.io/log?container=test&id=1415522658715439104&job=pull-cluster-api-provider-openstack-e2e-test - is this a known issue?

> Not that I'm aware of. At least it doesn't report the flaky test we are targeting.
> Let's give it another try; if it works, let's merge the PR and move on.

I've raised #940 since it seems the issue is persistent and caused by a new version of sshuttle released a few hours ago.

@macaptain macaptain force-pushed the fix-wait-for-ports-test-machine branch from 94620ca to 3e9c05c on July 15, 2021 11:06
@macaptain
Contributor Author

Rebased after the sshuttle fix; let's hope the tests go green again, showing that it's not so flaky anymore.

@hidekazuna
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 15, 2021
@jichenjc
Contributor

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jichenjc, macaptain

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 16, 2021
@k8s-ci-robot k8s-ci-robot merged commit 5dece8f into kubernetes-sigs:master Jul 16, 2021
@macaptain macaptain deleted the fix-wait-for-ports-test-machine branch July 16, 2021 05:34

Successfully merging this pull request may close these issues.

gate: high failure ratio because of Workload cluster (without lb) creation