

@DaveCTurner
Contributor

This suite now has a couple of thousand tests, some of which take a
couple of seconds, so it times out occasionally. Relaxing the timeout
further.

@DaveCTurner DaveCTurner added >test Issues or PRs that are addressing/adding tests :Delivery/Build Build or test infrastructure v8.12.0 labels Oct 10, 2023
@DaveCTurner
Contributor Author

DaveCTurner commented Oct 10, 2023

@elasticsearchmachine elasticsearchmachine added the Team:Delivery Meta label for Delivery team label Oct 10, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-delivery (Team:Delivery)

@mark-vieira
Contributor

This is nuts. Here is a "normal" run where this takes less than 15 minutes: https://gradle-enterprise.elastic.co/s/kyht6stktieh6/tests/overview?class=org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT

@brianseeders and I have been talking about this, and our theory is that these BWC tests are very sensitive to resource contention on the build agent because many (most) of them spin up multi-cluster tests. In this case it's a 4-node cluster, and depending on what else is running on the machine, that can make the test execution time explode.

Our intended solution is to strip these tests out and run them in greater isolation (or not at all in our platform matrix). I'm good with bumping the timeout for now, though, just to help with these failures.

@DaveCTurner
Contributor Author

Yeah, these tests alone run 4 nodes (each apparently believing it has 32 CPUs to play with), and Gradle counts this as a single job and runs lots of other multi-node tests at the same time too:

[image]

It'd be nice to point a Universal Profiler at the test workers at some point to see what's really going on, but I expect that running all of these in parallel is not optimal.

@DaveCTurner DaveCTurner merged commit 63b4ee1 into elastic:main Oct 10, 2023
@DaveCTurner DaveCTurner deleted the 2023/10/10/longer-MixedClusterClientYamlTestSuiteIT branch October 10, 2023 21:12
@mark-vieira
Contributor

> (each apparently believing it has 32 CPUs to play with)

Should we change this behavior by setting node.processors to something more reasonable?
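To put rough numbers on the 32-CPU point: Elasticsearch sizes several of its thread pools from the node's allocated processor count, which `node.processors` overrides. The following is a hypothetical back-of-the-envelope sketch, not part of the Elasticsearch build. It assumes the default pool sizes from the thread pool reference (`search`: `int((processors * 3) / 2) + 1`, `write`: one thread per processor), ignores every other pool, and uses a cap of 2 processors purely as an illustrative value.

```java
// Hypothetical sketch: worker threads in the search + write pools across a
// test cluster, as a function of each node's allocated processor count.
final class ThreadBudget {

    // Assumed default `search` pool size: int((processors * 3) / 2) + 1.
    static int searchPoolSize(int processors) {
        return (processors * 3) / 2 + 1;
    }

    // Assumed default `write` pool size: one thread per allocated processor.
    static int writePoolSize(int processors) {
        return processors;
    }

    // Total search + write threads summed over every node in the cluster.
    static int clusterThreads(int nodes, int processorsPerNode) {
        return nodes * (searchPoolSize(processorsPerNode) + writePoolSize(processorsPerNode));
    }

    public static void main(String[] args) {
        int nodes = 4; // this suite's mixed cluster has 4 nodes
        // Each node detects 32 CPUs, as observed in the thread above.
        System.out.println("32 processors/node: " + clusterThreads(nodes, 32) + " threads");
        // Illustrative cap, as if node.processors were set to 2 (hypothetical value).
        System.out.println("node.processors=2: " + clusterThreads(nodes, 2) + " threads");
    }
}
```

Running it prints 324 threads for the uncapped case and 24 for the capped one. That counts only two pools for this one suite, before any of the other multi-node suites Gradle runs alongside it.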

@breskeby
Contributor

breskeby commented Dec 1, 2023

💚 All backports created successfully

Branch 8.11: backport created


breskeby pushed a commit to breskeby/elasticsearch that referenced this pull request Dec 1, 2023
This suite now has a couple of thousand tests, some of which take a
couple of seconds, so it times out occasionally. Relaxing the timeout
further.

(cherry picked from commit 63b4ee1)
elasticsearchmachine pushed a commit that referenced this pull request Dec 1, 2023

This suite now has a couple of thousand tests, some of which take a
couple of seconds, so it times out occasionally. Relaxing the timeout
further.

(cherry picked from commit 63b4ee1)

Co-authored-by: David Turner
@DaveCTurner DaveCTurner restored the 2023/10/10/longer-MixedClusterClientYamlTestSuiteIT branch June 17, 2024 06:17

Labels

:Delivery/Build Build or test infrastructure Team:Delivery Meta label for Delivery team >test Issues or PRs that are addressing/adding tests v8.11.2 v8.12.0

4 participants