@swiatekm
Contributor

@swiatekm swiatekm commented Nov 4, 2025

What does this PR do?

Fixes an assertion in an integration test.

Why is it important?

This makes the test less flaky. There have been some problems with it; see https://buildkite.com/elastic/elastic-agent/builds/29862#019a4b3e-776e-4c68-a2b6-c67253f68fc2 for an example.

@mergify
Contributor

mergify bot commented Nov 4, 2025

This pull request does not have a backport label. Could you fix it @swiatekm? 🙏
To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-\d.\d is the label that automatically backports to the 8.\d branch. \d is the digit.
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@swiatekm swiatekm added skip-changelog flaky-test Unstable or unreliable test cases. backport-8.19 Automated backport to the 8.19 branch labels Nov 4, 2025
@elasticmachine
Collaborator

💚 Build Succeeded

cc @swiatekm

@swiatekm swiatekm added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Nov 4, 2025
@swiatekm swiatekm marked this pull request as ready for review November 4, 2025 13:41
@swiatekm swiatekm requested a review from a team as a code owner November 4, 2025 13:41
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@swiatekm
Contributor Author

swiatekm commented Nov 4, 2025

In the interest of going faster, I'm going to manually backport this change to #10996 and #10997.

Comment on lines +143 to +145
if waitErr := cmd.Wait(); waitErr != nil {
	assert.ErrorContains(t, waitErr, "signal: interrupt")
}
Member

I am not sure I understand: are we changing the assertion from "it should not exit with an error code" to "it's ok to receive an error as long as it contains signal: interrupt"?

Why is that? Is "signal: interrupt" a graceful shutdown, as the log above implies?

Contributor Author

Whether the shutdown is graceful or not is checked by looking at the logs. The original test incorrectly assumed that sending SIGINT to the process would make Wait() return without an error, but Wait() can in fact return an exec.ExitError containing the process state and the signal. To be honest, I'm not sure why the test passed to begin with; maybe there's some kind of race condition involved.

But in any case, this isn't important to what the test is actually checking, so I'd like to fix it before investigating why the test is flaky in the first place. I could also just skip the check.

Member

@pchila pchila Nov 4, 2025

The Wait() documentation looks pretty clear-cut: assuming the command was started successfully, we get an error when the process exits with a failing exit code.
Circling back to the original question: should a process that handles SIGINT gracefully exit with a failure exit code?

Contributor Author

Ah, I see what you mean. Yeah, it should exit with success, so the fact that it sometimes doesn't is evidence that something's not entirely right in there.

@swiatekm swiatekm marked this pull request as draft November 4, 2025 18:04
