-
Notifications
You must be signed in to change notification settings - Fork 198
Fix assertion in otel shutdown test #11016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
This pull request does not have a backport label. Could you fix it @swiatekm? 🙏
|
💚 Build Succeeded
cc @swiatekm |
|
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
| if waitErr := cmd.Wait(); waitErr != nil { | ||
| assert.ErrorContains(t, waitErr, "signal: interrupt") | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand: we are changing the assertion from "it should not exit with an error code" to "it's ok to receive an error as long as it contains signal: interrupt" ?
Why is that? Is a "signal: interrupt" a graceful shutdown like the log above implies ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether the shutdown is graceful or not is checked by looking at logs. The original test had the incorrect assumption that sending a SIGINT to the process will cause Wait() to return without error, but it can in fact return an exec.ExitError containing the process state and signal. I'm not sure why the test passed to begin with, to be honest - maybe there's some kind of race condition involved.
But in any case, this isn't important to what the test is actually checking, so I'd like to fix it before investigating why it's flaky in the first place. I could also just skip the check, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait() documentation looks pretty clear-cut: assuming that the command is running, we get an error when the exit code of the process is a failing one.
Circling back to the original question: should the process that handles SIGINT gracefully exit with a failure exit code ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see what you mean. Yeah, it should exit with success, so the fact that it doesn't sometimes is evidence that something's not entirely right in there.
What does this PR do?
Fixes an assertion in an integration test.
Why is it important?
This makes the test less flaky. There's been some problems with it, see https://buildkite.com/elastic/elastic-agent/builds/29862#019a4b3e-776e-4c68-a2b6-c67253f68fc2 for example.