Skip to content

test: deflake async-hooks/test-improper-order on AIX #58567

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 6 additions & 2 deletions test/async-hooks/test-improper-order.js
Original file line number Diff line number Diff line change
Expand Up @@ -51,8 +51,12 @@
child.stderr.on('data', (d) => { errData = Buffer.concat([ errData, d ]); });
child.stdout.on('data', (d) => { outData = Buffer.concat([ outData, d ]); });

child.on('close', common.mustCall((code) => {
assert.strictEqual(code, 1);
child.on('close', common.mustCall((code, signal) => {
if (signal) {

Check failure on line 55 in test/async-hooks/test-improper-order.js

View workflow job for this annotation

GitHub Actions / lint-js-and-md

Expected indentation of 4 spaces but found 5
console.log(`Child closed with signal: ${signal}`);
Copy link
Member

@joyeecheung joyeecheung Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know why the signal can be truthy? And if so, what signal is it? Normally this should not happen unless the child process get killed, but this test does not actively kill the child process - it expects it to exit normally with exit code 1 - so it seems to be something outside the test killing the process, and it's AIX only. If the workaround applies universally, technically one can break this test on all platforms by introducing a crash in the async hooks and making the child process abort (SIGSEGV) and this test would not be able to catch the change of behavior any more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joyeecheung You're absolutely right about preserving test integrity - we shouldn't mask real crashes.

@gulbaki Building on joyeecheung's feedback, maybe we can make this more targeted? Instead of skipping all signals, we could handle only expected system signals (SIGKILL/SIGTERM) while still catching crashes (SIGSEGV/SIGABRT).

Could you check what specific signal you're seeing on AIX? If it's just system cleanup during test teardown, we can handle that case specifically. But if it's something unexpected, we'd want the test to still catch it.

This way we fix the flakiness without losing the test's ability to detect real async-hooks issues. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’re right on AIX I’ve actually seen the signal come back as SIGABRT when close fires before exit. If we instead do:

child.once('exit', common.mustCall((code, signal) => {
  assert.strictEqual(code, 1);
  assert.strictEqual(signal, null);
}));

We avoid the race (since exit always has the real code/signal).
A real crash (e.g. SIGSEGV/SIGABRT) will still make signal !== null and fail the test.
Does this look like a safe way to fix the AIX flake while preserving the test’s crash detection?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, can you send it as a commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If signal is SIGABRT then I think this would still fail at assert.strictEqual(signal, null), I am not sure where the race is coming from because this is synchronous JS code, if signal is SIGABRT by the time of the first assertion, it can't be set by something else to null before the second assertion. If it's SIGABRT on AIX specifically it may make more sense to just do

if (common.isAIX && signal == 'SIGABRT') {
  // XXX: The child process could be aborted on AIX due to unknown reasons. Work around it.
} else {
  assert. strictEqual(code, 1);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @nodejs/platform-aix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was some parallel discussion in #58478 (comment), so I have opened #58601

} else {
assert.strictEqual(code, 1);
}
assert.match(outData.toString(), heartbeatMsg,
'did not crash until we reached offending line of code ' +
`(found ${outData})`);
Expand Down
Loading