Flink: Fix hash code comparison for requesting global statistics in DataStatisticsCoordinator #13827

Merged
merged 2 commits into from
Aug 16, 2025

Contributor

@Guosmilesmile Guosmilesmile commented Aug 15, 2025

In DataStatisticsCoordinator, when handling the RequestGlobalStatisticsEvent, the coordinator should skip responding to the subtask if the event's signature matches the hashCode of the current globalStatistics.

The current implementation is incorrect: it tests the signature with != where == is needed, so the check is inverted. This PR fixes that behavior.
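A minimal sketch of the intended check, assuming the signature is a nullable `Integer` and that a null signature means the subtask holds no statistics yet (that reading is an assumption, not stated in this PR). The class and method names are hypothetical and are not part of the Iceberg code base.

```java
/** Hypothetical stand-in for the coordinator's skip decision; not the Iceberg source. */
final class SignatureCheckSketch {
  private SignatureCheckSketch() {}

  /**
   * Returns true when the response can be skipped because the request carries the
   * same hash code as the coordinator's current global statistics.
   */
  static boolean shouldSkipResponse(Integer requestSignature, Object globalStatistics) {
    // Assumption: a null signature means the subtask has nothing cached, so it needs a response.
    // Integer == int unboxes the left side, so this is a numeric comparison.
    return requestSignature != null && requestSignature == globalStatistics.hashCode();
  }

  public static void main(String[] args) {
    String stats = "stats-v1"; // any object with a stable hashCode works as a stand-in
    System.out.println(shouldSkipResponse(stats.hashCode(), stats)); // true: signatures match
    System.out.println(shouldSkipResponse(stats.hashCode() + 1, stats)); // false: stale signature
    System.out.println(shouldSkipResponse(null, stats)); // false: no signature supplied
  }
}
```

With the original `!=` comparison the first two results would be swapped: a subtask that already held the latest statistics would get a redundant response, and a subtask with stale statistics would get none.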

@github-actions github-actions bot added the flink label Aug 15, 2025
@Guosmilesmile
Contributor Author

Hi @stevenzwu @pvary, could you please help review this PR and verify whether it is appropriate? Thank you very much!

@@ -277,7 +277,7 @@ private void handleRequestGlobalStatisticsEvent(int subtask, RequestGlobalStatis
     if (globalStatistics != null) {
       runInCoordinatorThread(
           () -> {
-            if (event.signature() != null && event.signature() != globalStatistics.hashCode()) {
+            if (event.signature() != null && event.signature() == globalStatistics.hashCode()) {
Contributor

@stevenzwu stevenzwu Aug 15, 2025

thx for catching the bug. I think your fix is correct.

Should we also fix the log message as follows?

Skip responding to statistics request from subtask {}, as the operator task already holds the same global statistics

Contributor Author

Yes, I have changed it now.

Contributor

@stevenzwu stevenzwu left a comment

just some minor comments

dataStatisticsCoordinator.handleEventFromOperator(
0, 0, new RequestGlobalStatisticsEvent(correctSignature));

Thread.sleep(200);
Contributor

@stevenzwu stevenzwu Aug 15, 2025

I know we are waiting for 200 ms to confirm no response is sent in this case. We can probably replace the sleep with waitForCoordinatorToProcessActions. Then we can immediately assert the sent events count hasn't changed.
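One way to picture the suggestion, as a sketch under stated assumptions: if the coordinator runs its actions on a single thread in submission order, a test can enqueue a no-op marker and block until it completes, which implies every earlier action has run. The class and method names below are hypothetical; the sketch does not reproduce the real `waitForCoordinatorToProcessActions` helper.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of draining a single-threaded executor instead of sleeping. */
final class DrainExecutorSketch {
  private DrainExecutorSketch() {}

  /** Blocks until every task submitted before this call has finished. */
  static void waitForQueuedActions(ExecutorService singleThreadExecutor) {
    try {
      // A single-thread executor runs tasks in FIFO order, so once this no-op marker
      // completes, all previously queued actions have completed as well.
      singleThreadExecutor.submit(() -> {}).get(10, TimeUnit.SECONDS);
    } catch (Exception e) {
      throw new IllegalStateException("Executor failed to process queued actions", e);
    }
  }

  /** Queues two sends and one skipped request, then returns the sent-event count. */
  static int runScenario() {
    ExecutorService coordinatorThread = Executors.newSingleThreadExecutor();
    List<String> sentEvents = new CopyOnWriteArrayList<>();
    try {
      coordinatorThread.execute(() -> sentEvents.add("statistics-1"));
      coordinatorThread.execute(() -> sentEvents.add("statistics-2"));
      // Stand-in for a request whose signature matches: the action sends nothing.
      coordinatorThread.execute(() -> {});
      waitForQueuedActions(coordinatorThread);
      // No sleep needed: the count is final for everything queued so far.
      return sentEvents.size();
    } finally {
      coordinatorThread.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(runScenario());
  }
}
```

Compared with a fixed 200 ms sleep, the marker approach does not depend on timing and returns as soon as the queue is drained.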

Contributor Author

Thanks for pointing it out, I have changed it.

.atMost(Duration.ofSeconds(10))
.until(() -> receivingTasks.getSentEventsForSubtask(0).size() == 2);

// signature is right
Contributor

nit: maybe the comment can be changed to the following?

Simulate the scenario where a subtask sends a global statistics request with the same hash code. The coordinator would skip the response after comparing the hash code contained in the request with the hash code of the latest global statistics.

Contributor Author

OK

@stevenzwu stevenzwu changed the title from "Flink: Fix Signature Judge in DataStatisticsCoordinator" to "Flink: Fix hash code comparison for requesting global statistics in DataStatisticsCoordinator" Aug 15, 2025
@stevenzwu stevenzwu merged commit 3685b55 into apache:main Aug 16, 2025
18 checks passed
@stevenzwu
Contributor

thanks @Guosmilesmile for catching and fixing the bug

Guosmilesmile added a commit to Guosmilesmile/iceberg that referenced this pull request Aug 16, 2025
pvary pushed a commit that referenced this pull request Aug 16, 2025
@Guosmilesmile Guosmilesmile deleted the shuffle_signature branch August 18, 2025 01:53