Conversation

Member

@zuston zuston commented Jan 10, 2024

What changes were proposed in this pull request?

Allow specifying a negative fallback threshold so that events are not discarded.

Why are the changes needed?

Fix: #1428

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests

codecov-commenter commented Jan 10, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (5d027c0) 53.38% compared to head (14bfe11) 54.64%.
Report is 10 commits behind head on master.

Files | Patch % | Lines
.../storage/HadoopStorageManagerFallbackStrategy.java | 0.00% | 0 Missing and 1 partial ⚠️
...r/storage/LocalStorageManagerFallbackStrategy.java | 0.00% | 0 Missing and 1 partial ⚠️
.../storage/RotateStorageManagerFallbackStrategy.java | 50.00% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1429      +/-   ##
============================================
+ Coverage     53.38%   54.64%   +1.26%     
+ Complexity     2729     2240     -489     
============================================
  Files           422      346      -76     
  Lines         24046    15606    -8440     
  Branches       2051     1431     -620     
============================================
- Hits          12836     8528    -4308     
+ Misses        10412     6601    -3811     
+ Partials        798      477     -321     

@zuston zuston requested review from jerqi and leixm January 12, 2024 02:22
Member Author

zuston commented Jan 12, 2024

PTAL @leixm @jerqi. I think this is a critical bug in the 0.8 branch.

Contributor

jerqi commented Jan 12, 2024

I can't get the point of this pr.

Member Author

zuston commented Jan 12, 2024

> I can't get the point of this pr.

Please see the test case

Member Author

zuston commented Jan 15, 2024

cc @xianjingfeng If you have time, could you help review this?

-          .checkValue(
-              ConfigUtils.NON_NEGATIVE_LONG_VALIDATOR, " fallback times must be non-negative")
-          .defaultValue(0L)
+          .defaultValue(-1L)
Member

  1. What if users set it to 0?
  2. If !storageManager.canWrite(flushEvent), we should ignore the retry times. Should we add a new argument named force to org.apache.uniffle.server.storage.AbstractStorageManagerFallbackStrategy#tryFallback so that the retry times can be ignored? WDYT? @zuston @jerqi

https://github.com/apache/incubator-uniffle/blob/318050144b62c0256c151a278bb04ac73c6e544b/server/src/main/java/org/apache/uniffle/server/storage/hybrid/FallbackBasedStorageManagerSelector.java#L49-L51

Member Author

SGTM.

Contributor

Ok for me.
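
The `force` idea agreed on in this thread can be sketched roughly as follows. This is a minimal illustration, not the merged patch: `ForceFallbackSketch`, `shouldFallback`, and its parameters are hypothetical names, and the strict `>` comparison against the threshold is an assumption about how the retry count is checked.

```java
// Hypothetical sketch; the names here are illustrative and are not Uniffle's API.
final class ForceFallbackSketch {

  /**
   * Decides whether a flush event should move to the fallback storage.
   *
   * @param retryTimes   how many times the event has already been retried
   * @param maxFailTimes configured threshold (assumed to be compared with strict >)
   * @param force        true when the current storage cannot accept the write at all
   */
  static boolean shouldFallback(long retryTimes, long maxFailTimes, boolean force) {
    if (force) {
      // The storage cannot be written, so the retry count is ignored.
      return true;
    }
    return retryTimes > maxFailTimes;
  }

  public static void main(String[] args) {
    // Assumed scenario: every local disk is above the high watermark.
    boolean canWrite = false;
    System.out.println(shouldFallback(0, 0, !canWrite)); // true: forced fallback
    System.out.println(shouldFallback(0, 0, false));     // false: threshold not exceeded
  }
}
```

With a flag like this, the caller would pass `force = !canWrite`, so an unwritable storage triggers fallback regardless of the configured threshold.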

  public static final ConfigOption<Long> FALLBACK_MAX_FAIL_TIMES =
      ConfigOptions.key("rss.server.hybrid.storage.fallback.max.fail.times")
          .longType()
          .checkValue(
Member Author

The negative check is unnecessary

Contributor

Why do we remove this check?

Member Author

If it is -1, we can fall back directly as soon as the first attempt fails.

Contributor

What about the behaviour of 0?

Member Author

With 0, the first fallback attempt is rejected.
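
To make the difference between 0 and -1 concrete, here is a small sketch. It assumes that a fresh event starts with a retry count of 0 and that fallback happens only when the retry count is strictly greater than the threshold. Both points are inferred from the comments in this thread rather than taken from the code, and `ThresholdSketch` is a hypothetical name.

```java
// Hypothetical sketch of the threshold semantics discussed above; not Uniffle code.
final class ThresholdSketch {

  /**
   * Counts how many failed attempts an event goes through before the
   * assumed check "retryTimes > threshold" allows a fallback.
   */
  static long failuresBeforeFallback(long threshold) {
    long retryTimes = 0; // assumed initial retry count of a new event
    while (!(retryTimes > threshold)) {
      retryTimes++; // the attempt is rejected and the event is retried
    }
    return retryTimes;
  }

  public static void main(String[] args) {
    System.out.println(failuresBeforeFallback(-1)); // 0: fallback is allowed at once
    System.out.println(failuresBeforeFallback(0));  // 1: the first attempt is rejected
  }
}
```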

@zuston zuston changed the title from "[#1428] fix(server): allow specifying negative fallback threshold to avoid event being discarded" to "[#1428] fix(server): fallback invalid when local storage can't write" on Jan 16, 2024
@zuston zuston requested a review from xianjingfeng January 16, 2024 10:12
Member Author

zuston commented Jan 16, 2024

PTAL again @xianjingfeng

Member

@xianjingfeng xianjingfeng left a comment

LGTM

@zuston zuston merged commit 3ee1688 into apache:master Jan 17, 2024
Member Author

zuston commented Jan 17, 2024

Merged. Thanks @xianjingfeng @jerqi

@zuston zuston mentioned this pull request Jan 25, 2024
3 tasks
Development

Successfully merging this pull request may close these issues.

[Bug] Fallback invalid when local storage's all disks are in high watermark

4 participants