-
Notifications
You must be signed in to change notification settings - Fork 10.2k
cache: preserve resumable guarantee with empty history watchers #20491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi @apullo777. Thanks for your PR. I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test |
cache/cache.go
Outdated
if oldestHistoryRev != 0 { | ||
tooOld = startRev < oldestHistoryRev | ||
} else { | ||
storeFloor := c.store.LatestRev() + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think watch should care about storeFloor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. storeFloor
is unrelated here.
If we want to decide compaction when history is empty, maybe I should move the compaction check into demux.Register
and return (compacted, compactRev)
so the decision and watcher registration can happen under the same lock:
func (d *demux) Register(w *watcher, startingRev int64) (compacted bool, compactRev int64) {
d.mu.Lock()
defer d.mu.Unlock()
oldestRev := d.history.PeekOldest()
if startingRev != 0 && oldestRev != 0 && startingRev < oldestRev {
return true, oldestRev - 1
}
latestRev := d.history.PeekLatest()
if latestRev == 0 {
// watcher registration
...
and in Cache.Watch
we use that result to return canceled response:
op := clientv3.OpWatch(key, opts...)
startRev := op.Rev()
pred, err := c.validateWatch(key, op)
...
w := newWatcher(c.cfg.PerWatcherBufferSize, pred)
compacted, compactRev := c.demux.Register(w, startRev)
if compacted {
ch := make(chan clientv3.WatchResponse, 1)
ch <- clientv3.WatchResponse{
Canceled: true,
CompactRevision: compactRev,
}
close(ch)
return ch
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove all this logic, and just register watch and depend on resync to kick it out. Reading demux state from under lock, can become very stale before the watch is started. Is a premature optimization that adds number of problematic edgecases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the early check in Cache.Watch
makes sense. For now resyncLaggingWatchers
just silently stops watcher on compaction, but I think it should also return cancel response. However, that will require some work to give watcher
a way to carry cancel/termination response so that the goroutine in Cache.Watch
can forward the response before exiting. Maybe I should just revert this part and do the removal in a follow-up PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should also return cancel response
+1
Maybe I should just revert this part and do the removal in a follow-up PR?
Sounds good
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted filessee 32 files with indirect coverage changes @@ Coverage Diff @@
## main #20491 +/- ##
==========================================
- Coverage 69.24% 68.99% -0.25%
==========================================
Files 418 415 -3
Lines 34699 34564 -135
==========================================
- Hits 24028 23849 -179
- Misses 9275 9299 +24
- Partials 1396 1416 +20 Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
Signed-off-by: Peter Chang <[email protected]>
936d549
to
662e2fc
Compare
/retest |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: apullo777, serathius The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.
I open this PR to address the issue #20488
The root issue is that a cache can be ready but still have no history. So we need to ensure proper watcher and gap handling when history is empty. That includes:
startRev
should be registered as lagging.resyncLaggingWatchers
will not promote watcher to active while history is empty.Broadcast
could defensively detect a gap (firstRev
>nextRev
) and move such watchers back to lagging instead of streaming past the gap./cc @serathius @MadhavJivrajani