Fixing memory leak in KDD watchers #526
Fixing memory leak in KDD watchers #526
Conversation
@heschlie I think we need to reopen the watchers for correctness too. Doing the snapshot may have put the watchers out of sync: for example, we may be watching for event 1000, but the snapshot just updated us to event 1100, so the event stream coming out of the watcher will be 1001, 1002, ... If we apply event 1001 on top of snapshot 1100 then we may become inconsistent. I don't think the KDD snapshot/watcher resolution logic is set up to handle that case. I also wonder if we're leaking goroutines; maybe calling Stop() on the watcher isn't enough to shut it down correctly (although its interface clearly states that it should be).
@neiljerram and I chatted; we think my thoughts above might be out of date, since he implemented selective resync.
@heschlie I think the relevant PRs here are #420 and #437. The situation after #420 was that if KDD thought it was in any kind of bother, it resynced (i.e. relisted) all of the resources and restarted all of the watchers. At that point there was no distinction between conditions that needed a resync and conditions that needed a watch restart, and no independent handling for different resources. The #420 change specifically (by @fasaxc) was to add that … Then, in #437 (by me), we realized that …
So I added independent handling for each resource, and restarting watches without relisting. In that change, I left the … Certainly, if we've hit a problem with resource A, we should not need to restart the watch for a different resource B; but clearly that is the effect of the existing … The only worry is: we would then have cases where, for a resource A, we relist that resource but do not restart the watch, which may be invalid as @fasaxc says in his comment above. It feels safer to restart the watch whenever we relist (which is in practice what we have been doing until now, because of the overzealous …). If that is right, some further code changes would be appropriate.
Sorry, an update/clarification of the previous comment; I took too much credit... It was PR #433 (by @caseydavenport) that introduced the resyncing of resources individually. (And so the delta of #437 was only for the conditions where we can re-watch without relisting.) The tests in #433 are quite explicit about expecting that 1 resource resync => 7 new watch calls, so I'm a little worried that there may actually be some reason why that is needed. It would be best to check with Casey if he is available.
@fasaxc @neiljerram Thanks a ton for the detailed feedback! I'll discuss further with @caseydavenport today and see what is/might be needed. I had figured there would be more to this and wanted to get some extra eyes on it.
@neiljerram thanks for the nice analysis.
@neiljerram's suggestion above sounds like the right thing to do to me.
Currently running this commit against a cluster missing the …
Memory footprint seems stable after 3.5 hours (staying below 30 MB).
@fasaxc @neiljerram @caseydavenport I think this is ready for some review. |
Looks great - thanks.
LGTM - Thanks @heschlie!
@heschlie could you squash the commits?
When we create a cluster without our CRDs, or remove one from a running cluster, the syncer starts retrying the watch on that non-existent resource. During this loop we "resync" and then destroy the old watchers. That process kicks off the leak, which stems from somewhere in client-go and may come from fragmenting memory by rapidly creating and destroying the watches and their underlying channels. We now only close watchers that actually needed a resync, which prevents us from restarting watches that didn't need to be stopped.
Force-pushed from f7b987f to 9659769.
Send connection failed before sending in-sync
Description
Fixes projectcalico/calico#1057
When we start a cluster missing our CRDs, or delete some of our CRDs from a running cluster, the syncer goes into a loop attempting to rewatch those resources and failing because they don't exist. At some point during that process a resync is triggered, at the end of which we destroy the watchers and then immediately rewatch the resources. This creates a lot of churn in the client-go lib that seems to fragment memory which then never gets cleaned up, and our footprint grows by ~2MB/minute. Also, felix never seems to release the memory once leaked: adding the CRD back stopped the leaking, but it never freed any memory even after a few hours.
I removed the code which destroys the watchers at the end of the resync, ran a cluster overnight with a missing CRD, and did not see any leaking occur. I am reasonably sure this is OK, but would like some more insight into why we decided to destroy the watchers at the end of the resync. The comment seems to indicate we might leak some resources when we (re)start watching, but each watch first checks whether we're already watching that resource before trying to start.
I have some pprof dumps from felix I can share as well to show master with CRD, master with CRD deleted, and this fix with CRD deleted.
Todos
Release Note