Skip to content

Conversation

@PapaCharlie
Copy link

It may be more convenient to look at these changes sequentially:

* Fix the runtime of TestChildWatch and TestSetWatchers

TestChildWatch was introduced to test what happens when the response is too large for the default
allocated buffer size. It used to create 10k nodes which could take a very significant amount of
time. The same effect can be achieved with far fewer nodes, significantly speeding up the test's
runtime. The test now runs in 5s on my machine instead of sometimes multiple minutes...

TestSetWatchers tested something similar, except that it checks that the outgoing setWatchers packet
is broken up into multiple packets when it's too large. Using a similar trick we can generate
names of specific lengths to test that the behavior is correct. It was also flaky because if your
local ZK deployment is a little slow, deleting all the nodes can take longer than the session
timeout, spuriously failing the test. This has also been fixed, and the test now runs in a little
over 5 seconds as well, instead of failing.

Finally, standardize the ZK server version checking to be a bit more flexible and friendlier towards
future versions of ZooKeeper (note: the original implementation doesn't even work because the env
variable name is incorrect... It `ZK_VERSION`, not `zk_version`)

* Support persistent watches

Implements the new persistent watch types introduced in 3.6, along with some utilities that are
critical when implementing local caches.

* Add metrics receiver and fix persistent watch EventQueue

* Add error to RequestCompleted
…tchers (#2)

Like the comment on Event.Zxid mentions, watch events now include the zxid as of
ZK 3.9.0. Additionally, relaying pings to persistent watchers allows them to
keep track of whether they are behind on consuming updates from ZK, particularly
since the zxid is relayed.
This field can be used to track the end-to-end time in between receiving the
event and actually processing it. Since events end up in a queue, it's hard to
tell how far behind the consumer on the EventQueue is, and this timestamp can be
used to track that.
This data structure has many other applications, especially when dealing with
ZK.
* Upgrade HostProvider

This change fixes the behavior of DNSHostProvider where it does not refresh its cached IP addresses that it resolves once on startup for the configured ZK servers. This new behavior more closely matches the Java client's behavior by randomly selecting an address after resolving the host. It slightly changes the semantics of `HostProvider` with an off-by-one, otherwise the `connect` loop could end up in a situation where it attempts to connect to a stale address. This is fixed by moving the backoff to _before_ getting the address, rather than _after_.

* Bump linter version to support generics

* Fix linter and integration test actions

* Add docs
* Improve logging and error handling

This overhauls the logging and error responses from the *Conn. Namely, it
changes the logging backend to `log/slog` which makes it easier to manage the
library's logging centrally (however the old functionally to provide a custom
logger is retained). It also captures the reason why a specific connection to a
ZK server was closed in the error that is propagated to all callers. This can be
checked with `errors.Is`.

This also adds an exponential backoff feature to the client, which will make it
back off when reconnecting to a ZK server.

* Remove logger option, it messes with the logging

* Change default behavior of DNSHostProvider

* Split implementation of HostProvider, fix up tests
This test was broken by the new error format returned by the connection
When adding a port to an IPv6 address, it must be wrapped with [...].
@codecov
Copy link

codecov bot commented Mar 6, 2024

Codecov Report

❌ Patch coverage is 69.06475% with 129 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.00%. Comparing base (c5e730e) to head (d59ad1b).
⚠️ Report is 25 commits behind head on master.

Files with missing lines Patch % Lines
conn.go 64.94% 78 Missing and 17 partials ⚠️
staticdnshostprovider.go 70.00% 8 Missing and 4 partials ⚠️
structs.go 75.00% 8 Missing and 2 partials ⚠️
unlimited_channel.go 82.60% 4 Missing and 4 partials ⚠️
constants.go 0.00% 3 Missing ⚠️
dnshostprovider.go 91.66% 1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (69.06%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #122      +/-   ##
==========================================
- Coverage   75.94%   75.00%   -0.95%     
==========================================
  Files           7       10       +3     
  Lines        1193     1452     +259     
==========================================
+ Hits          906     1089     +183     
- Misses        196      252      +56     
- Partials       91      111      +20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

for serverN := 0; serverN < size; serverN++ {
srvPath := filepath.Join(tmpPath, fmt.Sprintf("srv%d", serverN+1))
requireNoError(t, os.Mkdir(srvPath, 0700), "failed to make server path")
requireNoErrorf(t, os.Mkdir(srvPath, 0700), "failed to make server path")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pluck these changes into a separate diff since they do not change any substance to the code. Just trying to remove lines to review as change vs not

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack makes sense! As i do that, any opposition to using testify to clean this up further ?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have maintained using only the stdlib as it does have nice properties of not conflicting or dealing with any dependencies. I'm not against it longer term, but for this round of changes lets stick to stdlib.

I want to keep the scope of change tighter as we get to the meat of the changes

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chipping away and doing small PRs like errors.Is will also not change behavior and good to clean up this larger PR

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK makes sense, it's not necessary, just a nicety. I can move some of the errors.Is stuff but I did change the error signatures of some of these methods, and it does actually break if you don't do errors.Is anymore

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants