-
Notifications
You must be signed in to change notification settings - Fork 47
Fix CI and flaky tests #82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
include: | ||
- otp: 22.3 | ||
elixir: 1.10.4 | ||
- otp: 24.0 | ||
elixir: 1.12.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to define pair to avoid the error below.
Error: Requested Elixir / Erlang/OTP version (1.10.4 / 24.0) not found in version list (did you check Compatibility between Elixir and Erlang/OTP?).Elixir and Erlang/OTP compatibility can be found at: https://hexdocs.pm/elixir/compatibility-and-deprecations.html
end) | ||
|
||
# Wait for processing handle_cast by udp_error. | ||
_ = TelemetryMetricsStatsd.get_pool_id(reporter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These randomly failed.
See https://github.com/mopp/telemetry_metrics_statsd/actions/runs/5868819319/job/15912473463?pr=2
TelemetryMetricsStatsd.udp_error/2
uses handle_cast/2
internally.
telemetry_metrics_statsd/lib/telemetry_metrics_statsd.ex
Lines 413 to 415 in f9229e1
def udp_error(reporter, udp, reason) do | |
GenServer.cast(reporter, {:udp_error, udp, reason}) | |
end |
TelemetryMetricsStatsd.get_udp
does NOT use the process.
It just selects randomly one UDP socket from the ETS table.
telemetry_metrics_statsd/lib/telemetry_metrics_statsd.ex
Lines 392 to 403 in f9229e1
def get_udp(pool_id) do | |
# The table can be empty if the UDP error is reported for all the sockets. | |
case :ets.lookup(pool_id, :udp) do | |
[] -> | |
Logger.error("Failed to publish metrics over UDP: no open sockets available.") | |
:error | |
udps -> | |
{:udp, udp} = Enum.random(udps) | |
{:ok, udp} | |
end | |
end |
So, it can return another UDP socket before completing the handle_cast/2
.
Then, the log does not appear and the test fails.
Sometimes it returns the same UDP socket and retries the function which is given to eventually
.
It passed in the case.
I use TelemetryMetricsStatsd.get_pool_id/1
to wait for the processing because it uses handle_call/2
internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch. The reason this started flaking is because we introduced the default pool size of 10 after this test was written.
It should all work if we set the pool_size
of the reporter to 1 in this test. Then the UDP socket we get from get_udp/1
will only be different when the reporter reopens a socket, which means it processed the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arkgil Hi 👋 Please review and merge 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much @mopp! Left one comment regarding the flaky test.
end) | ||
|
||
# Wait for processing handle_cast by udp_error. | ||
_ = TelemetryMetricsStatsd.get_pool_id(reporter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch. The reason this started flaking is because we introduced the default pool size of 10 after this test was written.
It should all work if we set the pool_size
of the reporter to 1 in this test. Then the UDP socket we get from get_udp/1
will only be different when the reporter reopens a socket, which means it processed the error.
This reverts commit 89ab955.
Background
The CI fails on the main branch.
What I did
How to test
I confirmed that the CI was passed at the forked repository.
See mopp#2