Skip to content

Conversation

mopp
Copy link
Contributor

@mopp mopp commented Aug 15, 2023

Background

The CI fails on the main branch.

What I did

How to test

I confirmed that the CI was passed at the forked repository.
See mopp#2

Comment on lines +39 to +43
include:
- otp: 22.3
elixir: 1.10.4
- otp: 24.0
elixir: 1.12.0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to define pair to avoid the error below.

Error: Requested Elixir / Erlang/OTP version (1.10.4 / 24.0) not found in version list (did you check Compatibility between Elixir and Erlang/OTP?).Elixir and Erlang/OTP compatibility can be found at: https://hexdocs.pm/elixir/compatibility-and-deprecations.html

end)

# Wait for processing handle_cast by udp_error.
_ = TelemetryMetricsStatsd.get_pool_id(reporter)
Copy link
Contributor Author

@mopp mopp Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These randomly failed.
See https://github.com/mopp/telemetry_metrics_statsd/actions/runs/5868819319/job/15912473463?pr=2

TelemetryMetricsStatsd.udp_error/2 uses handle_cast/2 internally.

def udp_error(reporter, udp, reason) do
GenServer.cast(reporter, {:udp_error, udp, reason})
end

TelemetryMetricsStatsd.get_udp does NOT use the process.
It just selects randomly one UDP socket from the ETS table.

def get_udp(pool_id) do
# The table can be empty if the UDP error is reported for all the sockets.
case :ets.lookup(pool_id, :udp) do
[] ->
Logger.error("Failed to publish metrics over UDP: no open sockets available.")
:error
udps ->
{:udp, udp} = Enum.random(udps)
{:ok, udp}
end
end

So, it can return another UDP socket before completing the handle_cast/2.
Then, the log does not appear and the test fails.
Sometimes it returns the same UDP socket and retries the function which is given to eventually.
It passed in the case.

I use TelemetryMetricsStatsd.get_pool_id/1 to wait for the processing because it uses handle_call/2 internally.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch. The reason this started flaking is because we introduced the default pool size of 10 after this test was written.

It should all work if we set the pool_size of the reporter to 1 in this test. Then the UDP socket we get from get_udp/1 will only be different when the reporter reopens a socket, which means it processed the error.

Copy link
Contributor Author

@mopp mopp Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arkgil Thanks for your review!
Got it. I missed the pool_size option.

I reverted the commit and set the pool_size to 1.
It passed. (See mopp#4)

Please check the latest commits 🙏

@mopp
Copy link
Contributor Author

mopp commented Aug 19, 2023

@arkgil Hi 👋 Please review and merge 🙏

Copy link
Collaborator

@arkgil arkgil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much @mopp! Left one comment regarding the flaky test.

end)

# Wait for processing handle_cast by udp_error.
_ = TelemetryMetricsStatsd.get_pool_id(reporter)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch. The reason this started flaking is because we introduced the default pool size of 10 after this test was written.

It should all work if we set the pool_size of the reporter to 1 in this test. Then the UDP socket we get from get_udp/1 will only be different when the reporter reopens a socket, which means it processed the error.

@arkgil arkgil mentioned this pull request Aug 23, 2023
@mopp mopp requested a review from arkgil August 24, 2023 03:15
@arkgil arkgil merged commit 2823f4a into beam-telemetry:main Aug 24, 2023
@mopp mopp deleted the fix_ci2 branch August 24, 2023 10:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants