[cmd/supervisor] Fix config merge and opamp server start order #39949
Thank you for digging into this. The issue you point out and your fix both seem valid to me, but I don't understand why this doesn't happen when I run the Supervisor against the example server, or in any of the other E2E tests. What conditions trigger this error?
@evan-bradley I think what triggers this is the presence of a local configuration that is enough for the Supervisor to start the Collector on the first attempt, inside …. This bug doesn't happen when no local configuration is present, because the Supervisor must first receive a remote configuration, and that path correctly composes and writes the effective configuration to the file before starting the Collector process (see lines 1342 and 1343 below):
opentelemetry-collector-contrib/cmd/opampsupervisor/supervisor/supervisor.go Lines 1330 to 1343 in c2c11e6
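The trigger conditions described above can be modeled with a small, hypothetical Go sketch. `simulateStartup`, `startupResult`, and the port numbers are illustrative assumptions, not Supervisor code; the sketch only tracks which OpAMP port ends up in the effective config the first time the Collector starts, under the pre-fix ordering.

```go
package main

import "fmt"

// startupResult is a hypothetical, simplified view of the first Collector
// start; it does not mirror any real Supervisor type.
type startupResult struct {
	configuredPort int // OpAMP port written into the effective config
	listeningPort  int // port the Supervisor's OpAMP server actually uses
}

// simulateStartup models the pre-fix ordering. savedPort stands in for the
// port already stored in the Supervisor; randomPort stands in for the port
// the OpAMP server picks when it is started afterwards.
func simulateStartup(hasLocalConfig bool, savedPort, randomPort int) startupResult {
	// Pre-fix step 1: the initial merged config is written using savedPort.
	configured := savedPort
	// Pre-fix step 2: the OpAMP server starts on a different random port.
	listening := randomPort

	if !hasLocalConfig {
		// Without a usable local config, the Collector is only started after a
		// remote config arrives; that path recomposes and rewrites the
		// effective config, so it picks up the port the server really uses.
		configured = listening
	}
	return startupResult{configuredPort: configured, listeningPort: listening}
}

func main() {
	for _, local := range []bool{true, false} {
		r := simulateStartup(local, 4320, 54321)
		fmt.Printf("local config=%v connects=%v\n", local, r.configuredPort == r.listeningPort)
	}
}
```

Running it prints `local config=true connects=false` followed by `local config=false connects=true`, which matches the explanation that only the local-config path starts the Collector with a stale OpAMP port.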
Hey @evan-bradley, could you give this another look, please? 🙏
Thanks @douglascamata. I was able to reproduce this when running the Supervisor locally by restricting the example server from sending any kind of reply that would cause the Supervisor to restart the Collector. Normally an additional message is sent that masks this bug.
Description
This PR fixes a bug caused by broken logic in the startup process when the Supervisor's agent config does not explicitly specify an OpAMP server. Currently, startup works like this:
opentelemetry-collector-contrib/cmd/opampsupervisor/supervisor/supervisor.go
Lines 432 to 435 in a0044fb
1. The initial merged configuration is written to a file. It uses the OpAMP server port already saved in the Supervisor.
2. The "real" OpAMP server is then started on a different random port. Because the Collector's config still points at the old port, the Collector never connects back via OpAMP, and the Supervisor keeps restarting it.
opentelemetry-collector-contrib/cmd/opampsupervisor/supervisor/supervisor.go
Lines 713 to 720 in 929656d
Testing
Added an E2E test to cover this.