feat(config): ipv6 compatibility for otel collector pods #3949
Conversation
IPv6 compatibility fixes and unit tests for the same
deploy/helm/sumologic/conf/metrics/collector/otelcol/config.yaml
We need a changelog for this. The approach makes sense to me. Let's ensure that the integration tests pass. It might also make sense to write an integration test for metrics collection.
https://github.com/SumoLogic/sumologic-kubernetes-collection/actions/runs/15817837079/job/44580109359?pr=3949 I've checked in other PRs too; this test is failing/passing randomly.
Requesting ITs for this change because of its large footprint.
Disabling the setup job to test IPv6 is not preferred; the setup job does more than manage secrets. Instead of implementing this workaround, we should consider adding manual test instructions to our vagrant docs if using dns64 or nat64 is complex.
Requesting that the IT be removed for now, until we have a solution for the setup job. This can live on a branch. Recommend that our vagrant (developer) docs be updated with instructions on testing this.
Even the vagrant test steps will be based on disabling the setup job and then testing the helm chart further. Hope this would be fine for the local test steps @rnishtala-sumo
@rnishtala-sumo @echlebek We deployed these changes this week in a second customer setup (ChargeCloud), in their IPv6-only cluster, and the deployment is fine. From the two customers we have seen, the issue the customer asked us to solve is Sumo pod-level breakage, not nat64/dns64 network-level issues. QE has started testing as well; we would need this merged for QE to test the E2Es.
Recommend considering documentation for at least one cluster type before merging this. A walkthrough of the manual steps before the helm chart is deployed is ideal, and could be reviewed by QE as well. An example can be seen in the doc for EKS Fargate. This could be in a different PR.
https://docs.google.com/document/d/1ifCHtPsrz9ntTYyigGV0RkOb_9OzVp8j_DmmfqHJ2yM
@jagan2221 what I meant was a public-facing doc on github like this one - https://github.com/SumoLogic/sumologic-kubernetes-collection/blob/main/docs/fargate.md (could call it ipv6.md) that runs the customer through the prerequisite steps before installing the helm chart. QE could use the same method to test this for consistency.
We could then add a release note saying IPv6 is supported on EKS clusters.
@rnishtala-sumo The draft doc I shared is exactly that; it would be added as something like ipv6.md. Also: should we point customers to the existing, comprehensive AWS docs (https://docs.aws.amazon.com/eks/latest/userguide/cni-ipv6.html#_ip_address_assignments), or re-capture those steps on our side? I think re-capturing AWS content on our side is a moving target that we can't keep in sync with their changes.
In situations where we're asking a customer to go through manual steps before installing the helm chart, it helps to be specific about exactly what we're asking them to do.
Asking them to run specific AWS CLI commands and a test using a pod ensures that the prerequisites are satisfied before the helm chart is deployed. QE can test using the same steps in the E2E.
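For example, prerequisite checks along these lines (illustrative only; the cluster and pod names are placeholders, and the authoritative steps belong in the proposed ipv6.md doc):

```sh
# Confirm the EKS cluster was created with the IPv6 IP family
aws eks describe-cluster --name my-cluster \
  --query "cluster.kubernetesNetworkConfig.ipFamily" --output text  # expect: ipv6

# Smoke test: a pod should come up with an IPv6 address assigned
kubectl run ipv6-check --image=busybox --restart=Never --command -- ip -6 addr show eth0
kubectl logs ipv6-check
```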
@rnishtala-sumo
Helm chart is installing properly and running fine with the above changes. Will capture the steps and share the doc.
@rnishtala-sumo Doc PR is throwing markdown lint errors. Checking that; will merge once resolved. Can you please provide a final approval for this PR?
3c3a530 to f575ef7
LGTM! Let's ensure the new E2E tests run for this feature.
Collector pods were crashing in an IPv6 environment due to an invalid IP address format being used for exposed endpoints such as the health check, liveness, and OTLP http/grpc server endpoints.
These endpoints are constructed from the podIP env variable, which needs to be enclosed in square brackets for IPv6. Added a generic flag to configure this for all endpoints.
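As a minimal sketch of what the bracketed form looks like in an OTel collector config (the env var name MY_POD_IP is an assumption, not taken verbatim from the chart; ports are the OTel defaults):

```yaml
extensions:
  health_check:
    endpoint: "[${env:MY_POD_IP}]:13133"   # e.g. [fd00::a1b2]:13133

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "[${env:MY_POD_IP}]:4317"
      http:
        endpoint: "[${env:MY_POD_IP}]:4318"
```

Without the brackets, `fd00::a1b2:4317` is ambiguous: the host:port split cannot tell the port apart from the colon-separated address segments.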
The regex used for extracting the pod IP and port in the pod-annotations scrape config does not support IPv6 parsing.
Introduced a method to construct the scrape address from the pod IP and the Prometheus port.
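A hand-written sketch of the IPv6-aware idea (label and annotation names follow the standard Kubernetes pod-annotation scrape config; the chart's actual relabeling may differ):

```yaml
relabel_configs:
  # The classic IPv4-only pattern, e.g. regex: ([^:]+)(?::\d+)?;(\d+),
  # breaks on IPv6 because the address itself contains colons.
  - source_labels:
      - __meta_kubernetes_pod_ip
      - __meta_kubernetes_pod_annotation_prometheus_io_port
    action: replace
    regex: (.+);(\d+)          # capture the full pod IP, IPv4 or IPv6
    replacement: "[$1]:$2"     # bracket the IP so host:port splits unambiguously
    target_label: __address__
```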
Both of the above changes are behind a feature flag: sumologic.ipv6mode
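Enabling it would look something like this in user-supplied values (only the flag name comes from this PR; the surrounding structure and the off-by-default behavior are assumptions):

```yaml
# user-values.yaml
sumologic:
  ipv6mode: true
```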
Detailed summary of the changes: https://docs.google.com/document/d/1aBfne-cN6k9p_Lw3zZ2ZqY8Ho0AHYTzQB-nIpYx--GA
These are the issues a customer faced when setting up the helm chart in an IPv6 cluster. The fixes were deployed in the customer's setup via manual overrides using config merge.
Jira: https://sumologic.atlassian.net/browse/OSC-1043