
Conversation

jagan2221
Contributor

@jagan2221 jagan2221 commented Jun 20, 2025

  1. Collector pods were crashing in IPv6 environments due to an invalid IP address format being used when exposing endpoints such as the health check, liveness, and OTLP HTTP/gRPC server endpoints.
    These endpoints are constructed from the pod IP environment variable, which needs to be enclosed in square brackets for IPv6. Added a generic flag to configure this for all endpoints.

  2. The regex used to extract the pod IP and port in the pod-annotations scrape config does not support IPv6 parsing.
    Introduced a method that constructs the scrape address from the pod IP and the Prometheus port.

Both of the above changes are behind a feature flag: sumologic.ipv6mode (see the sketch below).
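
For illustration only, a rough sketch of what the flag is meant to change. The exact values keys, template helpers, environment variable name, and ports used by the chart may differ; MY_POD_IP and the relabel rule below are assumptions, not the chart's actual implementation.

# values.yaml: opt in to the IPv6-safe behavior
sumologic:
  ipv6mode: true

# 1) Endpoint construction: the pod IP is wrapped in square brackets so an IPv6
#    address parses as host:port, e.g. [fd00::1]:4317 instead of fd00::1:4317.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "[${env:MY_POD_IP}]:4317"
      http:
        endpoint: "[${env:MY_POD_IP}]:4318"
extensions:
  health_check:
    endpoint: "[${env:MY_POD_IP}]:13133"

# 2) Pod-annotations scrape config: build __address__ from the pod IP label and the
#    prometheus.io/port annotation instead of regex-splitting the address on ":",
#    which breaks for IPv6.
relabel_configs:
  - source_labels: [__meta_kubernetes_pod_ip, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: (.+);(\d+)
    replacement: "[$1]:$2"
    target_label: __address__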

Detailed summary of the changes: https://docs.google.com/document/d/1aBfne-cN6k9p_Lw3zZ2ZqY8Ho0AHYTzQB-nIpYx--GA
These are the issues a customer faced when setting up the Helm chart in an IPv6 cluster. The fixes are currently deployed in the customer's setup via manual overrides using config merge.
Jira: https://sumologic.atlassian.net/browse/OSC-1043

Checklist

  • Changelog updated or skip changelog label added
  • Documentation updated
  • Template tests added for new features
  • Integration tests added or modified for major features

ipv6 compatibility fixes and UT's for the same
@jagan2221 jagan2221 requested a review from a team as a code owner June 20, 2025 14:30
@rnishtala-sumo
Contributor

We need a changelog for this. The approach makes sense to me. Let's ensure that the integration tests pass. It might also make sense to write an integration test for metrics collection.

@jagan2221
Contributor Author

jagan2221 commented Jun 23, 2025

https://github.com/SumoLogic/sumologic-kubernetes-collection/actions/runs/15817837079/job/44580109359?pr=3949
Something is wrong with the Helm_Routing_OT test: it was successful in the run above for the same PR, but it is failing in other runs.

I've checked other PRs too; this test fails or passes randomly.

@jagan2221 jagan2221 requested a review from rnishtala-sumo June 23, 2025 14:49
Contributor

@rnishtala-sumo rnishtala-sumo left a comment

Requesting ITs for this change because of its large footprint.

Contributor

@rnishtala-sumo rnishtala-sumo left a comment

Disabling the setup job to test IPv6 is not preferred; the setup job does more than manage secrets. Instead of implementing this workaround, we should consider adding manual test instructions to our Vagrant docs if using DNS64 or NAT64 is complex.

@jagan2221 jagan2221 requested a review from rnishtala-sumo July 3, 2025 16:06
Contributor

@rnishtala-sumo rnishtala-sumo left a comment

Requesting that the integration test be removed for now, until we have a solution for the setup job. It can live on a branch. I recommend that our Vagrant (developer) docs be updated with instructions on testing this.

@jagan2221
Contributor Author

Even the Vagrant test steps will be based on disabling the setup job and then testing the Helm chart further. I hope this is fine for the local test steps, @rnishtala-sumo.

@jagan2221
Contributor Author

jagan2221 commented Aug 14, 2025

@rnishtala-sumo @echlebek We deployed these changes this week in a second customer setup (ChargeCloud), in their IPv6-only cluster, and the deployment is working fine.

For the two customers we have seen so far, the issue we were asked to solve is at the Sumo pod level, not NAT64/DNS64 network-level issues.

QE has started testing as well. We need this merged so QE can run the E2E tests.

@rnishtala-sumo
Contributor

Recommend considering documentation for at least one cluster type before merging this. A walkthrough of the manual steps needed before the Helm chart is deployed would be ideal; it could be reviewed by QE as well. An example can be seen in the doc for EKS Fargate. This could be in a different PR.

@jagan2221
Contributor Author

https://docs.google.com/document/d/1ifCHtPsrz9ntTYyigGV0RkOb_9OzVp8j_DmmfqHJ2yM
@rnishtala-sumo I'm creating a draft doc for this - can you review it? I will get it reviewed by QE as well.

@rnishtala-sumo
Contributor

rnishtala-sumo commented Aug 20, 2025

@jagan2221 what I meant was a public-facing doc on GitHub like this one - https://github.com/SumoLogic/sumologic-kubernetes-collection/blob/main/docs/fargate.md (we could call it ipv6.md) - that walks the customer through the following steps as prerequisites before installing the Helm chart. It could say something like the following, and QE could use the same method to test this for consistency.

In AWS, the way to provide IPv6→IPv4 egress is to set up a NAT64 + DNS64 environment. The recommended approach is AWS NAT64 (via a NAT gateway plus a VPC route for the NAT64 prefix) with the DNS64 resolver. This allows your IPv6-only pods to reach IPv4 destinations transparently.

To set up a NAT Gateway:
aws ec2 create-nat-gateway \
  --subnet-id <public-subnet-id> \
  --allocation-id <eip-allocation-id> \
  --connectivity-type public

Note that, on its own, a NAT gateway only translates IPv4 → IPv4.

To enable DNS64 on the subnets that use the Amazon-provided DNS resolver, run:
aws ec2 modify-subnet-attribute \
  --subnet-id <subnet-id> \
  --enable-dns64

For the subnets where your IPv6-only pods run, add a route for the well-known NAT64 prefix (64:ff9b::/96) pointing to the NAT Gateway:
aws ec2 create-route \
  --route-table-id <rtb-id> \
  --destination-ipv6-cidr-block 64:ff9b::/96 \
  --nat-gateway-id <nat-gateway-id>

Then run a test from a pod:

apiVersion: v1
kind: Pod
metadata:
  name: nat64-test          # example name
spec:
  containers:
    - name: busybox
      image: busybox
      command: ["sh", "-c", "ping -c 4 ipv4-only-endpoint.com"]   # send a few pings, then exit

We could then add a release note saying IPv6 is supported on EKS clusters.

@jagan2221
Contributor Author

jagan2221 commented Aug 20, 2025

@rnishtala-sumo The draft doc I shared is exactly that - it would be added as something like ipv6.md.

Also, there are already comprehensive AWS EKS docs on how to configure IPv6 clusters and explaining the default IPv6→IPv4 egress in EKS clusters:

https://docs.aws.amazon.com/eks/latest/userguide/cni-ipv6.html#_ip_address_assignments
https://docs.aws.amazon.com/eks/latest/userguide/deploy-ipv6-cluster.html

Should we point customers to the existing, comprehensive AWS docs, or should we re-capture the AWS-side details ourselves? I think duplicating the AWS-side content on our end is likely to drift, and we wouldn't be able to keep up with their changes.
My idea is to describe just the workflow and direct customers to the detailed vendor-specific docs instead of maintaining those steps on our end. WDYT?

@rnishtala-sumo
Contributor

rnishtala-sumo commented Aug 20, 2025

In situations where we're asking a customer to go through manual steps before installing the helm chart, it helps to be specific. We're asking them to do the following

  • Create a standard AWS NAT Gateway and enable DNS64 translation at the VPC level
  • Add routing for 64:ff9b::/96 to the NAT Gateway.
  • Let EKS pods use the default VPC DNS resolver

Asking them to run specific AWS CLI commands and a test pod ensures that the prerequisites are satisfied before the Helm chart is deployed. QE can use the same steps in the E2E tests.

@jagan2221
Contributor Author

jagan2221 commented Aug 22, 2025

@rnishtala-sumo
Makes sense.
I created an EKS IPv6 cluster for testing. We don't need a NAT gateway - the VPC CNI plugin does the job, and nothing additional was required:

  1. Make sure the VPC and subnets have both IPv4 and IPv6 CIDRs, and that the subnets have the "auto-assign IPv6 address" setting enabled for nodes.
  2. Install the VPC CNI plugin, which has IPv6→IPv4 NAT capability built in.

The Helm chart installs and runs fine with the above changes. I will capture these steps and share the doc (a rough sketch follows below).
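
For reference, one way to stand up such a cluster is with eksctl, which can provision a dual-stack VPC and subnets with these settings. This is only a rough sketch under that assumption; the cluster name, region, Kubernetes version, and node group values below are illustrative, not part of this PR.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: ipv6-test          # illustrative
  region: us-east-1        # illustrative
  version: "1.30"

# IPv6 addressing for pods and services
kubernetesNetworkConfig:
  ipFamily: IPv6

# IPv6 clusters need these managed addons and OIDC declared up front; the VPC CNI
# provides the built-in egress IPv6 -> IPv4 translation mentioned above.
addons:
  - name: vpc-cni
  - name: coredns
  - name: kube-proxy

iam:
  withOIDC: true

managedNodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 2

The chart would then be installed with sumologic.ipv6mode enabled, as described above.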

@jagan2221
Contributor Author

@rnishtala-sumo
#3977
Please review the doc PR.

@jagan2221
Contributor Author

@rnishtala-sumo The doc PR is throwing markdown lint errors. I'm checking that and will merge once it's resolved. Can you please provide a final approval for this PR?

@jagan2221 jagan2221 force-pushed the j_ipv6_compatibility_fixes branch from 3c3a530 to f575ef7 on September 3, 2025, 18:38
Contributor

@rnishtala-sumo rnishtala-sumo left a comment

LGTM! Let's ensure the new E2E tests run for this feature.

@jagan2221 jagan2221 merged commit 9b4db0f into main Sep 3, 2025
133 of 136 checks passed
@jagan2221 jagan2221 deleted the j_ipv6_compatibility_fixes branch September 3, 2025 20:08