Skip to content

Conversation

@ejona86
Copy link
Member

@ejona86 ejona86 commented Oct 28, 2025

Since there is no longer a single global XdsClient, it makes more sense to allow things like the c2p name resolver to inject its own bootstrap even if there is one defined in an environment variable. GoogleCloudToProdNameResolver can now pass an XdsClient instance to XdsNameResolver, and SharedXdsClientPoolProvider allows GoogleCloudToProdNameResolver to choose the bootstrap for that one specific target.

Since XdsNameResolver is no longer in control of the XdsClient pool the XdsClient instance is now passed to ClusterImplLb. A channel will now only access the global XdsClient pool exactly once: in the name resolver.

BootstrapInfo is purposefully being shared across channels, as we really want to share things like credentials which can have significant memory use and may have caches which reduce I/O when shared. That is why SharedXdsClientPoolProvider receives BootstrapInfo instead of Map<String,?>.

Verifying BootstrapInfo.server() is not empty was moved from SharedXdsClientPoolProvider to GrpcBootstrapperImpl so avoid getOrCreate() throwing an exception in only that one case. It might make sense to move that to BootstrapperImpl, but that will need more investigation.

A lot of tests needed updating because XdsClientPoolProvider is no longer responsible for parsing the bootstrap, so we now need bootstraps even if XdsClientPoolProvider will ignore it.

This also fixes a bug in GoogleCloudToProdNameResolver where it would initialize the delegate even when it failed to create the bootstrap. That would certainly cause all RPCs on the channel to fail because of the missing bootstrap and it defeated the point of succeeded == false and refresh() which was supposed to retry contacting the metadata server.

The server tests were enhanced to give a useful error when server.start() throws an exception, as otherwise the real error is lost.

b/442819521

CC @shivaspeaks

@ejona86 ejona86 requested a review from kannanjgithub October 28, 2025 20:30
@ejona86 ejona86 added the TODO:release blocker Issue/PR is important enough to delay the release. Removed after release issues resolved label Oct 28, 2025
@ejona86 ejona86 force-pushed the xds-inject-bootstrap branch from 1a23135 to b49d940 Compare October 28, 2025 20:40
Since there is no longer a single global XdsClient, it makes more sense
to allow things like the c2p name resolver to inject its own bootstrap
even if there is one defined in an environment variable.
GoogleCloudToProdNameResolver can now pass an XdsClient instance to
XdsNameResolver, and SharedXdsClientPoolProvider allows
GoogleCloudToProdNameResolver to choose the bootstrap for that one
specific target.

Since XdsNameResolver is no longer in control of the XdsClient pool the
XdsClient instance is now passed to ClusterImplLb. A channel will now
only access the global XdsClient pool exactly once: in the name
resolver.

BootstrapInfo is purposefully being shared across channels, as we really
want to share things like credentials which can have significant memory
use and may have caches which reduce I/O when shared. That is why
SharedXdsClientPoolProvider receives BootstrapInfo instead of
Map<String,?>.

Verifying BootstrapInfo.server() is not empty was moved from
SharedXdsClientPoolProvider to GrpcBootstrapperImpl so avoid
getOrCreate() throwing an exception in only that one case. It might make
sense to move that to BootstrapperImpl, but that will need more
investigation.

A lot of tests needed updating because XdsClientPoolProvider is no
longer responsible for parsing the bootstrap, so we now need bootstraps
even if XdsClientPoolProvider will ignore it.

This also fixes a bug in GoogleCloudToProdNameResolver where it would
initialize the delegate even when it failed to create the bootstrap.
That would certainly cause all RPCs on the channel to fail because of
the missing bootstrap and it defeated the point of `succeeded == false`
and `refresh()` which was supposed to retry contacting the metadata
server.

The server tests were enhanced to give a useful error when
server.start() throws an exception, as otherwise the real error is lost.
@ejona86 ejona86 force-pushed the xds-inject-bootstrap branch from b49d940 to 38f92fe Compare October 28, 2025 22:48
@ejona86 ejona86 merged commit 27d1508 into grpc:master Oct 29, 2025
15 of 17 checks passed
@ejona86 ejona86 deleted the xds-inject-bootstrap branch October 29, 2025 16:44
@ejona86 ejona86 removed the TODO:release blocker Issue/PR is important enough to delay the release. Removed after release issues resolved label Oct 29, 2025
AgraVator pushed a commit to AgraVator/grpc-java that referenced this pull request Nov 3, 2025
…#12450)

Since there is no longer a single global XdsClient, it makes more sense
to allow things like the c2p name resolver to inject its own bootstrap
even if there is one defined in an environment variable.
GoogleCloudToProdNameResolver can now pass an XdsClient instance to
XdsNameResolver, and SharedXdsClientPoolProvider allows
GoogleCloudToProdNameResolver to choose the bootstrap for that one
specific target.

Since XdsNameResolver is no longer in control of the XdsClient pool the
XdsClient instance is now passed to ClusterImplLb. A channel will now
only access the global XdsClient pool exactly once: in the name
resolver.

BootstrapInfo is purposefully being shared across channels, as we really
want to share things like credentials which can have significant memory
use and may have caches which reduce I/O when shared. That is why
SharedXdsClientPoolProvider receives BootstrapInfo instead of
Map<String,?>.

Verifying BootstrapInfo.server() is not empty was moved from
SharedXdsClientPoolProvider to GrpcBootstrapperImpl so avoid
getOrCreate() throwing an exception in only that one case. It might make
sense to move that to BootstrapperImpl, but that will need more
investigation.

A lot of tests needed updating because XdsClientPoolProvider is no
longer responsible for parsing the bootstrap, so we now need bootstraps
even if XdsClientPoolProvider will ignore it.

This also fixes a bug in GoogleCloudToProdNameResolver where it would
initialize the delegate even when it failed to create the bootstrap.
That would certainly cause all RPCs on the channel to fail because of
the missing bootstrap and it defeated the point of `succeeded == false`
and `refresh()` which was supposed to retry contacting the metadata
server.

The server tests were enhanced to give a useful error when
server.start() throws an exception, as otherwise the real error is lost.

b/442819521
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants