Skip to content

Conversation

dimpavloff
Copy link
Contributor

@dimpavloff dimpavloff commented Aug 22, 2025

Part two for grpc/proposal#492 (A97), following #8431 .

What this PR does is:

  • update internal/xds/bootstrap with support for loading multiple PerRPCCallCredentials specifed in a new call_creds field in the boostrap file as per A97
  • adjust xds/internal/xdsclient/clientimpl.goto use the call credentials when constructing the client
  • update xds/bootstrap to register the jwtcreds call credentials and make them available if GRPC_EXPERIMENTAL_XDS_BOOTSTRAP_CALL_CREDS is enabled

I have added DialOptionsWithCallCredsForTransport because, even though current and future call credentials are likely to all expect secure transport, I thought it would be safer to check of insecure transport just in case. If you prefer, I can just update DialOptions to use all call credentials regardless of the transport.

Relates to istio/istio#53532

RELEASE NOTES:

  • xds bootstrap: add support for loading a JWT from file and use it as Call Credentials (A97). This is guarded by GRPC_EXPERIMENTAL_XDS_BOOTSTRAP_CALL_CREDS

@easwars easwars added Type: Feature New features or improvements in behavior Area: xDS Includes everything xDS related, including LB policies used with xDS. labels Sep 26, 2025
@dimpavloff dimpavloff requested a review from easwars September 26, 2025 17:46
@dimpavloff
Copy link
Contributor Author

@easwars I think this is ready for another review :)

@dimpavloff dimpavloff removed their assignment Sep 26, 2025
@easwars easwars self-assigned this Sep 26, 2025
@easwars
Copy link
Contributor

easwars commented Oct 1, 2025

Apologies for the delay in the review. I'm a bit swamped with other things. I will get to this asap.

Copy link
Contributor

@easwars easwars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to make a complete pass. But some comments to get you going. I will try to make another pass asap.

// https://github.com/grpc/proposal/blob/master/A86-xds-http-connect.md
XDSHTTPConnectEnabled = boolFromEnv("GRPC_EXPERIMENTAL_XDS_HTTP_CONNECT", false)

// XDSBootstrapCallCredsEnabled controls if JWT call credentials can be used
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This environment variable guards the newly added bootstrap field call_creds. I agree that the only call credentials that we currently support via this mechanism is the JWT call creds, but that could change any day (before we get rid of this env var). So, I would prefer we more explicitly call out what this env var guards.


// NewCallCredentials returns a credentials.PerRPCCredentials The input config
// should match the structure specified in gRFC A97 structure.
// See gRFC A97: https://github.com/grpc/proposal/blob/master/A97-xds-jwt-call-creds.md
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits here.

  • There needs to be a period after the first sentence.
  • I think the next two lines can be shortened. (this is an internal API, so we can omit the link to the actual gRFC)
  • There is no mention about the second return value, func()

So, maybe something like

// NewCallCredentials returns a new JWT token based call credentials. The input
// config must match the structure specified in gRFC A97. The caller is expected 
// to invoke the second return value when they are done using the returned call creds.

"google.golang.org/grpc/credentials"
)

func TestNewCallCredentials(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the Tester defined in the grpctest package here:

type Tester struct{}

It has some setup and teardown magic that we find useful for all of our tests.


if !tt.wantSuccess {
if err == nil {
t.Fatal("Expected error, got nil")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please identify the function being tested in the test error message. See: https://google.github.io/styleguide/go/decisions#identify-the-function

// CallCreds contains the call credentials configuration for individual RPCs.
// It implements gRFC A97 call credentials structure.
type CallCreds struct {
// Type contains a unique name identifying the call credentials type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm .. The gRFC does not state any uniqueness requirements for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, you're right, it doesn't. I was influenced by the ChannelCreds.Type but there it makes more sense given that only one gets picked up. I've updated the comment

}

// CallCreds returns the call credentials configuration for this server.
func (sc *ServerConfig) CallCreds() []CallCreds {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call this CallCredsConfig and the next method CallCreds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to keep symmetry with ChannelCreds and SelectedChannelCreds. However, SelectedCallCreds now returns []credentials.PerRPCCredentials rather than the config, still, which breaks this symmetry.

Realistically, I'm not sure even if the public methods are needed -- the PerRPCCredentials are added to the ServerConfig.extraDialOptions and used via ServerConfig.DialOptions()

WDYT, shall I remove the public methods? I've renamed them in the meantime. I've used CallCredsConfigs, plural, because I also renamed the Go type to CallCredsConfig and the method returns a collection of those. I've also renamed the relevant fields in ServerConfig (and ServerConfigTestingOptions) in similar way.

if envconfig.XDSBootstrapCallCredsEnabled {
// Process call credentials - unlike channel creds, we use ALL supported
// types. Also, call credentials are optional as per gRFC A97.
for _, callCredConfig := range server.CallCreds {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can we use a shorter name for the loop variable. Maybe cfg?

// Skip unsupported call credential types (don't fail bootstrap).
continue
}
callCred, cancel, err := c.Build(callCredConfig.Config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We refer to them as credentials and not credential even when there is a single one. This was a major source of confusion to me when I joined the team as well :)

So, s/callCred/callCreds/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, yes, this did confuse me a bit when starting. Thanks for spotting this

}
}

func TestServerConfigCallCredsIntegration(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to have the s receiver on it.

@easwars easwars assigned dimpavloff and unassigned easwars Oct 1, 2025
Copy link
Contributor Author

@dimpavloff dimpavloff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @easwars ; I think it's ready for another pass

// CallCreds contains the call credentials configuration for individual RPCs.
// It implements gRFC A97 call credentials structure.
type CallCreds struct {
// Type contains a unique name identifying the call credentials type.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, you're right, it doesn't. I was influenced by the ChannelCreds.Type but there it makes more sense given that only one gets picked up. I've updated the comment

}

// CallCreds returns the call credentials configuration for this server.
func (sc *ServerConfig) CallCreds() []CallCreds {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to keep symmetry with ChannelCreds and SelectedChannelCreds. However, SelectedCallCreds now returns []credentials.PerRPCCredentials rather than the config, still, which breaks this symmetry.

Realistically, I'm not sure even if the public methods are needed -- the PerRPCCredentials are added to the ServerConfig.extraDialOptions and used via ServerConfig.DialOptions()

WDYT, shall I remove the public methods? I've renamed them in the meantime. I've used CallCredsConfigs, plural, because I also renamed the Go type to CallCredsConfig and the method returns a collection of those. I've also renamed the relevant fields in ServerConfig (and ServerConfigTestingOptions) in similar way.

// Skip unsupported call credential types (don't fail bootstrap).
continue
}
callCred, cancel, err := c.Build(callCredConfig.Config)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, yes, this did confuse me a bit when starting. Thanks for spotting this

Comment on lines 75 to 80
if err != nil {
t.Fatalf("NewCallCredentials failed: %v", err)
}
if callCreds == nil {
t.Fatal("Expected non-nil bundle")
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, much better, thanks!

@dimpavloff dimpavloff requested a review from easwars October 3, 2025 13:39
@dimpavloff dimpavloff removed their assignment Oct 3, 2025
@easwars easwars self-assigned this Oct 8, 2025
Comment on lines +265 to +269
originalJWTEnabled := envconfig.XDSBootstrapCallCredsEnabled
envconfig.XDSBootstrapCallCredsEnabled = true
defer func() {
envconfig.XDSBootstrapCallCredsEnabled = originalJWTEnabled
}()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a helper function for this:

func SetEnvConfig[T any](t *testing.T, variable *T, value T) {

Comment on lines +32 to +36
// NewCallCredentials returns a new JWT token based call credentials. The input
// config must match the structure specified in gRFC A97. The caller is expected
// to invoke the second return value when they are done using the returned call creds.
// The cancel function is idempotent.
func NewCallCredentials(configJSON json.RawMessage) (credentials.PerRPCCredentials, func(), error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding named return values to help with the documentation.

// NewCallCredentials creates and returns JWT token-based call credentials
// using the input config (which must match gRFC A97).
//
// The caller must invoke the returned cancel function (which is idempotent),
// when they are done using the returned call credentials.
func NewCallCredentials(configJSON json.RawMessage) (c credentials.PerRPCCredentials, cancel func(), err error) {

"google.golang.org/grpc/internal/grpctest"
)

const defaultCtxTimeout = 5 * time.Second
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/defaultCtxTimeout/defaultTestTimeout/

callCreds, cleanup, err := NewCallCredentials(json.RawMessage(tt.config))

if err == nil {
t.Fatal("NewCallCredentials: expected error, got nil")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider adding the config to the error message:
t.Fatal("NewCallCredentials(%s) succeeded when expected to fail", tt.config)

t.Fatal("NewCallCredentials: expected error, got nil")
}
if callCreds != nil {
t.Error("NewCallCredentials: Expected nil bundle to be returned")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: there is no bundle anymore.
t.Error("NewCallCredentials(%s) returned non-nil credentials", tt.config)

Comment on lines +93 to +97
if cleanup == nil {
t.Error("NewCallCredentials: Expected non-nil cleanup function to be returned")
} else {
defer cleanup()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nicer to always return a non-nil cancel func (even in error cases), to avoid such checks at callsites.

}

// Test that call credentials get used
ctx, cancel := context.WithTimeout(context.Background(), defaultCtxTimeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need a 5s timeout here? And this const is only used here, you could remove the global and just inline the value here. Maybe a second at best should be good enough?

func writeTempFile(t *testing.T, content string) string {
t.Helper()
tempDir := t.TempDir()
filePath := filepath.Join(tempDir, "tempfile")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably doesn't matter. But could be rename the file to something more appropriate. Maybe "jwt_token"?

if cc.Config == nil {
return cc.Type
}
// We do not expect the Marshal call to fail since we wrote to cc.Config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Here and in a few other places, please terminate comment lines with periods. https://google.github.io/styleguide/go/decisions#comment-sentences

channelCreds []ChannelCreds
serverFeatures []string
serverURI string
channelCreds []ChannelCreds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you are here, could you please add a TODO to rename ChannelCreds to ChannelCredsConfig to match what we have with the call creds? Thanks.

return cc.Type + "-" + string(b)
}

// CallCredsConfig contains the call credentials configuration for individual RPCs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be better to clarify in this docstring that these call creds are used when talking to the xDS management server. The docstring on the ChannelCreds type does that. So, what do you think about something like:

CallCredsConfig contains the call credentials configuration to be used on 
RPCs to the management server.

I think the part about A97 is not that important and could be skipped. Thanks.

case !slices.Equal(sc.serverFeatures, other.serverFeatures):
return false
case !sc.selectedCreds.Equal(other.selectedCreds):
case !sc.selectedChannelCreds.Equal(other.selectedChannelCreds):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is existing code, but I'm wondering if we can remove this field from the equality check for a couple of reasons:

  • we are already checking the channel credentials configuration (from which we selected this credentials) for equality
  • this type is supposed to semantically represent configuration corresponding to an xDS server and not any runtime state. We are already abusing it a little (or optimizing) by storing the selected credentials in it, so that we don't have to parse the configuration again when we actually want to use the credentials.

}
features := strings.Join(sc.serverFeatures, "-")
return strings.Join([]string{sc.serverURI, sc.selectedCreds.String(), features}, "-")
return strings.Join([]string{sc.serverURI, sc.selectedChannelCreds.String(), features, sc.CallCredsConfigs().String()}, "-")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember you asking me if we want to include the channel creds here and I said yes. But thinking about this now, I'm wondering if the returned string is going to be too verbose, especially since the String() method of the call credentials config returns a JSON (as does the channel credentials). Have you see what this string actually looks like in any of your testing?

Comment on lines +239 to +244
// CallCreds returns the built call credentials that are ready to use.
// These are the credentials that were successfully built from the call_creds
// configuration.
func (sc *ServerConfig) CallCreds() []credentials.PerRPCCredentials {
return sc.selectedCallCreds
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this method at all (and the selectedCallCreds field)? We are currently storing the selected call credentials directly as dial options in the extraDialOptions field.

Comment on lines +594 to +598
original := envconfig.XDSBootstrapCallCredsEnabled
envconfig.XDSBootstrapCallCredsEnabled = true
defer func() {
envconfig.XDSBootstrapCallCredsEnabled = original
}()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use the helper function here as well testutils.SetEnvConfig.

Comment on lines +76 to +79
// Build returns a PerRPCCredentials created from the provided
// configuration, and a function to clean up any additional resources
// associated with them when they are no longer needed.
Build(config json.RawMessage) (credentials.PerRPCCredentials, func(), error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that I added a comment about possibly using named return values for this method on the JWT call creds implementation. But it looks like we haven't done that for the existing channel credentials interface which has a similar set of return values. So, I'm fine with leaving this as is. Thanks.

builder Credentials
typename string
builder ChannelCredentials
minimumRequiredConfig json.RawMessage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this new field when it is always set to nil?

if bundle == nil {
t.Errorf("%T.Build returned nil bundle, expected non-nil", test.builder)
}
stop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we defer this call to stop right after the line where we call Build?

if bundle == nil {
t.Errorf("%T.Build returned nil bundle, expected non-nil", test.builder)
}
stop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

}

func TestJwtCallCredentials_DisabledIfFeatureNotEnabled(t *testing.T) {
builder := GetCallCredentials("jwt_call_creds")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we explicitly set the value of the env var to false instead of running this test under the assumption that it is false (which it currently is). But if we explicitly set it here, then we wont have to change the test when we change the default value of the env var to true.

@easwars easwars assigned dimpavloff and unassigned easwars Oct 10, 2025
@easwars
Copy link
Contributor

easwars commented Oct 10, 2025

@dimpavloff This PR is mostly looking good. Most of the comments in this pass are nits. We should be able to resolve them quickly and move on. Thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: xDS Includes everything xDS related, including LB policies used with xDS. Type: Feature New features or improvements in behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants