
Standalone Auth Server Design RFC #28

Closed

tgrunnagle wants to merge 7 commits into main from standalone-auth-server_2026-01-24

Conversation

@tgrunnagle (Contributor) commented Jan 25, 2026

Overview

This PR adds design documentation for deploying ToolHive's existing pkg/authserver as a standalone Kubernetes service with mTLS authentication.

What's Included

Problem Being Solved

The authserver (pkg/authserver/) is a complete OAuth2/OIDC implementation but exists as an unintegrated library with no deployment model, no proxyrunner integration, and no operator support.

Proposed Solution

  1. New MCPAuthServer CRD - Deploy authserver as a standalone Kubernetes service
  2. mTLS Authentication - Secure communication between proxyrunners and authserver using cert-manager
  3. Token Exchange Endpoint - Allow proxyrunners to retrieve upstream IDP tokens via /internal/token-exchange
  4. MCPServer CRD Updates - Add authServerClientConfig for mTLS client certificate configuration

Key Design Decisions

  • Per-MCPServer client certificates for identity and audit (see the mTLS client sketch after this list)
  • cert-manager integration for automated certificate lifecycle
  • RFC 9728 OAuth Protected Resource Metadata for discovery
  • Token caching in proxyrunner to reduce authserver calls
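
As a sketch of how the per-MCPServer certificates and cert-manager integration come together on the client side, here is one way a proxyrunner could build its mTLS client from the certificate files mounted into its pod. The mount paths and function name are illustrative assumptions, not confirmed implementation details:

```go
// Sketch only: building an mTLS client from cert-manager-issued files
// mounted into the proxyrunner pod. Paths and names are illustrative.
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"os"
)

func newMTLSClient() (*http.Client, error) {
	// Per-MCPServer client certificate and key issued by cert-manager.
	cert, err := tls.LoadX509KeyPair("/etc/authserver-certs/tls.crt", "/etc/authserver-certs/tls.key")
	if err != nil {
		return nil, err
	}

	// CA bundle (ca.crt) used to verify the authserver's certificate.
	caPEM, err := os.ReadFile("/etc/authserver-certs/ca.crt")
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, fmt.Errorf("no CA certificates found in ca.crt")
	}

	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				Certificates: []tls.Certificate{cert}, // presented to the authserver
				RootCAs:      pool,                    // trust anchor for the server cert
				MinVersion:   tls.VersionTLS12,
			},
		},
	}, nil
}
```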

Related

tgrunnagle requested a review from jhrozek January 25, 2026 17:42
@jhrozek (Contributor) left a comment

I'm still going through the doc, digesting and commenting; here's my first batch of comments.

key: ca.crt

# Allowed client certificate patterns (for access control)
allowedSubjects:
Contributor

OK, I think this is good in the sense that we control who can access, but I don't see how we gate which tokens the server can retrieve.

And a problem we need to figure out: when the client talks to the authserver, the only identity-like information the authserver gets is in the /authorize request. The MCP spec mandates that the client sends the resource, and I suggest we take advantage of that.

Then the new authorization server controller could maintain a mapping (based on MCPGroup or label matching or something) between the resource and the MCPServer (name, namespace) pairs, which we'll get from the certificate identity:

  {
    "https://github.example.com/": {
      "namespace": "engineering",
      "name": "github-mcp"
    },
    "https://slack.example.com": {
      "namespace": "everyone",
      "name": "slack-mcp"
    }
  }

The authserver storage already has a session ID. This binds the JWT to the upstream token (alice's github token vs bob's github token). We'll then extend the storage to include the owner as well (name, namespace), so when the authserver stores the token we'll have something like:

sessions[tsid] = {
  owner = name/namespace
  tokens = { access, refresh, ..}
}

Then we issue a JWT with the tsid claim.

On token retrieval, we extract the owner from the client cert and verify:

  1. the owner
  2. the tsid

For vMCP, this means that the vMCP instance is the owner of all the upstream tokens for all its backends, but I don't know how to separate those... maybe based on claims? Maybe we could have a backend prefix in the claim? OTOH, I would like the backends to be quite opaque from the client's perspective.

This way we authenticate the clients and authorize access to the tokens at the level of the MCPServer and the user.
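
A minimal Go sketch of the owner-bound session storage and retrieval check described above; the type and method names (SessionStore, Owner, Retrieve) are hypothetical, not the existing pkg/authserver API:

```go
// Hypothetical sketch of owner-bound token sessions; names are
// illustrative, not the existing pkg/authserver API.
package authserver

import (
	"errors"
	"sync"
)

// Owner identifies the MCPServer that owns a session, taken from the
// mTLS client certificate identity.
type Owner struct {
	Namespace string
	Name      string
}

// Session binds the upstream IDP tokens to their owner.
type Session struct {
	Owner        Owner
	AccessToken  string
	RefreshToken string
}

// SessionStore maps tsid (the token session ID claim) to sessions.
type SessionStore struct {
	mu       sync.RWMutex
	sessions map[string]Session
}

// Retrieve returns the upstream tokens only when the caller's
// certificate identity matches the session owner.
func (s *SessionStore) Retrieve(tsid string, caller Owner) (Session, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	sess, ok := s.sessions[tsid]
	if !ok {
		return Session{}, errors.New("unknown tsid")
	}
	if sess.Owner != caller {
		return Session{}, errors.New("caller does not own this session")
	}
	return sess, nil
}
```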

The authserver stores **upstream IDP tokens** (access tokens, refresh tokens) and links them to the JWTs it issues via the `tsid` (token session ID) claim. When a proxyrunner receives a client request with an authserver-issued JWT, it may need to:

1. **Retrieve the upstream access token** to pass to backend MCP servers that require the original IDP token
2. **Exchange tokens** (RFC 8693) to get a backend-specific token
Contributor

Very interesting take. At first I was going to push back, but 1) standards > NIH, 2) we already have the code, and 3) the fields in the exchange request seem to be extensible.

My only remaining reservation is that technically we're not exchanging tokens (there's no mechanism for minting tokens on request) but retrieving existing pre-stored tokens.
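
For reference, the retrieval call could still ride on the RFC 8693 request shape. A sketch of what the proxyrunner might send to the internal endpoint; the URL and parameter set are illustrative assumptions, not confirmed implementation details:

```go
// Sketch of an RFC 8693-style request from the proxyrunner to the
// internal token-exchange endpoint. Illustrative only.
package main

import (
	"net/http"
	"net/url"
)

func requestUpstreamToken(mtlsClient *http.Client, authserverJWT string) (*http.Response, error) {
	form := url.Values{
		"grant_type":           {"urn:ietf:params:oauth:grant-type:token-exchange"},
		"subject_token":        {authserverJWT}, // the JWT carrying the tsid claim
		"subject_token_type":   {"urn:ietf:params:oauth:token-type:jwt"},
		"requested_token_type": {"urn:ietf:params:oauth:token-type:access_token"},
	}
	return mtlsClient.PostForm(
		"https://mcp-authserver.toolhive-system.svc.cluster.local:8443/internal/token-exchange",
		form,
	)
}
```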

- move `AuthServerConfig` from `MCPServer` to `MCPExternalAuthConfig`
- use SPIFFE urls for allowed clients
- Support multiple signing keys
- Scope `allowedSubjects` down to MCP server names (optional)
- Support `upstreamIdps` in MCPAuthServer (for future multiple Idp support)
- Complete `MCPAuthServer` go types code snippet
tgrunnagle marked this pull request as ready for review January 26, 2026 20:04
tgrunnagle requested review from JAORMX and yrobla January 26, 2026 20:04
namespace: toolhive-system
spec:
issuer: "https://mcp-authserver.toolhive-system.svc.cluster.local"
replicas: 2

Aren't we just setting this to 1 all the time? Are we looking for scalability at this point? If so, we should consider synchronization, mutexes, etc. throughout the spec.

```go
// pkg/auth/authserver/cache.go

// TokenCache provides thread-safe caching of exchanged tokens
```

Is this just an in-memory cache, or are we looking to use other methods?

Contributor Author

Just in-memory, because 1) there's a single proxyrunner instance and 2) getting the token from the AS is pretty cheap.
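
A minimal sketch of such an in-memory cache, assuming entries are keyed by the JWT's tsid claim and expire with the upstream token; illustrative only, not the actual pkg/auth/authserver/cache.go implementation:

```go
// Minimal in-memory token cache sketch; illustrative only.
package authserver

import (
	"sync"
	"time"
)

type cachedToken struct {
	token     string
	expiresAt time.Time
}

// TokenCache provides thread-safe caching of exchanged tokens,
// keyed here by the JWT's tsid claim (an assumption).
type TokenCache struct {
	mu      sync.RWMutex
	entries map[string]cachedToken
}

func NewTokenCache() *TokenCache {
	return &TokenCache{entries: make(map[string]cachedToken)}
}

// Get returns a cached token unless it is missing or expired.
func (c *TokenCache) Get(tsid string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.entries[tsid]
	if !ok || time.Now().After(e.expiresAt) {
		return "", false
	}
	return e.token, true
}

// Put stores a token together with its expiry.
func (c *TokenCache) Put(tsid, token string, expiresAt time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[tsid] = cachedToken{token: token, expiresAt: expiresAt}
}
```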


// UpstreamIdps configures upstream identity providers for user authentication
// The authserver federates authentication to these IDPs
// Currently only a single IDP is supported; multiple IDPs planned for vMCP use case

Now, will that be integrated with the vMCP code? I don't see details about it...

Contributor

We'll first have to add support for multiple upstream IDPs in the authserver. To be honest, I think in that model we'll have to accept that vMCP has access to all the upstream tokens from all the backends, because it has to be vMCP driving the OAuth flow.

Following the [MCPServer CRD pattern](cmd/thv-operator/api/v1alpha1/mcpserver_types.go):

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
```
@yrobla Jan 27, 2026

We're missing the observedGeneration, phase, etc. fields here, as we have with other CRDs? Also, are we thinking of adding validation webhooks?

Contributor

Good point, some sort of status would be good.
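
A hedged sketch of what that status could look like, following common controller-runtime conventions; the field set is a suggestion, not part of the RFC as written:

```go
// Hypothetical MCPAuthServerStatus sketch; the field set is a
// suggestion, not part of the RFC.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// MCPAuthServerStatus reflects the observed state of an MCPAuthServer.
type MCPAuthServerStatus struct {
	// ObservedGeneration is the most recent generation reconciled.
	ObservedGeneration int64 `json:"observedGeneration,omitempty"`

	// Phase summarizes the state, e.g. Pending, Ready, or Failed.
	Phase string `json:"phase,omitempty"`

	// Conditions carry detailed readiness information.
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```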

AuthServerConfig *AuthServerClientConfig `json:"authserver_config,omitempty"`
}

type AuthServerClientConfig struct {

Hmm, runconfig is platform-agnostic... so if we add authserverconfig here, with certificates, isn't that just used for k8s? Or how do we configure certs in the client?

port: 8443

upstreamIdp:
type: oidc
Contributor

Let's give the provider a name, too. It might just be useful for the admin's convenience, but we might want to reference the upstream IDP by name from elsewhere later on.

# Issuer for server certificate (controller creates the Certificate resource)
serverCert:
issuerRef:
name: toolhive-mtls-ca
Contributor

Do we require the user to specify those? Can we help them autogenerate the internal CA if needed, and let the user run with sane defaults for duration, renewal, etc.?

Contributor

Actually, these would be required to chain to an enterprise CA, right?



- "slack-bot"
```

**MCPAuthServer CRD Types:**
Contributor

I think we could drop the types from the RFC to make it more compact

# Trust domain for SPIFFE URIs (required)
trustDomain: "toolhive.local"
# Allow proxyrunners from specific namespaces (optional)
allowedNamespaces:
Contributor

If we went with a server-can-only-access-its-own-secrets model, then we wouldn't have to specify the allowedNamespaces and allowedNames - we could shield the AS with NetworkPolicies and accept that, as a side effect, having the RBAC rights to create an MCPServer and MCPExternalAuth gives you the ability to connect to an authserver.


```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPAuthServer
```
Contributor

@JAORMX made a good point while discussing the proposal: having the token exchange endpoint increases the likelihood of someone exposing it over the internet. I wonder if we could mitigate that by having the operator create separate Services for the internal endpoint and the public endpoints on different ports, where one could be exposed over ingress and the other would be ClusterIP-only?
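
One way the operator could realize that split, sketched with illustrative names and ports (not part of the RFC):

```go
// Sketch: two Services over the same authserver pods -- a public one
// for the OAuth endpoints and a ClusterIP-only one for
// /internal/token-exchange. Names and ports are illustrative.
package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func buildAuthServerServices(namespace string) []*corev1.Service {
	selector := map[string]string{"app": "mcp-authserver"}

	// Public OAuth endpoints; this Service may sit behind an Ingress.
	public := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "mcp-authserver", Namespace: namespace},
		Spec: corev1.ServiceSpec{
			Selector: selector,
			Ports: []corev1.ServicePort{
				{Name: "oauth", Port: 443, TargetPort: intstr.FromInt(8443)},
			},
		},
	}

	// Internal token-exchange endpoint; ClusterIP only, never exposed
	// via Ingress, and further restricted by NetworkPolicies.
	internal := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "mcp-authserver-internal", Namespace: namespace},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeClusterIP,
			Selector: selector,
			Ports: []corev1.ServicePort{
				{Name: "token-exchange", Port: 8444, TargetPort: intstr.FromInt(8444)},
			},
		},
	}
	return []*corev1.Service{public, internal}
}
```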

@tgrunnagle (Contributor Author)

After discussions w/ @JAORMX and @jhrozek, closing in favor of an in-process proposal.

tgrunnagle closed this Jan 27, 2026