
Standalone Auth Server Design RFC #28

Closed

tgrunnagle wants to merge 7 commits into main from standalone-auth-server_2026-01-24

Conversation

@tgrunnagle (Contributor) commented Jan 25, 2026

Overview

This PR adds design documentation for deploying ToolHive's existing pkg/authserver as a standalone Kubernetes service with mTLS authentication.

What's Included

Problem Being Solved

The authserver (pkg/authserver/) is a complete OAuth2/OIDC implementation but exists as an unintegrated library with no deployment model, no proxyrunner integration, and no operator support.

Proposed Solution

  1. New MCPAuthServer CRD - Deploy authserver as a standalone Kubernetes service
  2. mTLS Authentication - Secure communication between proxyrunners and authserver using cert-manager
  3. Token Exchange Endpoint - Allow proxyrunners to retrieve upstream IDP tokens via /internal/token-exchange
  4. MCPServer CRD Updates - Add authServerClientConfig for mTLS client certificate configuration

Key Design Decisions

  • Per-MCPServer client certificates for identity and audit (see the mTLS client sketch after this list)
  • cert-manager integration for automated certificate lifecycle
  • RFC 9728 OAuth Protected Resource Metadata for discovery
  • Token caching in proxyrunner to reduce authserver calls
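
As a sketch of how the per-MCPServer certificates and cert-manager integration come together on the client side, here is one way a proxyrunner could build its mTLS client from the certificate files mounted into its pod. The mount paths and function name are illustrative assumptions, not confirmed implementation details:

```go
// Sketch only: building an mTLS client from cert-manager-issued files
// mounted into the proxyrunner pod. Paths and names are illustrative.
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"os"
)

func newMTLSClient() (*http.Client, error) {
	// Per-MCPServer client certificate and key issued by cert-manager.
	cert, err := tls.LoadX509KeyPair("/etc/authserver-certs/tls.crt", "/etc/authserver-certs/tls.key")
	if err != nil {
		return nil, err
	}

	// CA bundle (ca.crt) used to verify the authserver's certificate.
	caPEM, err := os.ReadFile("/etc/authserver-certs/ca.crt")
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, fmt.Errorf("no CA certificates found in ca.crt")
	}

	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				Certificates: []tls.Certificate{cert}, // presented to the authserver
				RootCAs:      pool,                    // trust anchor for the server cert
				MinVersion:   tls.VersionTLS12,
			},
		},
	}, nil
}
```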

Related

tgrunnagle requested a review from jhrozek January 25, 2026 17:42
@jhrozek (Contributor) left a comment

I'm still going through the doc, digesting and commenting; here's my first batch of comments.

key: ca.crt

# Allowed client certificate patterns (for access control)
allowedSubjects:
Contributor

OK, I think this is good in the sense that we control who can access, but I don't see how we gate which tokens the server can retrieve.

And a problem we need to figure out: when the client talks to the authserver, the only identity-like information the authserver gets is in the /authorize request. The MCP spec mandates that the client sends the resource, and I suggest we take advantage of that.

Then the new authorization server controller could maintain a mapping (based on MCPGroup or label matching or something) between the resource and the MCPServer (name, namespace) pairs, which we'll get from the certificate identity:

  {
    "https://github.example.com/": {
      "namespace": "engineering",
      "name": "github-mcp"
    },
    "https://slack.example.com": {
      "namespace": "everyone",
      "name": "slack-mcp"
    }
  }

The authserver storage already has a session ID. This binds the JWT to the upstream token (alice's github token vs bob's github token). We'll then extend the storage to include the owner as well (name, namespace), so when the authserver stores the token we'll have something like:

sessions[tsid] = {
  owner = name/namespace
  tokens = { access, refresh, ..}
}

Then we issue a JWT with the tsid claim.

On token retrieval, we extract the owner from the client cert and verify:

  1. the owner
  2. the tsid

For vMCP, this means that the vMCP instance is the owner of all the upstream tokens for all its backends, but I don't know how to separate those... maybe based on claims? Maybe we could have a backend prefix in the claim? OTOH, I would like the backends to be quite opaque from the client's perspective.

This way we authenticate the clients and authorize access to the tokens at the level of the MCPServer and the user.
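
A minimal Go sketch of the owner-bound session storage and retrieval check described above; the type and method names (SessionStore, Owner, Retrieve) are hypothetical, not the existing pkg/authserver API:

```go
// Hypothetical sketch of owner-bound token sessions; names are
// illustrative, not the existing pkg/authserver API.
package authserver

import (
	"errors"
	"sync"
)

// Owner identifies the MCPServer that owns a session, taken from the
// mTLS client certificate identity.
type Owner struct {
	Namespace string
	Name      string
}

// Session binds the upstream IDP tokens to their owner.
type Session struct {
	Owner        Owner
	AccessToken  string
	RefreshToken string
}

// SessionStore maps tsid (the token session ID claim) to sessions.
type SessionStore struct {
	mu       sync.RWMutex
	sessions map[string]Session
}

// Retrieve returns the upstream tokens only when the caller's
// certificate identity matches the session owner.
func (s *SessionStore) Retrieve(tsid string, caller Owner) (Session, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	sess, ok := s.sessions[tsid]
	if !ok {
		return Session{}, errors.New("unknown tsid")
	}
	if sess.Owner != caller {
		return Session{}, errors.New("caller does not own this session")
	}
	return sess, nil
}
```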

The authserver stores **upstream IDP tokens** (access tokens, refresh tokens) and links them to the JWTs it issues via the `tsid` (token session ID) claim. When a proxyrunner receives a client request with an authserver-issued JWT, it may need to:

1. **Retrieve the upstream access token** to pass to backend MCP servers that require the original IDP token
2. **Exchange tokens** (RFC 8693) to get a backend-specific token
Contributor

Very interesting take. At first I was going to push back, but 1) standards > NIH, 2) we already have the code, and 3) the fields in the exchange request seem to be extensible.

My only remaining reservation is that technically we're not exchanging tokens (there's no mechanism for minting tokens on request) but retrieving existing pre-stored tokens.
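
For reference, the retrieval call could still ride on the RFC 8693 request shape. A sketch of what the proxyrunner might send to the internal endpoint; the URL and parameter set are illustrative assumptions, not confirmed implementation details:

```go
// Sketch of an RFC 8693-style request from the proxyrunner to the
// internal token-exchange endpoint. Illustrative only.
package main

import (
	"net/http"
	"net/url"
)

func requestUpstreamToken(mtlsClient *http.Client, authserverJWT string) (*http.Response, error) {
	form := url.Values{
		"grant_type":           {"urn:ietf:params:oauth:grant-type:token-exchange"},
		"subject_token":        {authserverJWT}, // the JWT carrying the tsid claim
		"subject_token_type":   {"urn:ietf:params:oauth:token-type:jwt"},
		"requested_token_type": {"urn:ietf:params:oauth:token-type:access_token"},
	}
	return mtlsClient.PostForm(
		"https://mcp-authserver.toolhive-system.svc.cluster.local:8443/internal/token-exchange",
		form,
	)
}
```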

- move `AuthServerConfig` from `MCPServer` to `MCPExternalAuthConfig`
- use SPIFFE urls for allowed clients
- Support multiple signing keys
- Scope `allowedSubjects` down to MCP server names (optional)
- Support `upstreamIdps` in MCPAuthServer (for future multiple Idp support)
- Complete `MCPAuthServer` go types code snippet
tgrunnagle marked this pull request as ready for review January 26, 2026 20:04
tgrunnagle requested review from JAORMX and yrobla January 26, 2026 20:04
namespace: toolhive-system
spec:
issuer: "https://mcp-authserver.toolhive-system.svc.cluster.local"
replicas: 2

Aren't we just setting this to 1 all the time? Are we looking for scalability at this point? If so, we should consider synchronization, mutexes, etc. throughout the spec.

```go
// pkg/auth/authserver/cache.go

// TokenCache provides thread-safe caching of exchanged tokens
```

Is this just an in-memory cache, or are we looking to use other methods?

Contributor Author

Just in-memory, because 1) there's a single proxyrunner instance and 2) getting the token from the AS is pretty cheap.
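
A minimal sketch of such an in-memory cache, assuming entries are keyed by the JWT's tsid claim and expire with the upstream token; illustrative only, not the actual pkg/auth/authserver/cache.go implementation:

```go
// Minimal in-memory token cache sketch; illustrative only.
package authserver

import (
	"sync"
	"time"
)

type cachedToken struct {
	token     string
	expiresAt time.Time
}

// TokenCache provides thread-safe caching of exchanged tokens,
// keyed here by the JWT's tsid claim (an assumption).
type TokenCache struct {
	mu      sync.RWMutex
	entries map[string]cachedToken
}

func NewTokenCache() *TokenCache {
	return &TokenCache{entries: make(map[string]cachedToken)}
}

// Get returns a cached token unless it is missing or expired.
func (c *TokenCache) Get(tsid string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.entries[tsid]
	if !ok || time.Now().After(e.expiresAt) {
		return "", false
	}
	return e.token, true
}

// Put stores a token together with its expiry.
func (c *TokenCache) Put(tsid, token string, expiresAt time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[tsid] = cachedToken{token: token, expiresAt: expiresAt}
}
```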


// UpstreamIdps configures upstream identity providers for user authentication
// The authserver federates authentication to these IDPs
// Currently only a single IDP is supported; multiple IDPs planned for vMCP use case

Now, will that be integrated with the vMCP code? I don't see details about it...

Contributor

We'll first have to add support for multiple upstream IDPs in the authserver. To be honest, I think in that model we'll have to accept that vMCP has access to all the upstream tokens from all the backends, because it has to be vMCP driving the OAuth flow.

Following the [MCPServer CRD pattern](cmd/thv-operator/api/v1alpha1/mcpserver_types.go):

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
```
@yrobla Jan 27, 2026

We're missing the observedGeneration, phase, etc. fields here, as we have with other CRDs? Also, are we thinking of adding validation webhooks?

Contributor

Good point, some sort of status would be good.
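
A hedged sketch of what that status could look like, following common controller-runtime conventions; the field set is a suggestion, not part of the RFC as written:

```go
// Hypothetical MCPAuthServerStatus sketch; the field set is a
// suggestion, not part of the RFC.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// MCPAuthServerStatus reflects the observed state of an MCPAuthServer.
type MCPAuthServerStatus struct {
	// ObservedGeneration is the most recent generation reconciled.
	ObservedGeneration int64 `json:"observedGeneration,omitempty"`

	// Phase summarizes the state, e.g. Pending, Ready, or Failed.
	Phase string `json:"phase,omitempty"`

	// Conditions carry detailed readiness information.
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```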

AuthServerConfig *AuthServerClientConfig `json:"authserver_config,omitempty"`
}

type AuthServerClientConfig struct {

Hmm, runconfig is platform-agnostic... so if we add authserverconfig here, with certificates, isn't that just used for k8s? Or how do we configure certs in the client?

port: 8443

upstreamIdp:
type: oidc
Contributor

Let's give the provider a name, too. It might just be useful for the admin's convenience, but we might want to reference the upstream IDP by name from elsewhere later on.

# Issuer for server certificate (controller creates the Certificate resource)
serverCert:
issuerRef:
name: toolhive-mtls-ca
Contributor

Do we require the user to specify those? Can we help them autogenerate the internal CA if needed, and let the user run with sane defaults for duration, renewal, etc.?

Contributor

Actually, these would be required to chain to an enterprise CA, right?



- "slack-bot"
```

**MCPAuthServer CRD Types:**
Contributor

I think we could drop the types from the RFC to make it more compact

# Trust domain for SPIFFE URIs (required)
trustDomain: "toolhive.local"
# Allow proxyrunners from specific namespaces (optional)
allowedNamespaces:
Contributor

If we went with a server-can-only-access-its-own-secrets model, then we wouldn't have to specify the allowedNamespaces and allowedNames - we could shield the AS with NetworkPolicies and accept that, as a side effect, having the RBAC rights to create an MCPServer and MCPExternalAuth gives you the ability to connect to an authserver.


```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPAuthServer
```
Contributor

@JAORMX made a good point while discussing the proposal: having the token exchange endpoint increases the likelihood of someone exposing it over the internet. I wonder if we could mitigate that by having the operator create separate Services for the internal endpoint and the public endpoints on different ports, where one could be exposed over ingress and the other would be ClusterIP-only?
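
One way the operator could realize that split, sketched with illustrative names and ports (not part of the RFC):

```go
// Sketch: two Services over the same authserver pods -- a public one
// for the OAuth endpoints and a ClusterIP-only one for
// /internal/token-exchange. Names and ports are illustrative.
package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func buildAuthServerServices(namespace string) []*corev1.Service {
	selector := map[string]string{"app": "mcp-authserver"}

	// Public OAuth endpoints; this Service may sit behind an Ingress.
	public := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "mcp-authserver", Namespace: namespace},
		Spec: corev1.ServiceSpec{
			Selector: selector,
			Ports: []corev1.ServicePort{
				{Name: "oauth", Port: 443, TargetPort: intstr.FromInt(8443)},
			},
		},
	}

	// Internal token-exchange endpoint; ClusterIP only, never exposed
	// via Ingress, and further restricted by NetworkPolicies.
	internal := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "mcp-authserver-internal", Namespace: namespace},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeClusterIP,
			Selector: selector,
			Ports: []corev1.ServicePort{
				{Name: "token-exchange", Port: 8444, TargetPort: intstr.FromInt(8444)},
			},
		},
	}
	return []*corev1.Service{public, internal}
}
```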

@tgrunnagle (Contributor Author)

After discussions w/ @JAORMX and @jhrozek, closing in favor of an in-process proposal.

tgrunnagle closed this Jan 27, 2026