Skip to content

Conversation

@flfeurmou-indeed
Copy link
Contributor

Summary

This change enables remote MCP servers (like Datadog and Glean) to restore their OAuth sessions after workload restarts without requiring a new browser-based login.

Problem

When using remote MCP servers with OAuth authentication (e.g., Datadog, Glean), users must re-authenticate via browser every time:

  • The workload is restarted
  • The laptop wakes from sleep
  • The OAuth access token expires

This creates significant friction, especially for developers who frequently close their laptops (e.g., commuting).

Solution

Persist OAuth tokens (access token, refresh token, expiry) to the run configuration, enabling automatic session restoration:

  1. Token caching in config: Added CachedAccessToken, CachedRefreshToken, CachedTokenExpiry fields to remote.Config

  2. PersistingTokenSource wrapper: Intercepts token refreshes and persists updated tokens to disk

  3. Session restoration: On workload start, checks for cached tokens and restores the session without browser interaction

  4. Automatic refresh persistence: When tokens are auto-refreshed by the oauth2 library, the new tokens are saved

Changes

  • pkg/auth/remote/config.go: Added token caching fields and helper methods
  • pkg/auth/remote/handler.go: Restore from cached tokens, persist after OAuth flow
  • pkg/auth/remote/persisting_token_source.go: New wrapper to persist refreshed tokens
  • pkg/auth/discovery/discovery.go: Expose token details in OAuthFlowResult
  • pkg/runner/runner.go: Set up token persister callback

Testing

  • Added unit tests for PersistingTokenSource
  • Added unit tests for Config.HasValidCachedTokens() and Config.ClearCachedTokens()
  • All existing tests pass

Fixes

Fixes #3331

@github-actions github-actions bot added the size/M Medium PR: 300-599 lines changed label Jan 21, 2026
@codecov
Copy link

codecov bot commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 25.00000% with 117 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.71%. Comparing base (a5b8c80) to head (2d0bdef).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
pkg/auth/remote/handler.go 9.09% 88 Missing and 2 partials ⚠️
pkg/runner/runner.go 0.00% 22 Missing ⚠️
pkg/auth/discovery/discovery.go 0.00% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3382      +/-   ##
==========================================
- Coverage   64.83%   64.71%   -0.13%     
==========================================
  Files         380      381       +1     
  Lines       37004    37121     +117     
==========================================
+ Hits        23992    24022      +30     
- Misses      11127    11217      +90     
+ Partials     1885     1882       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@flfeurmou-indeed flfeurmou-indeed force-pushed the fix/persist-oauth-tokens branch from 15cb24f to 157452f Compare January 21, 2026 19:38
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/M Medium PR: 300-599 lines changed labels Jan 21, 2026
@flfeurmou-indeed flfeurmou-indeed force-pushed the fix/persist-oauth-tokens branch from 157452f to 76eb7a2 Compare January 21, 2026 19:45
This change enables remote MCP servers (like Datadog and Glean) to
restore their OAuth sessions after workload restarts without requiring
a new browser-based login.

Changes:
- Add CachedAccessToken, CachedRefreshToken, CachedTokenExpiry fields
  to remote.Config for token persistence
- Create PersistingTokenSource wrapper to save tokens when refreshed
- Modify Handler.Authenticate to restore from cached tokens when available
- Add token persister callback in runner to save tokens to config state

Fixes stacklok#3331

Signed-off-by: Frédéric LE FEURMOU <flfeurmou@indeed.com>
@flfeurmou-indeed flfeurmou-indeed force-pushed the fix/persist-oauth-tokens branch from 76eb7a2 to 4fff5af Compare January 21, 2026 19:46
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jan 21, 2026
Signed-off-by: Frédéric LE FEURMOU <flfeurmou@indeed.com>
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jan 21, 2026
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jan 21, 2026
Copy link
Contributor

@amirejaz amirejaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution @flfeurmou-indeed 🚀 . The PR looks good overall. I’ve added a few comments.

Address reviewer feedback:
- Store refresh tokens securely in secret manager instead of plain text config
- Remove access token caching (can be regenerated from refresh token)
- Add TokenTypeOAuthRefreshToken for proper secret categorization
- Gracefully handle missing secret manager (tokens won't persist but OAuth works)

Security improvement: tokens are now stored in OS keyring/encrypted storage,
config only contains the secret reference (e.g., OAUTH_REFRESH_TOKEN_workload)

Signed-off-by: Frédéric LE FEURMOU <flfeurmou@indeed.com>
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jan 22, 2026
Address reviewer nit stacklok#3: DetectAuthenticationFromServer and discoverIssuerAndScopes
were called in both tryRestoreFromCachedTokens and authenticateWithOAuth.

Now these are called once in Authenticate and the results are passed to both
tryRestoreFromCachedTokens and performOAuthFlow, avoiding redundant network calls.

Signed-off-by: Frédéric LE FEURMOU <flfeurmou@indeed.com>
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jan 22, 2026
@amirejaz
Copy link
Contributor

@flfeurmou Thanks for addressing the feedback. This looks solid now, I’ve added one additional comment. Sorry for the fragmented review.

@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jan 22, 2026
amirejaz
amirejaz previously approved these changes Jan 22, 2026
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jan 22, 2026
Avoids unnecessary writes since refresh tokens are long-lived
and usually don't change on every access-token refresh.
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jan 22, 2026
@amirejaz
Copy link
Contributor

@flfeurmou-indeed During further testing, I realized we need to either persist the client credentials (i.e., client_id and client_secret) or dynamically re-register the client. Otherwise, token reuse may fail for some mcp servers because they require client credentials to be passed. I tested against one such server and observed the following error:

5:00PM WARN Failed to restore from cached tokens, will perform fresh OAuth flow: cached tokens are invalid or expired: oauth2: "invalid_client" "Client ID is required"

We already have an issue open to track persisting client credentials: #3335.
For this PR, we can re-register the client on each run, and later enhance this to store and reuse the client credentials.

Some MCP servers require client credentials for token refresh.
When no configured client_id exists and a registration endpoint
is available, perform dynamic client registration before attempting
to restore from cached tokens.

This defers stacklok#3335 (persisting client credentials) to a separate PR
while still allowing token restoration to work for servers that
require client credentials.
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jan 22, 2026
@flfeurmou-indeed
Copy link
Contributor Author

Added getClientCredentials() that does dynamic registration when no client_id is configured. So when restoring from cached tokens, we re-register the client first (if DCR is available), then use those credentials for the token refresh. This keeps this PR simple and defers the full client credentials persistence (#3335) to a separate PR.

@amirejaz
Copy link
Contributor

@flfeurmou-indeed I tested the flow with the latest code and noticed that some servers reject newly generated client credentials because they differ from the original ones. I got the following error:

7:51PM WARN Failed to restore from cached tokens, will perform fresh OAuth flow: cached tokens are invalid or expired: oauth2: "invalid_grant" "Client ID mismatch"

I think we can merge this PR based on the second-to-last commit, without re-registering the client, and prioritize #3335 separately. Please let me know if that works for you, or if you have another suggestion.

@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jan 22, 2026
@flfeurmou-indeed
Copy link
Contributor Author

@flfeurmou-indeed I tested the flow with the latest code and noticed that some servers reject newly generated client credentials because they differ from the original ones. I got the following error:

7:51PM WARN Failed to restore from cached tokens, will perform fresh OAuth flow: cached tokens are invalid or expired: oauth2: "invalid_grant" "Client ID mismatch"

I think we can merge this PR based on the second-to-last commit, without re-registering the client, and prioritize #3335 separately. Please let me know if that works for you, or if you have another suggestion.

Works for me. Reverted the DCR commit, so this PR is back to just the token persistence with the refresh-token-change optimization. Happy to merge as-is and tackle #3335 as a quick follow-up since this PR lays most of the groundwork. Let me know if there's anything else needed.

@amirejaz
Copy link
Contributor

@flfeurmou-indeed Approved 👍 we can merge once CI is green.

@amirejaz
Copy link
Contributor

@flfeurmou-indeed I was going to assign #3335 to you, but it’s already assigned to someone else.

@amirejaz amirejaz merged commit 9ff9ec2 into stacklok:main Jan 22, 2026
34 of 35 checks passed
@flfeurmou-indeed
Copy link
Contributor Author

@amirejaz sounds good. I'd be happy to take #3335 if carlos-gn hasn't started. Just let me know!

@flfeurmou-indeed
Copy link
Contributor Author

@amirejaz sounds good. I'd be happy to take #3335 if carlos-gn hasn't started. Just let me know!

For context, I already implemented and tested this locally while working on my setup. Handles PKCE flows (empty client_secret) and works with both Datadog and Glean. Ready to open a PR whenever.

@amirejaz
Copy link
Contributor

@flfeurmou-indeed awesome, please file the PR. I will review it tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L Large PR: 600-999 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Oauth flow persistence and usability

2 participants