
Conversation

@SangJunBak
Contributor

@SangJunBak commented Dec 16, 2025

Rendered version: https://github.com/MaterializeInc/materialize/blob/52a83bcf8c3186c527dad2cf15876b46ce06fd5d/doc/developer/design/20251215_jwt_authentication.md

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

Contributor

@jasonhernandez left a comment

I've added a few discussion points. We can agree to leave the catalog as an open discussion / draft.

Comment on lines 68 to 78
### Solution proposal: The user should be prevented from logging in when they are de-provisioned. However, the database-level role should still exist.

When doing pgwire JWT authentication, we can accept a cleartext password of the form `access=<ACCESS_TOKEN>&refresh=<REFRESH_TOKEN>` where `&` is a delimiter and `refresh=<REFRESH_TOKEN>` is optional. The JWT authenticator will then try to authenticate again and fetch a new access token using the refresh token when close to expiration (using the token API URL in the spec above). If the refresh token doesn’t exist, the session will be invalidated. The implementation will be very similar to how we refresh tokens for the Frontegg authenticator. This would require users to have their IdP client generate `refresh` tokens.

By recommending a short time-to-live for access tokens, this approach invalidates sessions soon after a user is de-provisioned. When admins de-provision a user, the next time the user tries to authenticate or refresh their access token, the token API will not allow the user to log in, but the role will be kept in the database.

**Alternative: Use SASL Authentication using the OAUTHBEARER mechanism rather than a cleartext password**

This would be the most Postgres-compatible way of doing this and is what Postgres itself uses for its `oauth` authentication method. However, it may run into compatibility issues with clients. For example, in `psql` there’s no obvious way of sending the bearer token directly without going through libpq's device-grant flow. Furthermore, assuming access tokens are short-lived, this could lead to poor UX given there’s no native way to re-authenticate a pgwire session. Finally, our HTTP endpoints wouldn’t be able to support this given they don’t support SASL auth.

OAUTHBEARER reference: [https://www.postgresql.org/docs/18/sasl-authentication.html#SASL-OAUTHBEARER](https://www.postgresql.org/docs/18/sasl-authentication.html#SASL-OAUTHBEARER)
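
The password-splitting step described above can be sketched as follows. This is a minimal illustration, not the actual implementation; `parse_oidc_password` and its exact error handling are hypothetical:

```rust
// Hypothetical sketch: split a pgwire cleartext password of the form
// `access=<ACCESS_TOKEN>&refresh=<REFRESH_TOKEN>`, where the refresh
// segment is optional. Returns (access_token, optional_refresh_token).
fn parse_oidc_password(password: &str) -> Option<(String, Option<String>)> {
    let mut access = None;
    let mut refresh = None;
    for part in password.split('&') {
        if let Some(tok) = part.strip_prefix("access=") {
            access = Some(tok.to_string());
        } else if let Some(tok) = part.strip_prefix("refresh=") {
            refresh = Some(tok.to_string());
        } else {
            // Unrecognized segment: treat the whole string as malformed.
            return None;
        }
    }
    // An access token is mandatory; the refresh token is not.
    access.map(|a| (a, refresh))
}

fn main() {
    assert_eq!(
        parse_oidc_password("access=abc&refresh=xyz"),
        Some(("abc".to_string(), Some("xyz".to_string())))
    );
    assert_eq!(parse_oidc_password("access=abc"), Some(("abc".to_string(), None)));
    assert_eq!(parse_oidc_password("plainpassword"), None);
    println!("ok");
}
```

Rejecting unrecognized segments outright avoids silently treating an ordinary password as a token.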

Contributor

I agree with this approach. I would roughly coin this as "bring your own JWT issuer / JWK"

Comment on lines 80 to 92
### Solution proposal: The end user is able to create a token to connect to materialize via psql / postgres clients

Unfortunately, to provide a nice flow to generate the necessary access token and refresh token, we’d need to control the client. Thus we’ll leave the retrieval of the access token/refresh token to the user, similar to CockroachDB.

**Alternative: Revive the mz CLI**

We have an `mz` CLI that’s catered to Cloud and is no longer supported. We can potentially bring it back.

**Open question:** Is there anything we can do on our side to easily provide access tokens / refresh tokens to the user without controlling the client? This feels like the missing piece between JWT authentication and something like `aws sso login` in the AWS CLI

### Solution proposal: The end user is able to visit the Materialize console, and sign in with their IdP

A generic frontend SSO redirect flow would need to be implemented to retrieve an access token and refresh token. However, once retrieved, the SQL HTTP / WS API endpoints can use bearer authorization like Cloud and accept the access token. The Console would be in charge of refreshing the access token. The Console work is out of scope for this design document.

Contributor

I think we can exclude this from scope and let customers use existing tooling to get JWTs. I'm not sure this is necessary or particularly useful for customers. We might need something for internal testing, but I would start with just meeting that need for now.

Contributor Author

Updated the design doc to call it out of scope!

Comment on lines +118 to +129
## Phase 2: Map the `admin` claim to a user’s superuser attribute

Based on the `admin` claim, we can set the `superuser` attribute we store in the catalog for password authentication. We do this as follows:

- First, in our authenticator, save `admin` inside the user’s `external_metadata`
- Next, in `handle_startup_inner()` we diff them with the user’s current superuser status and if there’s a difference, apply the changes with an `ALTER` operation. We can use `catalog_transact_with_context()` for this.
- On error (e.g. if the ALTER isn’t allowed), we’ll end the connection with a descriptive error message. This is similar to the pattern we use for initializing network policies.

This is similar to how we identify superusers in Frontegg auth, except we also treat it as an operation to update the catalog.
- We can keep using the session’s metadata as the source of truth to keep parity with Cloud, but eventually we’ll want to use the catalog as the source of truth for all authentication methods. We can call this **out of scope.**

Prototype: [https://github.com/MaterializeInc/materialize/pull/34372/commits](https://github.com/MaterializeInc/materialize/pull/34372/commits)
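
The diff step above can be reduced to a small decision function. The following is illustrative only — the enum and names are hypothetical, and the real catalog update would go through `catalog_transact_with_context()`:

```rust
// Hypothetical sketch: decide whether the `admin` claim from the token
// disagrees with the superuser attribute stored in the catalog, i.e.
// whether startup needs to issue an ALTER.
#[derive(Debug, PartialEq)]
enum SuperuserAction {
    NoChange,
    Grant,  // would map to ALTER ROLE ... SUPERUSER
    Revoke, // would map to ALTER ROLE ... NOSUPERUSER
}

fn superuser_action(claim_admin: bool, catalog_superuser: bool) -> SuperuserAction {
    match (claim_admin, catalog_superuser) {
        (true, false) => SuperuserAction::Grant,
        (false, true) => SuperuserAction::Revoke,
        _ => SuperuserAction::NoChange,
    }
}

fn main() {
    assert_eq!(superuser_action(true, true), SuperuserAction::NoChange);
    assert_eq!(superuser_action(true, false), SuperuserAction::Grant);
    assert_eq!(superuser_action(false, true), SuperuserAction::Revoke);
    println!("ok");
}
```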

Contributor

We need to be careful about cases like this:

1. User authenticates with JWT as an admin.
2. User has admin permissions revoked in the service issuing JWTs.
3. User logs in with a password.

How will we know that they had their admin permissions revoked? How will a customer confidently ensure that they're able to revoke admin access?

there are a few solutions:

  1. bind auth methods to user ids (i.e. they can't set / use a password after authenticating with a JWT)
  2. periodic sync (rate limits could be a concern)
  3. live validation to some external endpoint (rate limits could be a concern)
  4. don't support any heterogeneity in auth methods at all (i.e. if JWT is enabled, only accept JWTs, rely on the issuer / expiration time)

In general, I want to avoid any temptation or risk of confusion where we might rely on stale data in the catalog when another source of truth exists.

Contributor Author

@SangJunBak, Dec 19, 2025

This is a good callout! Given we'd probably want to leave SCIM / syncing for later on, I think for now it might make sense to keep the source of truth in the catalog and have the admin ensure the role exists before a user can authenticate to it via SSO. Or we can create the role on first authentication (like in Cloud), but it'd be unprivileged by default and the admin would have to explicitly make the user a SUPERUSER.

Contributor Author

@SangJunBak, Dec 19, 2025

Jason and I talked it over offline. I think for the current scope, we can move ahead with this approach of "lazy updating" SUPERUSER privileges via the claims. However, in the future we'd really want to allow hosting an auth broker and have it be the source of truth for SUPERUSER privileges and other "control-plane" level permissions. Very similar to how Frontegg is our auth broker in Cloud.

# Implicitly fetch JWKS from
# https://{ domain }/.well-known/openid-configuration and allow override
# if we need to. Required.
issuer: https://dev-123456.okta.com/oauth2/default

Contributor

Do all JWT providers support this well known location? Do we also need to support passing a JWK directly? How do we specify an override if they don't support .well-known/openid-configuration?

Contributor Author

A lot of the big ones do, but it's not required in the OIDC spec. I'm thinking we can add two more fields:
`jwks` and `jwksFetchFromIssuer`. `jwks` is optional and accepts a JSON string of this format:

*(screenshot: a standard JWKS document, i.e. a JSON object with a `keys` array)*

and `jwksFetchFromIssuer` is false by default but, if true, overrides `jwks` and fetches from https://{ domain }/.well-known/openid-configuration
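
Under that proposal, a config with both fields might look like this (field names are taken from the comment above; the inline key material is a placeholder, not a real key):

```yaml
issuer: https://dev-123456.okta.com/oauth2/default
# Optional: an inline JWKS as a JSON string (a standard `keys` array).
jwks: '{"keys": [{"kty": "RSA", "kid": "example", "n": "...", "e": "AQAB"}]}'
# Default false; if true, overrides `jwks` and discovers keys via
# https://{ domain }/.well-known/openid-configuration
jwksFetchFromIssuer: true
```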

issuer: https://dev-123456.okta.com/oauth2/default
# Where Materialize will request tokens from the IdP using the refresh token
# if it exists. Optional.
tokenEndpoint: https://dev-123456.okta.com/oauth2/default/v1/token

Contributor

Is this actually standardized? How do we know what kind of request to make? If this is specifically an OIDC thing we should make that clear.

Contributor Author

This is standardized! https://www.rfc-editor.org/rfc/rfc6749.html#section-6

It's specifically an OAuth 2.0 standard. Clients aren't required to use the `refresh_token` grant type, but in that case they can just leave `tokenEndpoint` undefined. Will address this in the design doc.
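
For reference, the RFC 6749 §6 refresh request is a form-encoded POST to the token endpoint. The host and path below reuse the Okta example from the spec snippet; `<REFRESH_TOKEN>` and `<CLIENT_ID>` are placeholders, and `client_id` applies to public clients (confidential clients authenticate to the endpoint instead):

```http
POST /oauth2/default/v1/token HTTP/1.1
Host: dev-123456.okta.com
Content-Type: application/x-www-form-urlencoded

grant_type=refresh_token&refresh_token=<REFRESH_TOKEN>&client_id=<CLIENT_ID>
```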

Comment on lines 30 to 31
authenticatorKind: Jwt
jwtAuthenticationSettings:

Contributor

If this stuff is specifically OIDC, we should call it that. I see several things that seem to be assuming OIDC, but then say JWT. There are a million ways to use a JWT that aren't OIDC.


### Solution proposal: Creating a user & adding roles: An admin gives a user access to Materialize

By specifying an audience, we ensure an admin must explicitly give a user in the IdP access to Materialize as long as they use a client exclusively for Materialize. Otherwise when a user first logs in, we create a role for them if it does not exist.

Contributor

This is unclear to me. Can you rephrase it?

Do you mean if they specify an audience, assume the user has permission to login if the audience matches? The audience is specified as required, though, so what is the "otherwise" for?

Shouldn't we also check membership in the appropriate groups? Where are the groups defined? In SQL?

Contributor Author

This is unclear to me. Can you rephrase it?

For sure! By audience, I meant the `aud` field. And I was trying to explain that by setting the audience ID, we ensure a JWT generated for another app (e.g. Slack) can't be used with Materialize. This is because we require the audience ID to be set to the client ID, and each client generally represents an app. I was really trying to emphasize the bold in "Solution proposal: Creating a user & adding roles: **An admin gives a user access to Materialize**".

so what is the "otherwise" for?

This is just a grammar typo. My b 🙈 Should just say "When a user first logs in..."

Will update the doc.

Contributor Author

Shouldn't we also check membership in the appropriate groups? Where are the groups defined? In SQL?

Groups don't exist in SQL, but ROLEs do! Membership groups are not defined in the IdP and are not validated on the Materialize side. There's some idea of syncing this with the `groups` claim (similar to Cockroach) or using SCIM to sync it, but right now this is just done in SQL.
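
To make that concrete, a sketch of how this is handled purely in SQL today — all object and role names here are hypothetical:

```sql
-- Hypothetical example: roles and grants are managed directly in SQL,
-- with nothing synced from the IdP's groups claim.
CREATE ROLE data_analyst;
GRANT SELECT ON TABLE orders TO data_analyst;

-- Grant the analyst role to the role the OIDC user authenticates as:
GRANT data_analyst TO "alice@example.com";
```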


We have an `mz` CLI that’s catered to Cloud and is no longer supported. We can potentially bring it back.

**Open question:** Is there anything we can do on our side to easily provide access tokens / refresh tokens to the user without controlling the client? This feels like the missing piece between JWT authentication and something like `aws sso login` in the AWS CLI

Contributor

Probably not. It all depends on their SSO provider.

@SangJunBak force-pushed the jun/jwt-auth-design-doc branch from b14b983 to d17a9db on December 18, 2025 at 17:03
@SangJunBak
Contributor Author

No explicit ✅ from @alex-hunt-materialize, but after I responded to his feedback on GitHub, he said the responses seem good. I've since pushed up commits to reflect his feedback.


Comment on lines +32 to +33
# Must match the OIDC client ID. Required.
audience: 060a4f3d-1cac-46e4-b5a5-6b9c66cd9431

Contributor

I believe we actually have an issue with Frontegg where we can't / don't validate the audience, in part because we don't know what the expected audience should be. It might be the case that we want to allow this to be nullable or something.

Contributor Author

I noticed that for our frontegg client! Happy to make it nullable to unify things, but I wonder if we should recommend setting it? Otherwise a JWT meant for a different app / client can still be used with Materialize.

```bash
bin/environmentd -- \
--listeners-config-path='/listeners/oidc.json' \
--oidc-authentication-setting="{\"audience\":\"060a4f3d-1cac-46e4-b5a5-6b9c66cd9431\",\"authentication_claim\":\"email\",\"group_claim\":\"groups\",\"issuer\":\"https://dev-123456.okta.com/oauth2/default\",\"jwks\":{\"keys\":[...]},\"jwks_fetch_from_issuer\":true,\"token_endpoint\":\"https://dev-123456.okta.com/oauth2/default/v1/token\"}"
```
Contributor

I'm not sure about this... In general I think it makes sense for the OIDC configuration to come from the spec and be passed via environmentd flags, but I think we should have an alternative way of updating JWKs that doesn't require a rollout... perhaps we can shove those into a ConfigMap?

Contributor Author

I agree that it should be configurable at runtime! It does seem a bit confusing / bloated to create another ConfigMap that we need to sync to. I wonder if we can store the OIDC config as system parameters? It'll be tricky given the listener configs/authenticators are set up outside `adapter::coord::serve`, but I see an instance of this being done for the connection_limiter: https://github.com/SangJunBak/materialize/blob/main/src/environmentd/src/lib.rs#L749-L752

## Success Criteria

- Creating a user & adding roles: An admin gives a user access to Materialize
- The user should be prevented from logging in when they are de-provisioned. However, the database-level role should still exist.

Contributor

Is this specifically a SCIM requirement? Should SCIM be listed as an initial non-goal?

Contributor Author

@SangJunBak, Dec 22, 2025

I was thinking short-lived access tokens + refresh token could potentially solve:

The user should be prevented from logging in when they are de-provisioned. However, the database-level role should still exist.

to some degree. Though looking ahead, there's an open discussion of whether we do JWT expiration validation to begin with, so maybe not. I can clarify that SCIM is listed as a non-goal.


## Success Criteria

- Creating a user & adding roles: An admin gives a user access to Materialize

Contributor

"An admin gives a user access to Materialize" an admin of what?

Also do we want to include things like token refresh, jwk refresh / rotation in success criteria (no is valid) for a first pass.

I think we should also consider some non-goals (SCIM, generically JWT roles mapping to MZ roles).

Contributor Author

"An admin gives a user access to Materialize" an admin of what?

Ah I was referring to the "admin" persona described in the PRD: https://www.notion.so/materialize/SSO-for-Self-Managed-2a613f48d37b806f9cb2d06914454d15?source=copy_link#2a613f48d37b802abda0fff1c996e7ff:

Admin: they’re in charge of ensuring that users can only access applications which they are authorized to.

But I can create definitions at the start of the design doc.

Also, do we want to include things like token refresh and JWK refresh/rotation in the success criteria for a first pass? ("No" is a valid answer.)

I was thinking we include token refresh, but maybe not JWK refresh. Mainly because, if we assume short-lived access tokens, it'd be a pretty bad user experience to not have a way to refresh the session. I guess another option would be to not check for expiration, similar to what our advisor recommended, but I'm not sure how much of a security risk that would be.

I think we should also consider some non-goals (SCIM, generically JWT roles mapping to MZ roles).

Will write some thoughts down 👍

- Platform-check: simple login check (platform-check framework)
- JWTs should only be accepted when a valid JWK is set (we do not want to accept JWTs that are not signed with a real, cryptographically sound key)

## Phase 2: Map the `admin` claim to a user’s superuser attribute

Contributor

I'm wondering if we should have something in the OIDC config to map role claims to superuser. I think there's a strong case for this being an anti-pattern, but there's also a reasonable case for doing it, since we could imagine the same method being used inside our current Frontegg OIDC setup.

Contributor Author

@SangJunBak, Dec 22, 2025

FWIW I think this is what Cockroach does, except they treat their superuser equivalent (admin) as an actual role and have all roles synced like this https://www.cockroachlabs.com/docs/v25.4/oidc-authorization#role-synchronization.

It might be easier to explain and implement if we skip this phase and have the user's SUPERUSER attribute be set manually via SQL. We can still auto-create a user's role on first login; we'd just make them a non-superuser.

- Creating a user & adding roles: An admin gives a user access to Materialize
- The user should be prevented from logging in when they are de-provisioned. However, the database-level role should still exist.
- The end user is able to create a token to connect to materialize via psql / postgres clients
- The end user is able to visit the Materialize console, and sign in with their IdP

Contributor

This is mentioned, but I think in order for this to work we must have a `.well-known/openid-configuration` OIDC discovery doc with an `authorization_endpoint`.

Do we need to surface a redirect_url somewhere so the customer can configure the redirect url in their IDP?

Contributor Author

Ah, maybe I should remove this from the success criteria given it's out of scope. But yeah, if we were integrating Console SSO, we'd need a redirect_url.


### Solution proposal: The user should be prevented from logging in when they are de-provisioned. However, the database-level role should still exist.

When doing pgwire OIDC authentication, we can accept a cleartext password of the form `access=<ACCESS_TOKEN>&refresh=<REFRESH_TOKEN>` where `&` is a delimiter and `refresh=<REFRESH_TOKEN>` is optional. The OIDC authenticator will then try to authenticate again and fetch a new access token using the refresh token when close to expiration (using the token API URL in the spec above). If the refresh token doesn’t exist, the session will be invalidated. The implementation will be very similar to how we refresh tokens for the Frontegg authenticator. This would require users to have their IdP client generate `refresh` tokens.

Contributor

when close to expiration

Could you define the frequency a little more tightly here? For instance, I believe with Frontegg we use something like half the expiration period.


Contributor

If the refresh token doesn’t exist, the session will be invalidated.

I think we may want the ability to turn this off, but it seems reasonable.

Contributor Author

@SangJunBak, Dec 22, 2025

Could you define the frequency a little tighter here? For instance I believe with frontegg we use something like half the expiration period.

I was thinking the same method as our Frontegg authenticator, which, in a task, repeatedly waits for `(expiration - now) * 0.8` and checks whether that's less than a minute (source: https://github.com/SangJunBak/materialize/blob/main/src/frontegg-auth/src/auth.rs#L447).

Can document this though
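
That cadence can be sketched as follows. This is illustrative only; `next_refresh_delay` is a hypothetical helper, and the actual task loop lives in the Frontegg authenticator:

```rust
use std::time::Duration;

// Hypothetical sketch of the refresh cadence described above: wait for
// 80% of the token's remaining validity, and refresh immediately once
// that wait would be less than a minute.
fn next_refresh_delay(remaining: Duration) -> Option<Duration> {
    let wait = remaining.mul_f64(0.8);
    if wait < Duration::from_secs(60) {
        None // time to refresh now
    } else {
        Some(wait) // sleep this long, then re-check
    }
}

fn main() {
    // 10 minutes of validity left: sleep 8 minutes before re-checking.
    assert_eq!(
        next_refresh_delay(Duration::from_secs(600)),
        Some(Duration::from_secs(480))
    );
    // 60 seconds left: 48s < 1 minute, so refresh immediately.
    assert_eq!(next_refresh_delay(Duration::from_secs(60)), None);
    println!("ok");
}
```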

I think we may want the ability to turn this off, but it seems reasonable.

Happy to introduce a config to keep the session validated. Will add to the design doc

Contributor

I was thinking the same method as our Frontegg authenticator, which, in a task, repeatedly waits for `(expiration - now) * 0.8` and checks whether that's less than a minute (source: https://github.com/SangJunBak/materialize/blob/main/src/frontegg-auth/src/auth.rs#L447)

Sounds great!

Comment on lines +86 to +88
This would be the most Postgres-compatible way of doing this and is what Postgres itself uses for its `oauth` authentication method. However, it may run into compatibility issues with clients. For example, in `psql` there’s no obvious way of sending the bearer token directly without going through libpq's device-grant flow. Furthermore, assuming access tokens are short-lived, this could lead to poor UX given there’s no native way to re-authenticate a pgwire session. Finally, our HTTP endpoints wouldn’t be able to support this given they don’t support SASL auth.

OAUTHBEARER reference: [https://www.postgresql.org/docs/18/sasl-authentication.html#SASL-OAUTHBEARER](https://www.postgresql.org/docs/18/sasl-authentication.html#SASL-OAUTHBEARER)

Contributor

I have some concerns about whether we want / need to support multiple authentication methods, to have broader support both for services that can work with JWTs/refresh tokens and for services (potentially something like Power BI) where that may not work.

Is it viable to have both SASL and OIDC configured on the same Materialize instance? (The answer can be a definitive no, if that's a security risk.)

Contributor Author

Is it viable to have both SASL and OIDC configured on the same Materialize instance? (The answer can be a definitive no, if that's a security risk.)

I believe you can have SASL + OAuth configured on the same Materialize instance using the OAUTHBEARER mechanism; however, you can't combine our cleartext-password approach with SASL. A workaround here is opening up a separate port for SASL. Another approach is adding an HBA-style file similar to PostgreSQL or CockroachDB.

Contributor

This is a bit out of scope for the doc, but yeah, our listener config is more or less our equivalent of an HBA file. In theory one could just provide a listener config as a ConfigMap rather than us generating it... another, even more complicated approach would be to define a listener_config struct field in our spec... we would have to think about how these would interact with other options, though, and it could be confusing.
