@jubrad jubrad commented Dec 18, 2025

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@jubrad jubrad force-pushed the rollout-strategy-manuallypromote-docs branch from b6c6ad6 to 31c15b2 Compare December 18, 2025 17:04
@jubrad jubrad requested a review from maheshwarip December 18, 2025 17:04



When using `ManuallyPromote`, the new generation can be promoted at any
Contributor

@maheshwarip maheshwarip Dec 19, 2025

Could we phrase this as something like this?

--
ManuallyPromote allows you to choose when to promote the new generation. This means that you can time the promotion for periods when load is low, and minimize the impact of potential downtime for any clients connected to Materialize.

When should I manually promote my environment?

We recommend promoting the new generation only when it has fully hydrated, and caught up to the prior generation. Promoting the new generation prior to hydration can result in clients experiencing downtime. To determine if the new generation has hydrated, consult the UpToDate condition in the status of the Materialize resource. When the new generation has hydrated, the status will be listed as ReadyToPromote.
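
A minimal sketch of the readiness check described above. It assumes the `UpToDate` condition is exposed under `.status.conditions` with the usual Kubernetes `type` and `reason` fields, and that `ReadyToPromote` appears as the `reason`; both are assumptions, and the helper names are hypothetical.

```shell
# Sketch only: the condition layout (.status.conditions[].type / .reason)
# follows the common Kubernetes convention and is an assumption here.

# Print the reason reported by the UpToDate condition of a Materialize resource.
up_to_date_reason() {
  local name="$1" namespace="$2"
  kubectl get materialize "$name" -n "$namespace" \
    -o jsonpath='{.status.conditions[?(@.type=="UpToDate")].reason}'
}

# Succeed only when the reported reason is ReadyToPromote.
is_ready_to_promote() {
  [ "$1" = "ReadyToPromote" ]
}
```

For a hypothetical instance `my-instance` in namespace `materialize-environment`, the two helpers could be combined as `is_ready_to_promote "$(up_to_date_reason my-instance materialize-environment)"` before promoting.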

How can I promote my environment before hydration is complete?

If you would like to promote the new generation before hydration has completed, you can set forcePromote to the same value as requestRollout in the Materialize spec.
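
A hedged sketch of what that could look like with `kubectl`, mirroring the `requestRollout` patch used elsewhere in this PR. The helper names are hypothetical, and the resource name and namespace are supplied by the caller; only the rule "set `forcePromote` to the same value as `requestRollout`" comes from the text above. Promoting the new generation prior to hydration can result in clients experiencing downtime.

```shell
# Sketch only: helper names are hypothetical.

# Build the merge patch that sets forcePromote to the given rollout ID.
build_force_promote_patch() {
  local rollout_id="$1"
  printf '{"spec": {"forcePromote": "%s"}}' "$rollout_id"
}

# Read the current requestRollout value and copy it into forcePromote.
force_promote() {
  local name="$1" namespace="$2" rollout_id
  rollout_id="$(kubectl get materialize "$name" -n "$namespace" \
    -o jsonpath='{.spec.requestRollout}')" || return 1
  # Refuse to patch if no rollout has been requested.
  [ -n "$rollout_id" ] || { echo "no requestRollout set" >&2; return 1; }
  kubectl patch materialize "$name" -n "$namespace" --type='merge' \
    -p "$(build_force_promote_patch "$rollout_id")"
}
```

Reading `requestRollout` back from the resource, rather than retyping it, avoids promoting with a stale or mistyped ID.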

How long can the new generation run before being promoted?

Do not leave new generations unpromoted indefinitely. They should either be promoted or canceled. Leaving a new generation open for too long can cause downtime.

New generations increase load on the metadata database because creating one opens a read hold that prevents metadata compaction. This hold is only released when the generation is promoted or canceled. If left open too long, promoting or canceling the generation can trigger a spike in deletion load on the metadata database.

Contributor

So ... the How long question isn't actually answered other than "not indefinitely."
Also, for things that can cause downtime, using a warning box is good. We could redo the question to incorporate the warning as part of the answer.

Contributor

As for the "How can I promote my environment before hydration is complete?" question, I would repeat the sentence "Promoting the new generation prior to hydration can result in clients experiencing downtime."

@maheshwarip maheshwarip requested a review from kay-kim December 19, 2025 15:40
| `WaitUntilReady` | *Default*. New instances are created and all dataflows are determined to be ready before cutover and terminating the old version, temporarily requiring twice the resources during the transition. |
| `ImmediatelyPromoteCausingDowntime` | Tears down the prior version before creating and promoting the new version. This causes downtime equal to the duration it takes for dataflows to hydrate, but does not require additional resources. |
| `ManuallyPromote` | [BETA ] Creates a new generation of pods, leaving the old generation as the serving generation until the user manually promotes the new generation by updating the `forcePromote` field. |
Contributor

Is [Beta] something we use now? Or is this what we used to call Public preview?

Contributor

Also, maybe "Creates a new generation of pods while keeping the previous generation serving traffic until the user manually promotes the new generation using the forcePromote field."

`ReadyToPromote` the new generation is ready to promote.

{{<warning>}}
Do not leave new generations unpromoted indefinitely.
Contributor

Maybe (?):
Do not leave new generations unpromoted indefinitely. Unpromoted generations keep read holds open, preventing compaction until they are promoted or cancelled. If a generation is left unpromoted for an extended period, uncompacted data can build up and cause extreme deletion load on the metadata backend database when the generation is finally promoted or cancelled.

--type='merge' \
-p "{\"spec\": {\"requestRollout\": \"$(uuidgen)\"}}"
```
### `requestRollout` with `forcedRollouts`
Contributor

Oh .. just noticed, could we add another # to this heading?




@jubrad jubrad force-pushed the rollout-strategy-manuallypromote-docs branch from 31c15b2 to fb6116e Compare December 20, 2025 04:11
@jubrad jubrad force-pushed the rollout-strategy-manuallypromote-docs branch from fb6116e to 9cf3ad4 Compare December 20, 2025 04:15
jubrad commented Dec 20, 2025

@kay-kim @maheshwarip

Small update here: the table was getting weird, so I moved away from it. We also had an extra "Upgrading" header, which I removed, and I rebalanced some headers.

I'm not really sure about the questions proposal. My preference, at least for this section, is to focus on describing the system rather than creating an FAQ. Casting aside my personal preference, though, your suggestion looks really good and might be more readable. I don't own the docs here; heck, I didn't even write the feature. I'm totally on board if you want to make those changes.

@jubrad jubrad requested a review from maheshwarip December 23, 2025 03:35
