add docs for new ManuallyPromote upgrade rollout strategy #34573
base: main
b6c6ad6 to 31c15b2
| When using `ManuallyPromote`, the new generation can be promoted at any |
Could we phrase this as something like this?
--
ManuallyPromote allows you to choose when to promote the new generation. This means that you can time the promotion for periods when load is low, and minimize the impact of potential downtime for any clients connected to Materialize.
When should I manually promote my environment?
We recommend promoting the new generation only when it has fully hydrated and caught up to the prior generation. Promoting the new generation before hydration completes can result in clients experiencing downtime. To determine whether the new generation has hydrated, consult the UpToDate condition in the status of the Materialize resource. When the new generation has hydrated, the status will be listed as ReadyToPromote.
How can I promote my environment before hydration is complete?
If you would like to promote the new generation before hydration has completed, you can set forcePromote to the same value as requestRollout in the Materialize spec.
How long can the new generation run before being promoted?
Do not leave new generations unpromoted indefinitely. They should either be promoted or canceled. Leaving a new generation open for too long can cause downtime.
New generations increase load on the metadata database because creating one opens a read hold that prevents metadata compaction. This hold is only released when the generation is promoted or canceled. If left open too long, promoting or canceling the generation can trigger a spike in deletion load on the metadata database.
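The suggestion above to set `forcePromote` to the same value as `requestRollout` could be sketched as a merge patch, in the same style as the `kubectl patch` command quoted in the diff. This is only a minimal sketch under stated assumptions: `build_force_promote_patch` is a hypothetical helper, and the resource kind `materialize`, `$INSTANCE`, and `$NAMESPACE` in the usage comment are placeholders that are not confirmed by this PR.

```shell
# Hypothetical helper: build a merge patch that sets spec.forcePromote
# to the given rollout ID (the value currently in spec.requestRollout).
build_force_promote_patch() {
  rollout_id="$1"
  printf '{"spec": {"forcePromote": "%s"}}' "$rollout_id"
}

# Usage sketch (assumed resource kind and names; needs kubectl and cluster access):
#   rollout_id="$(kubectl get materialize "$INSTANCE" -n "$NAMESPACE" \
#     -o jsonpath='{.spec.requestRollout}')"
#   kubectl patch materialize "$INSTANCE" -n "$NAMESPACE" --type='merge' \
#     -p "$(build_force_promote_patch "$rollout_id")"
```

Reading `requestRollout` back from the resource, rather than retyping it, avoids a mismatch between the two fields. As the comment notes, promoting before hydration completes can result in clients experiencing downtime.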
So ... the How long question isn't actually answered other than "not indefinitely."
Also, for things that can cause downtime, using a warning box is good. We could redo the question to incorporate the warning as part of the answer.
As for "How can I promote my environment before hydration is complete?", I would repeat the sentence "Promoting the new generation prior to hydration can result in clients experiencing downtime."
| | `WaitUntilReady` | *Default*. New instances are created and all dataflows are determined to be ready before cutover and terminating the old version, temporarily requiring twice the resources during the transition. | | ||
| | `ImmediatelyPromoteCausingDowntime`| Tears down the prior version before creating and promoting the new version. This causes downtime equal to the duration it takes for dataflows to hydrate, but does not require additional resources. | | ||
| | `ImmediatelyPromoteCausingDowntime` | Tears down the prior version before creating and promoting the new version. This causes downtime equal to the duration it takes for dataflows to hydrate, but does not require additional resources. | | ||
| | `ManuallyPromote` | [BETA ] Creates a new generation of pods, leaving the old generation as the serving generation until the user manually promotes the new generation by updating the `forcePromote` field. | |
Is [Beta] something we use now? Or is this what we used to call Public preview?
Also, maybe "Creates a new generation of pods while keeping the previous generation serving traffic until the user manually promotes the new generation using the forcePromote field."
| `ReadyToPromote` the new generation is ready to promote. | ||
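A check for the `ReadyToPromote` state quoted above, combined with the `UpToDate` condition mentioned in the review, might look like the following. This is a hedged sketch, not the documented procedure: `is_ready_to_promote` is a hypothetical helper, and the usage comment assumes that `ReadyToPromote` appears in the `reason` field of the `UpToDate` condition and that the resource kind is `materialize`. Neither assumption is confirmed by this PR.

```shell
# Hypothetical helper: succeed only when the given condition value
# indicates that the new generation is ready to promote.
is_ready_to_promote() {
  [ "$1" = "ReadyToPromote" ]
}

# Usage sketch (assumed resource kind, names, and field layout):
#   reason="$(kubectl get materialize "$INSTANCE" -n "$NAMESPACE" \
#     -o jsonpath='{.status.conditions[?(@.type=="UpToDate")].reason}')"
#   if is_ready_to_promote "$reason"; then
#     echo "new generation is ready to promote"
#   fi
```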
| {{<warning>}} | ||
| Do not leave new generations unpromoted indefinitely. |
Maybe (?):
Do not leave new generations unpromoted indefinitely. Unpromoted generations keep read holds open, preventing compaction until they are promoted or cancelled. If a generation is left unpromoted for an extended period, this data can build up and cause extreme deletion load on the metadata backend database when the generation is finally promoted or cancelled.
| --type='merge' \ | ||
| -p "{\"spec\": {\"requestRollout\": \"$(uuidgen)\"}}" | ||
| ``` | ||
| ### `requestRollout` with `forcedRollouts` |
Oh .. just noticed, could we add another # to this heading?
31c15b2 to fb6116e
fb6116e to 9cf3ad4
Small update here: the table was getting weird, so I moved away from that. We also had an extra "Upgrading" header that I removed, and I rebalanced some headers. I'm not really sure about the questions proposal. My preference, at least for this section, is to focus on describing the system rather than creating an FAQ. That said, casting aside my personal preference, your suggestion looks really good and might be more readable. I don't own the docs here; heck, I didn't even write the feature. I'm totally on board if you want to make those changes.
Motivation
Tips for reviewer
Checklist
`$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.