diff --git a/doc/user/content/self-managed-deployments/upgrading/_index.md b/doc/user/content/self-managed-deployments/upgrading/_index.md index 41264ff0da552..fb3511ffbddd5 100644 --- a/doc/user/content/self-managed-deployments/upgrading/_index.md +++ b/doc/user/content/self-managed-deployments/upgrading/_index.md @@ -24,9 +24,7 @@ name="upgrade-major-version-restriction" >}} {{< /note >}} -## Upgrading - -### Upgrade guides +## Upgrade guides The following upgrade guides are available as examples: @@ -39,7 +37,7 @@ name="upgrade-landing-guides-unified" %}} {{% include-from-yaml data="self_managed/upgrades" name="upgrade-landing-guides-legacy" %}} -### Upgrading the Helm Chart and Materialize Operator +## Upgrading the Helm Chart and Materialize Operator {{< important >}} @@ -48,7 +46,7 @@ Operator first. {{}} -#### Update the Helm Chart repository +### Update the Helm Chart repository To update your Materialize Helm Chart repository: @@ -62,7 +60,7 @@ View the available chart versions: helm search repo materialize/materialize-operator --versions ``` -#### Upgrade your Materialize Operator +### Upgrade your Materialize Operator The Materialize Kubernetes Operator is deployed via Helm and can be updated through standard `helm upgrade` command: @@ -93,7 +91,7 @@ helm upgrade -n materialize my-demo materialize/operator \ --version {{< self-managed/versions/get-latest-version >}} ``` -### Upgrading Materialize Instances +## Upgrading Materialize Instances **After** you have upgraded your Materialize Operator, upgrade your Materialize instance(s) to the **APP Version** of the Operator. To find the version of your @@ -121,7 +119,7 @@ periods for your application, the upgrade process involves two steps: - Second, roll out the changes by specifying a new UUID for `requestRollout`. 
-#### Stage the Materialize instance version change +### Stage the Materialize instance version change To stage the Materialize instance version upgrade, update the `environmentdImageRef` field in the Materialize custom resource spec to the @@ -143,7 +141,7 @@ does not roll out the changes. {{< /note >}} -#### Applying the changes via `requestRollout` +### Applying the changes via `requestRollout` To apply the Materialize instance upgrade, you must update the `requestRollout` field in the Materialize custom resource spec to a new UUID. Be sure to consult [Rollout Configuration](#rollout-configuration) to ensure you've selected the correct rollout behavior. @@ -155,7 +153,7 @@ kubectl patch materialize \ -p "{\"spec\": {\"requestRollout\": \"$(uuidgen)\"}}" ``` -#### Staging and applying in a single command +### Staging and applying in a single command Although separating the staging and rollout of the changes into two steps can minimize unexpected downtime and avoid connection drops at critical periods, you @@ -168,7 +166,7 @@ kubectl patch materialize \ -p "{\"spec\": {\"environmentdImageRef\": \"docker.io/materialize/environmentd:{{< self-managed/versions/get-latest-version >}}\", \"requestRollout\": \"$(uuidgen)\"}}" ``` -##### Using YAML Definition +#### Using YAML Definition Alternatively, you can update your Materialize custom resource definition directly: @@ -193,9 +191,9 @@ Apply the updated definition: kubectl apply -f materialize.yaml ``` -### Rollout Configuration +## Rollout Configuration -#### `requestRollout` +### `requestRollout` Specify a new `UUID` value for `requestRollout` to roll out the changes to the Materialize instance. 
@@ -215,7 +213,7 @@ kubectl patch materialize \ --type='merge' \ -p "{\"spec\": {\"requestRollout\": \"$(uuidgen)\"}}" ``` -### `requestRollout` with `forcedRollouts` +#### `requestRollout` with `forceRollout` Specify a new `UUID` value for `forceRollout` to roll out even when there are no changes to the instance. Use `forceRollout` with `requestRollout`. @@ -227,15 +225,36 @@ kubectl patch materialize \ -p "{\"spec\": {\"requestRollout\": \"$(uuidgen)\", \"forceRollout\": \"$(uuidgen)\"}}" ``` -#### Rollout strategies +### Rollout strategies + +Rollout strategies control how Materialize transitions from the current generation to a new generation during an upgrade. + +The behavior of the new version rollout follows your `rolloutStrategy` setting. + +#### *WaitUntilReady* - ***Default*** + +`WaitUntilReady` creates a new generation of pods and automatically cuts over to them as soon as they catch up to the old generation and become `ReadyToPromote`. This strategy temporarily doubles the required resources to run Materialize. + +#### *ImmediatelyPromoteCausingDowntime* +{{< warning >}} Using the `ImmediatelyPromoteCausingDowntime` rollout strategy will cause downtime. {{< /warning >}} + +`ImmediatelyPromoteCausingDowntime` tears down the prior generation and immediately promotes the new generation without waiting for it to hydrate. This causes downtime until the new generation has hydrated. However, it does not require additional resources. + +#### *ManuallyPromote* + +`ManuallyPromote` allows you to choose when to promote the new generation. This means you can time the promotion for periods when load is low, minimizing the impact of potential downtime for any clients connected to Materialize. This strategy temporarily doubles the required resources to run Materialize. + +To minimize downtime, wait until the new generation has fully hydrated and caught up to the prior generation before promoting. 
To check hydration status, inspect the `UpToDate` condition in the Materialize resource status. When hydration completes, the condition's reason will be `ReadyToPromote`. + +To promote, update the `forcePromote` field to match the `requestRollout` field in the Materialize spec. If you need to promote before hydration completes, you can set `forcePromote` immediately, but clients may experience downtime. + +{{< warning >}} Leaving a new generation unpromoted for over 6 hours may cause downtime. {{< /warning >}} + +**Do not leave new generations unpromoted indefinitely**. They should either be promoted or canceled. New generations open a read hold on the metadata database that prevents compaction. This hold is only released when the generation is promoted or canceled. If left open too long, promoting or canceling can trigger a spike in deletion load on the metadata database, potentially causing downtime. Avoid leaving generations unpromoted for more than 6 hours. -The behavior of the new version rollout follows your `rolloutStrategy` setting: +#### *inPlaceRollout* - ***Deprecated*** -| `rolloutStrategy` | Description | -| ----------------- | -----------------------------------| -| `WaitUntilReady` | *Default*. New instances are created and all dataflows are determined to be ready before cutover and terminating the old version, temporarily requiring twice the resources during the transition. | -| `ImmediatelyPromoteCausingDowntime`| Tears down the prior version before creating and promoting the new version. This causes downtime equal to the duration it takes for dataflows to hydrate, but does not require additional resources. | -| `inPlaceRollout`| *Deprecated*. The setting is ignored. | +The setting is ignored. 
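As a rough sketch of the `ManuallyPromote` flow described above, the following shell functions read the reason of the `UpToDate` condition and, only when it is `ReadyToPromote`, set `forcePromote` to the current `requestRollout` value. The function names, the instance name and namespace arguments, and the assumption that the condition is exposed under `.status.conditions` are illustrative and are not confirmed by this guide; adjust them to your deployment.

```shell
# Sketch only: function names and the status path are assumptions.

# Build the merge patch that sets forcePromote to the given rollout UUID.
promote_patch() {
  printf '{"spec": {"forcePromote": "%s"}}' "$1"
}

# Print the reason of the UpToDate condition for an instance.
# $1 = instance name, $2 = namespace (both hypothetical arguments).
up_to_date_reason() {
  kubectl get materialize "$1" -n "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="UpToDate")].reason}'
}

# Promote the new generation only if it reports ReadyToPromote.
# $1 = instance name, $2 = namespace.
promote_if_ready() {
  mz_reason="$(up_to_date_reason "$1" "$2")"
  if [ "$mz_reason" != "ReadyToPromote" ]; then
    echo "not ready to promote (reason: ${mz_reason:-unknown})" >&2
    return 1
  fi
  # Reuse the UUID already set in requestRollout, as the guide requires.
  mz_rollout="$(kubectl get materialize "$1" -n "$2" \
    -o jsonpath='{.spec.requestRollout}')"
  kubectl patch materialize "$1" -n "$2" --type='merge' \
    -p "$(promote_patch "$mz_rollout")"
}
```

Calling `promote_if_ready <instance-name> <namespace>` would then either apply the patch or exit non-zero without changing anything, which keeps an early promotion from happening by accident.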
## Verifying the Upgrade diff --git a/src/cloud-resources/src/crd/materialize.rs b/src/cloud-resources/src/crd/materialize.rs index be06c27f396ad..aa9d2bcc1bed4 100644 --- a/src/cloud-resources/src/crd/materialize.rs +++ b/src/cloud-resources/src/crd/materialize.rs @@ -42,9 +42,15 @@ pub mod v1alpha1 { /// Create a new generation of pods, leaving the old generation as the serving generation /// until the user manually promotes the new generation. /// - /// Users can promote the new generation at any time, even if the new generation pods are - /// not fully caught up, by setting `forcePromote` to the same value as `requestRollout` in - /// the Materialize spec. + /// When using `ManuallyPromote`, the new generation can be promoted at any + /// time, even if it has dataflows that are not fully caught up, by setting + /// `forcePromote` to the same value as `requestRollout` in the Materialize spec. + /// + /// To minimize downtime, promotion should occur when the new generation + /// has caught up to the prior generation. To determine if the new + /// generation has caught up, consult the `UpToDate` condition in the + /// status of the Materialize resource. If the condition's reason is + /// `ReadyToPromote`, the new generation is ready to promote. /// /// {{}} /// Do not leave new generations unpromoted indefinitely.